Troubleshooting a calico network failure in a k8s cluster
Error
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
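Meaning: calico is not ready; BGP sessions could not be established with the internal IP addresses 172.16.0.20 and 172.16.0.30
BGP: Border Gateway Protocol
The k8s dashboard web page could not be reached, so the pods were checked
For an unknown reason, a calico Pod was recreated and then never became ready; it stays at 0/1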
[root@k8s-master yaml]# kubectl get pod -n kube-system
NAMESPACE NAME READY STATUS RESTARTS AGE
...
kube-system calico-kube-controllers-578894d4cd-rsgqd 1/1 Running 0 115d
kube-system calico-node-64s8s 1/1 Running 3 127d
kube-system calico-node-j4t7q 1/1 Running 0 127d
kube-system calico-node-n6vr4 0/1 Running 0 40s
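With many pods in the listing it is easy to miss the one that is stuck. The following is a minimal sketch, assuming the default `kubectl get pod` column layout shown above; the helper name `not_ready_pods` is made up for illustration. It reads the listing on stdin and prints the pods whose READY count is incomplete.

```shell
# Print the names of pods whose READY column shows fewer ready containers
# than expected (for example 0/1). Reads `kubectl get pod` output on stdin.
# Works with or without a leading NAMESPACE column, because it looks for the
# first field shaped like N/M and prints the field just before it.
not_ready_pods() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^[0-9]+\/[0-9]+$/) {
        split($i, r, "/")
        if (r[1] != r[2]) print $(i - 1)
        break
      }
    }
  }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pod -n kube-system | not_ready_pods
```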
Error details from the Calico Pod
[root@k8s-master yaml]# kubectl describe pod -n kube-system calico-node-n6vr4
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/calico-node-n6vr4 to k8s-master
Normal Pulled 41s kubelet, k8s-master Container image "calico/cni:v3.15.1" already present on machine
Normal Created 41s kubelet, k8s-master Created container upgrade-ipam
Normal Started 40s kubelet, k8s-master Started container upgrade-ipam
Normal Pulled 40s kubelet, k8s-master Container image "calico/cni:v3.15.1" already present on machine
Normal Started 39s kubelet, k8s-master Started container install-cni
Normal Created 39s kubelet, k8s-master Created container install-cni
Normal Pulled 39s kubelet, k8s-master Container image "calico/pod2daemon-flexvol:v3.15.1" already present on machine
Normal Pulled 38s kubelet, k8s-master Container image "calico/node:v3.15.1" already present on machine
Normal Started 38s kubelet, k8s-master Started container flexvol-driver
Normal Created 38s kubelet, k8s-master Created container flexvol-driver
Normal Created 37s kubelet, k8s-master Created container calico-node
Normal Started 37s kubelet, k8s-master Started container calico-node
Warning Unhealthy 27s kubelet, k8s-master Readiness probe failed: 2020-08-14 02:16:54.068 [INFO][142] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
Warning Unhealthy 17s kubelet, k8s-master Readiness probe failed: 2020-08-14 02:17:04.059 [INFO][181] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
Warning Unhealthy 7s kubelet, k8s-master Readiness probe failed: 2020-08-14 02:17:14.065 [INFO][207] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
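The readiness message names the peers that the node cannot reach over BGP. As a rough sketch (the helper name `unestablished_peers` is hypothetical), the peer addresses can be pulled out of the event text so that each one can then be checked by hand, for example with ping.

```shell
# Extract the unique peer IPs from "BGP not established with a,b,c" messages.
# Reads `kubectl describe pod` output (or any log text) on stdin.
unestablished_peers() {
  grep -o 'BGP not established with [0-9.,]*' \
    | sed 's/^BGP not established with //' \
    | tr ',' '\n' \
    | sort -u
}

# Usage against a live cluster (not executed here):
#   kubectl describe pod -n kube-system calico-node-n6vr4 | unestablished_peers
```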
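Cause: calico did not detect the node's actual network interface
Solution
Adjust how the calico network plugin detects the node's interface by changing the value of IP_AUTODETECTION_METHOD. The official yaml file, as downloaded, does not set the IP detection strategy (IP_AUTODETECTION_METHOD), so it defaults to first-found. This can cause an IP from the wrong interface to be registered as the node IP, which breaks network connectivity between nodes. Switch to the can-reach or interface strategy instead. can-reach uses the local address from which a given IP is reachable (for example the IP of a node that is Ready), and interface uses the address of an interface whose name matches the given regular expression; either way the correct IP gets selected.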
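Before picking a value it helps to see which address each strategy would likely land on. The two helpers below are only a sketch of that check: they parse the output of `ip -o -4 addr show` and `ip route get`, they are not how calico itself performs detection, and the names `addr_for_interface` and `route_src` as well as the full-name regex match are assumptions made for illustration.

```shell
# Print the first IPv4 address on an interface whose name fully matches the
# regex in $1. Reads `ip -o -4 addr show` output on stdin, where field 2 is
# the interface name and field 4 is the address in CIDR form.
addr_for_interface() {
  awk -v re="$1" '$2 ~ ("^(" re ")$") { sub(/\/.*/, "", $4); print $4; exit }'
}

# Print the source address the kernel would use to reach a peer.
# Reads `ip route get <peer-ip>` output on stdin and returns the word after "src".
route_src() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Usage on a node (not executed here):
#   ip -o -4 addr show | addr_for_interface 'eth1'   # candidate for interface=eth1
#   ip route get 172.16.0.20 | route_src             # candidate for can-reach=172.16.0.20
```

If the two commands agree on an address in the same subnet as the other nodes, that interface is a reasonable choice for the setting.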
# Edit the calico yaml file and add two lines
# - name: IP_AUTODETECTION_METHOD
#   value: "interface=eth1" # set this to the node's actual interface name
[root@k8s-master yaml]# vim calico.yaml
... (line 3546)
# Cluster type to identify the deployment type
- name: CLUSTER_TYPE
  value: "k8s,bgp"
# Newly added configuration
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth1"
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
# Enable or Disable VXLAN on the default IP pool.
- name: CALICO_IPV4POOL_VXLAN
  value: "Never"
# Re-apply the manifest
kubectl apply -f calico.yaml
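The same environment variable can also be set on the running DaemonSet without editing the manifest, using `kubectl set env`. The wrapper below is a sketch under assumptions: the DaemonSet is taken to be `calico-node` in `kube-system` (consistent with the pod names above), and the function name `set_autodetect` and the `KUBECTL` override variable are hypothetical, added only so the command can be previewed before it is run.

```shell
# Set IP_AUTODETECTION_METHOD on the calico-node DaemonSet.
# $1 is the method, e.g. "interface=eth1" or "can-reach=172.16.0.20".
# KUBECTL may be overridden (e.g. KUBECTL=echo) to print the command
# arguments instead of executing them.
set_autodetect() {
  ${KUBECTL:-kubectl} set env daemonset/calico-node -n kube-system \
    "IP_AUTODETECTION_METHOD=$1"
}

# Usage (not executed here):
#   KUBECTL=echo set_autodetect interface=eth1   # preview only
#   set_autodetect interface=eth1                # apply to the cluster
#   kubectl rollout status daemonset/calico-node -n kube-system
```

A change made this way is lost if the original calico.yaml is applied again later, so the manifest should still be updated to match.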
Fix complete
[root@k8s-master yaml]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-578894d4cd-rsgqd 1/1 Running 0 115d
calico-node-6ktn4 1/1 Running 0 26m
calico-node-8k5z8 1/1 Running 0 26m
calico-node-g87hc 1/1 Running 0 1m
The cluster's resources are reachable again