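DNS resolution problems in a k8s cluster
Troubleshooting the cluster itself
Official guide: https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/dns-debugging-resolution/
Troubleshooting failed resolution of external domain names
Check the coredns logs. They show a timeout on 10.244.0.5:48840->100.100.2.136, where 10.244.0.5 is the coredns pod IP and 100.100.2.136 is a DNS server configured in /etc/resolv.conf on the node. In other words, coredns times out when it queries the upstream DNS server.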
[root@k8s-0 nfs_dir]# kubectl -n kube-system logs coredns-74586cf9b6-4xhpx -f=true --tail=30
[ERROR] plugin/errors: 2 acme-staging-v02.api.letsencrypt.org. AAAA: read udp 10.244.0.5:48840->100.100.2.136:53: i/o timeout
[ERROR] plugin/errors: 2 acme-staging-v02.api.letsencrypt.org. AAAA: read udp 10.244.0.5:49652->100.100.2.138:53: i/o timeout
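When the log is long, it can help to reduce it to the set of upstream servers that actually timed out and compare that set with the nameserver entries in the node's /etc/resolv.conf. The following is a minimal sketch; the function name coredns_timeout_upstreams is made up for this note, and it assumes the log lines have the same shape as the two lines above.

```shell
# Hypothetical helper (not part of kubectl or coredns):
# reads coredns log lines on stdin and prints each distinct
# upstream DNS server IP that produced an "i/o timeout" on port 53.
coredns_timeout_upstreams() {
  grep 'i/o timeout' \
    | sed -n 's/.*->\([0-9.]*\):53: i\/o timeout.*/\1/p' \
    | sort -u
}

# Example use (pod name taken from the log command above):
#   kubectl -n kube-system logs coredns-74586cf9b6-4xhpx --tail=30 | coredns_timeout_upstreams
#   grep nameserver /etc/resolv.conf
```

If every nameserver listed on the node appears in the output, the problem is probably the path from the pod network to the outside rather than one bad upstream server.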
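Try pinging the upstream DNS server from the node and from inside a pod. The node can reach it, but the pod cannot. (Note: with a working network plugin, pods can normally ping each other and external IPs. It is the ClusterIP of a svc that often does not answer ping, for example when kube-proxy runs in iptables mode.)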
[root@k8s-0 ~]# ping 100.100.2.136
PING 100.100.2.136 (100.100.2.136) 56(84) bytes of data.
64 bytes from 100.100.2.136: icmp_seq=1 ttl=64 time=0.323 ms
64 bytes from 100.100.2.136: icmp_seq=2 ttl=64 time=0.235 ms
64 bytes from 100.100.2.136: icmp_seq=3 ttl=64 time=0.245 ms
64 bytes from 100.100.2.136: icmp_seq=4 ttl=64 time=0.200 ms
^C
--- 100.100.2.136 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3106ms
rtt min/avg/max/mdev = 0.200/0.250/0.323/0.044 ms
[root@k8s-0 ~]# kubectl exec -i -t nginx-deployment-64c9fc4b88-rfw6p -- ping 100.100.2.136
PING 100.100.2.136 (100.100.2.136): 56 data bytes
^C
--- 100.100.2.136 ping statistics ---
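ping only tests ICMP, while the coredns errors are about UDP port 53, so it may be worth repeating the check with a real DNS query from inside a pod. This is a rough sketch along the lines of the official guide; the function name pod_dns_check is made up, and it assumes the pod's image ships nslookup (the dnsutils pod from the official guide does, the nginx image may not).

```shell
# Hypothetical helper: run two DNS queries from inside a pod.
#   $1 = pod name (assumed to contain nslookup, e.g. the dnsutils pod)
#   $2 = upstream DNS server IP from the node's /etc/resolv.conf
pod_dns_check() {
  pod="$1"
  upstream="$2"
  # Query through the cluster DNS (coredns) for an in-cluster name.
  kubectl exec "$pod" -- nslookup kubernetes.default
  # Query the upstream server directly, bypassing coredns.
  kubectl exec "$pod" -- nslookup kubernetes.io "$upstream"
}

# Example use:
#   pod_dns_check dnsutils 100.100.2.136
```

If the first query works and the second one times out, the pod network cannot reach the upstream server, which matches what the coredns log shows.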
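So the cluster network needs to be checked. It turned out that the network plugin had not been installed when k8s was reinstalled (the cluster had been moved to a new server). Reinstall the flannel network plugin: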
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
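After applying the manifest, it is worth confirming that the flannel pods are running and the nodes are Ready before retesting DNS. A minimal sketch follows; the function name flannel_status is made up, and because the namespace used by the flannel manifest differs between versions, it searches all namespaces instead of assuming one.

```shell
# Hypothetical helper: show flannel pods (in whatever namespace
# the manifest created them) and the readiness of the nodes.
flannel_status() {
  kubectl get pods --all-namespaces -o wide | grep -i flannel
  kubectl get nodes
}

# Example use:
#   flannel_status
```

Note that kube-flannel.yml assumes the pod network 10.244.0.0/16 by default, which is consistent with the coredns pod IP 10.244.0.5 in the log above. If the cluster was initialized with a different pod CIDR, the manifest would need to be adjusted to match.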