Adding a new master

On an existing master node, get the node join command

kubeadm token create --print-join-command
# Output:
# kubeadm join 172.16.92.196:6443 --token nknwoo.b2psf52tkqnntty7 --discovery-token-ca-cert-hash sha256:bb6546ceaee7948ab57789aaaedb7ba1211cb4ca9c05e1855407a7c033d683e3

On an existing master node, generate the certificate key

kubeadm init phase upload-certs --upload-certs
# Output:
# I0520 23:15:19.456755 1323595 version.go:255] remote version is much newer: v1.27.2; falling back to: stable-1.24
# [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
# [upload-certs] Using certificate key:
# bb2d5bbc1467ae5931282e4afc7bce98a179796d7c692dedb961003cd0785f4e

On the new node, append --control-plane --certificate-key followed by the certificate key to the join command to form the master join command

# First reset the node
kubeadm reset
# Then join the cluster
kubeadm join 172.16.92.196:6443 --token nknwoo.b2psf52tkqnntty7 --discovery-token-ca-cert-hash sha256:bb6546ceaee7948ab57789aaaedb7ba1211cb4ca9c05e1855407a7c033d683e3 --control-plane --certificate-key  bb2d5bbc1467ae5931282e4afc7bce98a179796d7c692dedb961003cd0785f4e
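
The two outputs above can be combined mechanically. The following is a minimal sketch, assuming the certificate key is the last line of the upload-certs output (as in the log above). The function name build_control_plane_join is hypothetical and not part of kubeadm.

```shell
# build_control_plane_join is a hypothetical helper, not a kubeadm feature.
# $1: join command as printed by `kubeadm token create --print-join-command`
# $2: certificate key as printed by `kubeadm init phase upload-certs --upload-certs`
build_control_plane_join() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    echo "usage: build_control_plane_join <join-command> <certificate-key>" >&2
    return 1
  fi
  # Strip trailing whitespace from the join command before appending flags.
  bcpj_cmd=$(printf '%s' "$1" | sed 's/[[:space:]]*$//')
  printf '%s --control-plane --certificate-key %s\n' "$bcpj_cmd" "$2"
}

# Possible use on an existing master (assumption: the key is the last output line):
#   join_cmd=$(kubeadm token create --print-join-command)
#   cert_key=$(kubeadm init phase upload-certs --upload-certs | tail -n 1)
#   build_control_plane_join "$join_cmd" "$cert_key"
```

The printed line is what gets run on the new node after kubeadm reset.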

Troubleshooting

A node that previously joined the cluster fails when it tries to join again

[root@k8s-0 ~]# kubeadm join 172.16.92.196:6443 --token nknwoo.b2psf52tkqnntty7 --discovery-token-ca-cert-hash sha256:bb6546ceaee7948ab57789aaaedb7ba1211cb4ca9c05e1855407a7c033d683e3 --control-plane --certificate-key  bb2d5bbc1467ae5931282e4afc7bce98a179796d7c692dedb961003cd0785f4e
[preflight] Running pre-flight checks
        [WARNING SystemVerification]: missing optional cgroups: blkio
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
        [ERROR Port-10250]: Port 10250 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

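Cause: the node has already joined the cluster, so running the join command again fails the preflight checks.
Fix: delete the node from the cluster first, then run kubeadm reset on it and join again.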

On a master node, delete the worker node

kubectl get node
kubectl delete nodes k8s-0

On the worker node, run kubeadm reset

kubeadm reset

Join again

kubeadm join 172.16.92.196:6443 --token nknwoo.b2psf52tkqnntty7 --discovery-token-ca-cert-hash sha256:bb6546ceaee7948ab57789aaaedb7ba1211cb4ca9c05e1855407a7c033d683e3 --control-plane --certificate-key  bb2d5bbc1467ae5931282e4afc7bce98a179796d7c692dedb961003cd0785f4e
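
Before retrying a join, it can help to check whether the node still carries state from an earlier join. This is a rough sketch that only looks for the kubelet.conf file named in the preflight error; it does not check port 10250. The helper name has_stale_kubeadm_state and its optional root-directory argument are assumptions made for illustration.

```shell
# has_stale_kubeadm_state is a hypothetical helper.
# $1: optional root directory to inspect (empty means the real filesystem root);
#     it is a parameter only so the check can be tried against a scratch directory.
# Returns 0 if <root>/etc/kubernetes/kubelet.conf exists, non-zero otherwise.
has_stale_kubeadm_state() {
  hsks_root="${1:-}"
  [ -e "$hsks_root/etc/kubernetes/kubelet.conf" ]
}

# Possible use on the node that is about to join:
#   if has_stale_kubeadm_state; then
#     echo "leftover kubeadm state found; run kubeadm reset first" >&2
#   fi
```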

Adding another master to a non-HA cluster fails

[root@k8s-0 ~]# kubeadm join 172.16.92.196:6443 --token nknwoo.b2psf52tkqnntty7 --discovery-token-ca-cert-hash sha256:bb6546ceaee7948ab57789aaaedb7ba1211cb4ca9c05e1855407a7c033d683e3 --control-plane --certificate-key  bb2d5bbc1467ae5931282e4afc7bce98a179796d7c692dedb961003cd0785f4e
[preflight] Running pre-flight checks
        [WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance to a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.


To see the stack trace of this error execute with --v=5 or higher

Cause: the existing cluster was not set up for high availability, so controlPlaneEndpoint must be added to its configuration.
Fix:

View the kubeadm-config ConfigMap

kubectl -n kube-system get cm kubeadm-config -oyaml

Add controlPlaneEndpoint

# Edit
kubectl -n kube-system edit cm kubeadm-config

# Add
...
    kind: ClusterConfiguration
    kubernetesVersion: v1.24.0
    controlPlaneEndpoint: 172.16.92.196:6443 # the IP here is master0's IP
    networking:
...
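
After editing the ConfigMap, the value can be read back to confirm it was saved. The sketch below is a hedged example: get_control_plane_endpoint is a hypothetical helper that scans ClusterConfiguration YAML on stdin with awk rather than a real YAML parser, so it assumes the key sits on a single line as in the snippet above.

```shell
# get_control_plane_endpoint is a hypothetical helper.
# Reads ClusterConfiguration YAML on stdin and prints the controlPlaneEndpoint value.
# Exits non-zero if the key is missing or empty.
get_control_plane_endpoint() {
  awk '
    /^[ \t]*controlPlaneEndpoint:/ {
      sub(/^[ \t]*controlPlaneEndpoint:[ \t]*/, "")  # drop the key
      sub(/[ \t]*#.*$/, "")                          # drop a trailing comment
      gsub(/"/, "")                                  # drop optional quotes
      if ($0 != "") { print; found = 1; exit }
    }
    END { if (!found) exit 1 }
  '
}

# Possible use on a master node:
#   kubectl -n kube-system get cm kubeadm-config \
#     -o jsonpath='{.data.ClusterConfiguration}' | get_control_plane_endpoint
```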

kubectl fails on the new master node

[root@k8s-0 ~]# kubectl get node
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Cause: kubectl has not been configured on the new master node.
Fix: configure kubectl access to the cluster for a regular user.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
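
The localhost:8080 message typically appears when kubectl finds no kubeconfig at all. As a small sketch of how to check which file would be used, the hypothetical helper below prints $KUBECONFIG if it is set and $HOME/.kube/config otherwise. It assumes KUBECONFIG holds a single path, not a colon-separated list.

```shell
# resolve_kubeconfig is a hypothetical helper (assumes a single path in KUBECONFIG).
# Prints the kubeconfig path kubectl would be expected to read.
resolve_kubeconfig() {
  if [ -n "${KUBECONFIG:-}" ]; then
    printf '%s\n' "$KUBECONFIG"
  else
    printf '%s\n' "$HOME/.kube/config"
  fi
}

# Possible use:
#   cfg=$(resolve_kubeconfig)
#   [ -r "$cfg" ] || echo "no readable kubeconfig at $cfg" >&2
```

When working as root, running export KUBECONFIG=/etc/kubernetes/admin.conf is an alternative to copying the file.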

After the new master joins the cluster, all pods are Pending

Inspecting the pods shows that the new master node is tainted

# All pods are in the Pending state
kubectl get pods --all-namespaces -o wide
NAMESPACE              NAME                                         READY   STATUS     RESTARTS      AGE    IP               NODE     NOMINATED NODE   READINESS GATES
ai-nav                 ai-nav-nginx-75b6df9c64-qlb5c                0/1     Pending    0             24m    <none>           <none>   <none>           <none>
apisix                 apisix-dashboard-7b6cdd75d6-mp7zk            0/1     Pending    0             24m    <none>           <none>   <none>           <none>
apisix                 apisix-etcd-0                                0/1     Pending    0             24m    <none>           <none>   <none>           <none>

# Describing a pod shows that no node is available for scheduling
kubectl -n ai-nav describe pods ai-nav-nginx-75b6df9c64-mtbwt
...
 0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
...

Fix: remove the taint

# Check the taints on the master node; it is indeed tainted
kubectl describe node k8s-0 | grep Taint

Taints:             node-role.kubernetes.io/control-plane:NoSchedule
# Remove the taint
kubectl taint nodes k8s-0  node-role.kubernetes.io/control-plane:NoSchedule-
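
The check and the removal can be chained so the taint is only removed when it is actually present. This is a sketch under the assumption that the taint appears in the kubectl describe node output exactly as shown above; has_control_plane_noschedule is a hypothetical helper name.

```shell
# has_control_plane_noschedule is a hypothetical helper.
# Reads `kubectl describe node <name>` output on stdin and returns 0 if the
# control-plane NoSchedule taint is listed, non-zero otherwise.
has_control_plane_noschedule() {
  grep -qF 'node-role.kubernetes.io/control-plane:NoSchedule'
}

# Possible use (k8s-0 is the node name from the example above):
#   if kubectl describe node k8s-0 | has_control_plane_noschedule; then
#     kubectl taint nodes k8s-0 node-role.kubernetes.io/control-plane:NoSchedule-
#   fi
```

Removing this taint allows ordinary workloads to be scheduled onto the control-plane node.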