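The IP of the k8s master node changed from 192.168.101.1 to 192.168.201.1. These are notes on an unsuccessful attempt to update the cluster in place. In the end I re-initialized the cluster, because the pod IPs still had not changed.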
1. Symptom: running kubectl get nodes fails with:
Unable to connect to the server: dial tcp 192.168.101.1:6443: i/o timeout
2. Edit the configuration files under /etc/kubernetes. Every file outside the pki directory that contains the IP must be changed:
/etc/kubernetes/admin.conf
/etc/kubernetes/controller-manager.conf
/etc/kubernetes/kubelet.conf
/etc/kubernetes/scheduler.conf
/etc/kubernetes/manifests/etcd.yaml
/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
Commands:
cd /etc/kubernetes/
sed -i 's/192\.168\.101\.1/192.168.201.1/g' ./*.conf
cd /etc/kubernetes/manifests/
sed -i 's/192\.168\.101\.1/192.168.201.1/g' ./*.yaml
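The edits in step 2 can be wrapped in a small function. This is a minimal sketch, not the exact commands used above: the name replace_ip_in_k8s_dir and the backup location are hypothetical, and GNU sed/grep are assumed (for \b and -w). It escapes the dots so they match literally, skips addresses such as 192.168.101.10, leaves pki/ alone, and copies originals to a backup directory outside manifests/, since the kubelet may pick up extra files there as static pod manifests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a kubeadm command): rewrite OLD_IP -> NEW_IP in the
# top-level *.conf files and manifests/*.yaml under DIR. pki/ is never touched.
# Originals are copied to BACKUP_DIR, which should not live inside manifests/.
# Assumes GNU sed and GNU grep.
replace_ip_in_k8s_dir() {
  local dir="$1" old_ip="$2" new_ip="$3" backup="${4:-$1.ip-backup}"
  local old_re f rel
  # Escape the dots so they match literally: 192\.168\.101\.1
  old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
  mkdir -p "$backup/manifests"
  for f in "$dir"/*.conf "$dir"/manifests/*.yaml; do
    [ -f "$f" ] || continue                # glob matched nothing
    grep -qw "$old_re" "$f" || continue    # file does not mention the old IP
    rel="${f#"$dir"/}"                     # e.g. manifests/etcd.yaml
    cp -p "$f" "$backup/$rel"
    # \b keeps 192.168.101.1 from matching inside 192.168.101.10
    sed -i "s/\\b${old_re}\\b/${new_ip}/g" "$f"
  done
}
```

Example call (backup path is an arbitrary choice): replace_ip_in_k8s_dir /etc/kubernetes 192.168.101.1 192.168.201.1 /root/k8s-ip-backup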
3. Edit the kubeconfig under $HOME/.kube:
sed -i 's/192\.168\.101\.1/192.168.201.1/g' $HOME/.kube/config
mv $HOME/.kube/cache/discovery/192.168.101.1_6443 $HOME/.kube/cache/discovery/192.168.201.1_6443
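Step 3 can be sketched as one function, assuming the default kubeconfig layout and port 6443; the name update_user_kubeconfig is hypothetical and GNU sed is assumed. Simply deleting the old discovery cache directory would also be acceptable, since kubectl rebuilds the cache on its next call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point a user's kubeconfig at the new API server IP and
# rename kubectl's discovery cache directory to match. Assumes GNU sed.
update_user_kubeconfig() {
  local kube_dir="$1" old_ip="$2" new_ip="$3" port="${4:-6443}"
  local old_re cache
  old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')   # literal dots
  if [ -f "$kube_dir/config" ]; then
    sed -i "s/\\b${old_re}\\b/${new_ip}/g" "$kube_dir/config"
  fi
  cache="$kube_dir/cache/discovery"
  if [ -d "$cache/${old_ip}_${port}" ]; then
    rm -rf "$cache/${new_ip}_${port}"   # otherwise mv would nest into it
    mv "$cache/${old_ip}_${port}" "$cache/${new_ip}_${port}"
  fi
}
```

Example call: update_user_kubeconfig "$HOME/.kube" 192.168.101.1 192.168.201.1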
4. Check whether any file still contains the old IP:
find . -type f | xargs grep -l 192.168.101.1 | sort -u
No files were found.
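The check in step 4 treats the dots as "any character" and also reports files that only contain a longer address such as 192.168.101.10. A stricter variant, sketched here under the assumption of GNU grep (for -I and -w); find_files_with_ip is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under ROOT that still mention IP as a whole
# "word" (so 192.168.101.10 does not count), skipping binary files.
find_files_with_ip() {
  local root="$1" ip="$2" re
  re=$(printf '%s' "$ip" | sed 's/\./\\./g')   # match dots literally
  # -r recurse, -l names only, -w whole-word match, -I ignore binary files
  grep -rlwI "$re" "$root" | sort -u
}
```

Example call: find_files_with_ip /etc/kubernetes 192.168.101.1 (empty output means nothing is left).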
5. Reboot the machine:
reboot
kubectl get nodes
This fails with: Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 192.168.101.1, not 192.168.201.1
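6. Generate new certificates. Because the whole pki directory is moved aside, kubeadm generates a new CA as well, not just a new apiserver certificate:
mv /etc/kubernetes/pki /etc/kubernetes/pki.old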
kubeadm init phase certs all
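Before regenerating anything, it helps to confirm which IPs the apiserver certificate actually covers. A minimal sketch, assuming openssl is installed; cert_has_ip_san is a hypothetical helper. As an alternative to moving the whole pki directory aside (an assumption, not what was done in these notes): moving only apiserver.crt and apiserver.key out of the way and running kubeadm init phase certs apiserver --apiserver-advertise-address 192.168.201.1 should reissue just the serving certificate from the existing CA, so existing kubeconfig files and worker nodes would keep trusting it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if CERT lists IP as an "IP Address" entry in
# its Subject Alternative Name extension. Assumes the openssl CLI.
cert_has_ip_san() {
  local cert="$1" ip="$2" re
  re=$(printf '%s' "$ip" | sed 's/\./\\./g')   # literal dots
  # The SAN values are printed on the line after the extension header.
  openssl x509 -in "$cert" -noout -text \
    | grep -A1 'Subject Alternative Name' \
    | grep -qE "IP Address:${re}(,|[[:space:]]*\$)"
}
```

Example call: cert_has_ip_san /etc/kubernetes/pki/apiserver.crt 192.168.201.1 && echo "SAN present"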
7. Check certificate expiry and reboot:
kubeadm alpha certs check-expiration
reboot
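The kubeadm alpha certs form matches the v1.19 cluster used here. To my knowledge the command moved to kubeadm certs in v1.20, and the alpha form was later removed. A small sketch that picks the form from a version string; certs_expiration_cmd is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the certificate-expiry command that matches a
# kubeadm version such as "v1.19.3" (alpha form) or "1.20.0" (GA form).
certs_expiration_cmd() {
  local version="${1#v}" major rest minor
  major="${version%%.*}"   # "1"
  rest="${version#*.}"     # "19.3"
  minor="${rest%%.*}"      # "19"
  if [ "$major" -gt 1 ] || [ "$minor" -ge 20 ]; then
    echo "kubeadm certs check-expiration"
  else
    echo "kubeadm alpha certs check-expiration"
  fi
}
```

Example call: $(certs_expiration_cmd "$(kubeadm version -o short)")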
8. Run kubectl get nodes again.
It still fails: Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
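The client kubeconfig files must also be regenerated from the new certificates, because they still embed the old CA and client certificates signed by it. Move the old ones aside:
mv /etc/kubernetes/*.conf /tmp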
kubeadm init phase kubeconfig all
Replace the old user kubeconfig:
cp -f /etc/kubernetes/admin.conf ~/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
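The "certificate signed by unknown authority" error in step 8 means the CA embedded in the kubeconfig is not the CA that signed the current apiserver certificate. A minimal sketch for checking that directly, assuming openssl and GNU base64 are available and the kubeconfig embeds the CA as certificate-authority-data (as kubeadm-generated files do); kubeconfig_ca_matches is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the first certificate-authority-data entry
# in KUBECONFIG is the same certificate as CA_FILE (compared by fingerprint).
kubeconfig_ca_matches() {
  local kubeconfig="$1" ca_file="$2" data a b
  data=$(awk '$1 == "certificate-authority-data:" { print $2; exit }' "$kubeconfig")
  [ -n "$data" ] || return 1
  a=$(printf '%s' "$data" | base64 -d | openssl x509 -noout -fingerprint -sha256) || return 1
  b=$(openssl x509 -in "$ca_file" -noout -fingerprint -sha256) || return 1
  [ "$a" = "$b" ]
}
```

Example call: kubeconfig_ca_matches ~/.kube/config /etc/kubernetes/pki/ca.crt || echo "kubeconfig trusts a different CA"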
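PS: editing the IP in the .conf files under /etc/kubernetes (step 2) was actually unnecessary, since those files are regenerated here anyway.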
9. Check certificate expiry again and reboot:
kubeadm alpha certs check-expiration
reboot
10. Run kubectl get nodes again:
kubectl get nodes
NAME         STATUS     ROLES    AGE    VERSION
k8s-master   NotReady   master   113d   v1.19.3
k8s-node1    NotReady   <none>   110d   v1.19.3
k8s-node2    NotReady   <none>   110d   v1.19.3
All nodes are NotReady.
11. Look for the cause:
kubectl get pods -n kube-system -o wide
The output shows the pods still have the old IP.
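To list only the pods that still report the old address, a narrower query than -o wide helps. kubeadm's control-plane static pods run with hostNetwork: true, so their pod IP is the node address the kubelet reports rather than an address from the pod network. The sketch below assumes the output of kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name,IP:.status.podIP --no-headers is piped in; pods_with_ip is a hypothetical filter.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "NAME IP" lines on stdin and print the names of
# pods whose IP is exactly $1 (string comparison, so .10 does not match .1).
pods_with_ip() {
  awk -v ip="$1" '$2 == ip { print $1 }'
}
```

Example call: kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name,IP:.status.podIP --no-headers | pods_with_ip 192.168.101.1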
Could it be that a pod's IP never changes once the pod has been created? Check the pod details:
kubectl describe pod kube-apiserver-k8s-master -n kube-system
Sure enough, the pod was created at Sat, 17 Oct 2020 10:59:50.
12. Force-delete all pods
1) Delete:
kubectl delete pods --all -n kube-system --force --grace-period=0
2) Check:
kubectl get pods -n kube-system -o wide
They are now in Pending state.
3) Inspect the details:
kubectl describe pod coredns-66bff467f8-6gzxt -n kube-system
The events show: 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn’t tolerate.
It turns out the core control-plane pods were missing: etcd, kube-apiserver, kube-controller-manager and kube-scheduler.
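13. The only option left was to reset the node:
kubeadm reset
systemctl daemon-reload
Then re-initialize the master and the worker nodes, and have the worker nodes rejoin the cluster...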
14. Inspect the kubeadm configuration:
kubectl -n kube-system get cm kubeadm-config -oyaml
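As far as I know, in kubeadm releases of that era (such as v1.19) the kubeadm-config ConfigMap also carries a ClusterStatus section listing each control-plane node's advertiseAddress, so it is one more place where the old IP can survive. A minimal filter for spotting it, assuming the YAML from the command above is piped in; kubeadm_config_addresses is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print the address-related keys from the kubeadm-config
# ConfigMap YAML on stdin, with indentation squeezed out.
kubeadm_config_addresses() {
  grep -E '(advertiseAddress|controlPlaneEndpoint):' | awk '{ $1 = $1; print }'
}
```

Example call: kubectl -n kube-system get cm kubeadm-config -oyaml | kubeadm_config_addresses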
15. Summary
My guess is that the IP is tied to DNS resolution somehow, and I don't know how to change that.