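Today I suddenly found that kubectl had stopped working against my Kubernetes cluster, even though all the pods were still running normally. The error was: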
# kubectl get pods
E1121 14:58:00.891411 53566 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E1121 14:58:00.892169 53566 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E1121 14:58:00.894181 53566 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E1121 14:58:00.895377 53566 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E1121 14:58:00.896437 53566 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?

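After some checking, the kubeconfig file looked like the culprit.
First, check current-context to confirm that kubectl is using the expected admin context: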
# kubectl config current-context
admin@kubernetes
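kubectl falls back to http://localhost:8080 when it has not loaded a usable cluster entry from any kubeconfig, so it can help to see which files it would read. The following is a minimal sketch, assuming the default lookup order (the `KUBECONFIG` variable, then `~/.kube/config`); `check_kubeconfig` is a hypothetical helper name, not a kubectl command.

```shell
#!/bin/sh
# check_kubeconfig (hypothetical helper): list the kubeconfig files kubectl
# would consult and flag the ones that are missing or empty.
check_kubeconfig() (
    # KUBECONFIG may hold several colon-separated paths; otherwise use the default.
    paths="${KUBECONFIG:-$HOME/.kube/config}"
    IFS=':'
    set -f   # no glob expansion while splitting the path list
    for f in $paths; do
        [ -n "$f" ] || continue
        if [ ! -e "$f" ]; then
            echo "missing: $f"
        elif [ ! -s "$f" ]; then
            echo "empty: $f"
        else
            echo "ok: $f"
        fi
    done
)

check_kubeconfig
```

A "missing" or "empty" result for every listed file would be consistent with the localhost:8080 error above. An "ok" result only means the file has content, not that the content is valid.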
Back up the configuration directory:
# cp -ra ~/.kube/ ~/.kube_backup
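If ~/.kube_backup already exists from an earlier attempt, GNU cp places the new copy inside it instead of refreshing it. A timestamped destination avoids that. This is only a sketch; `backup_dir` is a hypothetical helper name and the `_backup_<timestamp>` suffix is my own convention.

```shell
#!/bin/sh
# backup_dir (hypothetical helper): copy a directory to a timestamped sibling
# so repeated runs never overwrite or nest inside an earlier backup.
backup_dir() {
    src="${1%/}"                                   # drop one trailing slash
    dest="${src}_backup_$(date +%Y%m%d-%H%M%S)"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ ! -e "$dest" ] || { echo "backup already exists: $dest" >&2; return 1; }
    # -a preserves permissions, ownership and timestamps of the copied files
    cp -a "$src" "$dest" && echo "$dest"
}

# Example: backup_dir ~/.kube
```

The function prints the path of the backup it created, which makes it easy to note down for a later restore.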
Generate a new configuration file:
# kubeadm kubeconfig user --org=system:masters --client-name admin > ~/.kube/config
I1121 15:00:08.203775 55245 version.go:256] remote version is much newer: v1.31.3; falling back to: stable-1.30
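The command printed an informational message while generating the file. It means the following:
the latest stable release that kubeadm looked up remotely (v1.31.3) is much newer than the local kubeadm binary, so kubeadm fell back to the stable-1.30 line that matches its own version.
As for the failure itself, I had recently upgraded to 1.31 several times, but after each upgrade cilium could not create its configuration file and the cluster would not come up, so I rolled back several times. Those repeated upgrades and rollbacks may be what caused today's problem.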
After that, running kubectl get pods worked normally again.
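One caveat about the generation step: the `>` redirect truncates ~/.kube/config before kubeadm even starts, so a failed run leaves an empty file behind. A more defensive variant writes to a temporary file first and swaps it in only on success. This is a sketch under that assumption; `safe_write` is a hypothetical helper name.

```shell
#!/bin/sh
# safe_write (hypothetical helper): run a command, capture its stdout in a
# temporary file next to the target, and replace the target only if the
# command succeeded and actually produced output.
safe_write() {
    target="$1"
    shift
    tmpfile="$(mktemp "${target}.XXXXXX")" || return 1
    if "$@" > "$tmpfile" && [ -s "$tmpfile" ]; then
        chmod 600 "$tmpfile"        # kubeconfig files contain client credentials
        mv "$tmpfile" "$target"
    else
        rm -f "$tmpfile"
        echo "generation failed; $target left untouched" >&2
        return 1
    fi
}

# Example, mirroring the kubeadm command used above:
# safe_write ~/.kube/config kubeadm kubeconfig user --org=system:masters --client-name admin
```

Informational messages such as the version fallback notice go to stderr, so they still appear on the terminal and do not end up in the generated file.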