Fixing the Kubernetes cluster initialization error "timed out waiting for the condition"

Cluster environment:

Ubuntu: 20.04

Kubernetes: 1.25.4

containerd: 1.6.9

While deploying a Kubernetes cluster recently, initialization kept failing for no apparent reason, and manually generating a join token failed as well:

# kubeadm token create --print-join-command

timed out waiting for the condition

To see the stack trace of this error execute with --v=5 or higher

Troubleshooting started from the error message:

# systemctl status kubelet

Nov 11 11:37:38 rook-master1 kubelet[50097]: E1111 11:37:38.743508 50097 kubelet.go:2448] "Error getting node" err="node \"rook-master1\" not found"

Nov 11 11:37:38 rook-master1 kubelet[50097]: E1111 11:37:38.844150 50097 kubelet.go:2448] "Error getting node" err="node \"rook-master1\" not found"

# journalctl -xeu kubelet

Nov 11 12:13:40 rook-master1 kubelet[50908]: E1111 12:13:40.500017 50908 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.6\": f>

Nov 11 12:13:40 rook-master1 kubelet[50908]: E1111 12:13:40.500084 50908 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.6\": f>

Nov 11 12:13:40 rook-master1 kubelet[50908]: E1111 12:13:40.500226 50908 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-rook-master1_kube-system(24d163f695e7dac3fdc912ceb74bfc34)\" wi>

One error in this output stands out:

Unknown desc = failed to get sandbox image "registry.k8s.io/pause:3.6"

Searching syslog for that image address gives more detail:

# grep 'registry.k8s.io/pause:3.6' /var/log/syslog

Nov 11 00:00:18 rook-master1 containerd[14961]: time="2022-11-11T00:00:18.677976277+08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-rook-master1,Uid:24d163f695e7dac3fdc912ceb74bfc34,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"registry.k8s.io/pause:3.6\": failed to pull image \"registry.k8s.io/pause:3.6\": failed to pull and unpack image \"registry.k8s.io/pause:3.6\": failed to resolve reference \"registry.k8s.io/pause:3.6\": failed to do request: Head \"https://asia-northeast1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.6\": dial tcp 74.125.204.82:443: i/o timeout"

Nov 11 00:00:18 rook-master1 kubelet[16170]: E1111 00:00:18.678447 16170 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.6\": failed to pull image \"registry.k8s.io/pause:3.6\": failed to pull and unpack image \"registry.k8s.io/pause:3.6\": failed to resolve reference \"registry.k8s.io/pause:3.6\": failed to do request: Head \"https://asia-northeast1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.6\": dial tcp 74.125.204.82:443: i/o timeout"

The immediate cause is now clear: containerd cannot pull the registry.k8s.io/pause:3.6 image, because the request to the registry times out.

The next question was why a registry.k8s.io image showed up at all, when every image was supposed to come from the Alibaba Cloud mirror.

After two long days of digging, the root cause finally turned up:

The containerd 1.6.9 release notes include this entry:

Migrate from k8s.gcr.io to registry.k8s.io
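To confirm this on a node, it can help to look at which sandbox (pause) image containerd is configured to use. The following is a minimal sketch, not something from the original investigation: the function name show_sandbox_image is hypothetical, and it assumes the configuration lives at /etc/containerd/config.toml and sets sandbox_image explicitly.

```shell
#!/bin/sh
# Print the sandbox (pause) image set in a containerd config file.
# Usage: show_sandbox_image [path]
# Assumption: the path defaults to /etc/containerd/config.toml.
show_sandbox_image() {
    config="${1:-/etc/containerd/config.toml}"
    # Match lines such as:  sandbox_image = "registry.k8s.io/pause:3.6"
    # and print only the quoted value.
    sed -n 's/^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$config"
}
```

If the file does not exist or does not set sandbox_image, the function prints nothing and containerd falls back to its built-in default. In that case `containerd config dump | grep sandbox_image` should show the effective value.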

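In other words, containerd's default sandbox (pause) image now points at registry.k8s.io. containerd pulls that image itself, so the Alibaba Cloud image repository configured for kubeadm does not apply to it. With that, the problem was tracked down and resolved.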
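The post does not show the final fix. One common approach is to point containerd's sandbox_image at a registry the node can reach. The sketch below assumes the config file is /etc/containerd/config.toml and already contains a sandbox_image line. It also assumes the Alibaba Cloud mirror path registry.aliyuncs.com/google_containers/pause:3.6, which is not confirmed by the original text. The function name set_sandbox_image is hypothetical.

```shell
#!/bin/sh
# Rewrite the sandbox_image setting in a containerd config file.
# Usage: set_sandbox_image <config-path> <image>
# A backup of the original file is kept at <config-path>.bak.
set_sandbox_image() {
    config="$1"
    image="$2"
    cp "$config" "$config.bak"
    # Keep the indentation and the "sandbox_image = " prefix,
    # and replace only the value with the new quoted image reference.
    sed "s|^\([[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*\).*|\1\"$image\"|" \
        "$config.bak" > "$config"
}

# Example invocation (path and mirror address are assumptions):
#   set_sandbox_image /etc/containerd/config.toml \
#       registry.aliyuncs.com/google_containers/pause:3.6
#   systemctl restart containerd
```

If the config file is missing, `containerd config default > /etc/containerd/config.toml` generates one to edit. After restarting containerd, the cluster initialization can be retried.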
