Troubleshooting a failed PV creation with rook-ceph on Kubernetes

1. Symptom: the pods of a newly created StatefulSet failed to schedule because their PVs could not be provisioned.

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  running "VolumeBinding" filter plugin for pod "mysql-0": pod has unbound immediate PersistentVolumeClaims
  Warning  FailedScheduling  <unknown>  default-scheduler  running "VolumeBinding" filter plugin for pod "mysql-0": pod has unbound immediate PersistentVolumeClaims

The PVCs stayed unbound: no PV was being provisioned or attached for them.
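Before looking at the storage backend, the PVC itself can show where binding is stuck. A minimal sketch of that check, assuming the namespace ztw used later in this post and a hypothetical PVC name data-mysql-0 derived from the StatefulSet's volumeClaimTemplates:

kubectl get pvc -n ztw                     # the PVC should show up as Pending
kubectl describe pvc data-mysql-0 -n ztw   # events tell whether the external provisioner ever responded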

Troubleshooting process:

1. First, check the rook-ceph cluster status

[root@master1 images]# kubectl exec -it -n rook-ceph rook-ceph-tools-6b4889fdfd-86dp5 /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
[root@rook-ceph-tools-6b4889fdfd-86dp5 /]# ceph -s
  cluster:
    id:     bb5107d5-d3f7-45df-9146-1148efa378b5
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum b,c,d (age 67m)
    mgr: a(active, since 7m)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 10 osds: 10 up (since 7h), 10 in (since 8d)

  task status:
    scrub status:
        mds.myfs-a: idle
        mds.myfs-b: idle

  data:
    pools:   4 pools, 97 pgs
    objects: 1.20k objects, 3.2 GiB
    usage:   19 GiB used, 1.9 TiB / 2.0 TiB avail
    pgs:     97 active+clean

  io:
    client:   852 B/s rd, 1 op/s rd, 0 op/s wr

The check showed that the Ceph cluster itself was healthy;
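Since ceph -s alone can mask per-OSD or capacity problems, a few extra checks inside the toolbox pod can rule them out. These commands were not part of the original session and are shown only as a hedged sketch:

ceph health detail   # surfaces any warnings hiding behind HEALTH_OK
ceph osd status      # per-OSD up/in state and utilization
ceph df              # pool capacity, to rule out a full cluster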

2. Check the kube-system components

[root@master1 images]# kubectl get po -n kube-system
NAME                                         READY   STATUS    RESTARTS   AGE
calico-kube-controllers-578894d4cd-8wlg4     1/1     Running   0          8d
calico-node-5rnjk                            1/1     Running   0          8d
calico-node-7rvj2                            1/1     Running   0          8d
calico-node-p7hpq                            1/1     Running   0          8d
calico-node-vgrlg                            1/1     Running   0          8d
calico-node-zd2mn                            1/1     Running   0          8d
coredns-66bff467f8-fj7td                     1/1     Running   0          5d3h
coredns-66bff467f8-rmnzk                     1/1     Running   0          8d
dashboard-metrics-scraper-6b4884c9d5-8gtnl   1/1     Running   0          8d
etcd-master1                                 1/1     Running   0          20m
etcd-master2                                 1/1     Running   0          20m
etcd-master3                                 1/1     Running   0          20m
kube-apiserver-master1                       1/1     Running   0          8d
kube-apiserver-master2                       1/1     Running   0          8d
kube-apiserver-master3                       1/1     Running   0          8d
kube-controller-manager-master1              1/1     Running   63         8d
kube-controller-manager-master2              1/1     Running   64         8d
kube-controller-manager-master3              1/1     Running   64         8d
kube-proxy-6n7lz                             1/1     Running   0          8d
kube-proxy-7nstv                             1/1     Running   0          8d
kube-proxy-kxzhp                             1/1     Running   0          8d
kube-proxy-tw9j4                             1/1     Running   0          8d
kube-proxy-w4s47                             1/1     Running   0          8d
kube-scheduler-master1                       1/1     Running   63         32m
kube-scheduler-master2                       1/1     Running   64         22m
kube-scheduler-master3                       1/1     Running   74         22m
kubernetes-dashboard-6f77f7cfdb-kb6fx        1/1     Running   4          8d
metrics-server-584b5f4754-z58xl              1/1     Running   0          8d
traefik-5875c779f4-4z62m                     1/1     Running   0          4d21h

The check showed that the kube-scheduler-master* and kube-controller-manager-master* pods had restarted many times;
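To see why they were restarting, the scheduler log can be pulled; a sketch using the pod name from the listing above (the --previous flag shows the log of the last terminated container):

kubectl logs -n kube-system kube-scheduler-master1 --previous | grep -i leaderelection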

E0406 05:13:21.278261       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E0406 05:17:05.409616       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E0406 05:35:03.180503       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E0406 06:07:26.579433       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E0406 07:14:02.476881       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E0406 07:48:13.975004       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E0406 08:27:33.280699       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E0406 09:01:27.363775       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E0406 09:40:58.889225       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E0406 10:13:18.380376       1 leaderelection.go:320] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out

The scheduler logs showed that etcd was frequently re-electing a leader;

So the etcd cluster was checked next and turned out to be normal; the preliminary judgment was that the frequent leader re-elections were probably caused by network jitter between the etcd members:

[root@master1 images]#  ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt  --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key  --write-out=table --endpoints=172.10.25.184:2379,172.10.25.185:2379,172.10.25.186:2379 endpoint status
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|      ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 172.10.25.184:2379 | 3b85f750baf8d841 |   3.4.3 |   59 MB |     false |      false |      1886 |    3128844 |            3128844 |        |
| 172.10.25.185:2379 |  5f95ee4c3d9d164 |   3.4.3 |   59 MB |     false |      false |      1886 |    3128845 |            3128845 |        |
| 172.10.25.186:2379 | be2885dc23c5f563 |   3.4.3 |   59 MB |      true |      false |      1886 |    3128846 |            3128846 |        |
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
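In addition to endpoint status, the same certificates and endpoints can be reused to check endpoint health; a sketch of that check:

ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key \
  --endpoints=172.10.25.184:2379,172.10.25.185:2379,172.10.25.186:2379 endpoint health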

Next, check the kubelet status:

[root@master1 mysql]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2021-04-06 17:57:51 CST; 2min 28s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 13723 (kubelet)
    Tasks: 30
   Memory: 88.4M
   CGroup: /system.slice/kubelet.service
           └─13723 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infr...

Apr 06 18:00:12 master1 kubelet[13723]: I0406 18:00:12.930233   13723 operation_generator.go:181] scheme "" not registered, fallback to default scheme
Apr 06 18:00:12 master1 kubelet[13723]: I0406 18:00:12.930252   13723 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins_registry/rook-ceph.rbd.csi.ceph.com-reg.sock  <nil> 0 <nil>}] <nil> <nil>}
Apr 06 18:00:12 master1 kubelet[13723]: I0406 18:00:12.930263   13723 clientconn.go:933] ClientConn switching balancer to "pick_first"
Apr 06 18:00:12 master1 kubelet[13723]: W0406 18:00:12.930363   13723 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/rook-ceph.rbd.csi.ceph.com-reg.sock  <nil> 0 <nil>}. Err...
Apr 06 18:00:13 master1 kubelet[13723]: W0406 18:00:13.930508   13723 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/rook-ceph.cephfs.csi.ceph.com-reg.sock  <nil> 0 <nil>}. ...
Apr 06 18:00:13 master1 kubelet[13723]: W0406 18:00:13.930641   13723 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/rook-ceph.rbd.csi.ceph.com-reg.sock  <nil> 0 <nil>}. Err...
Apr 06 18:00:15 master1 kubelet[13723]: W0406 18:00:15.319378   13723 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/rook-ceph.cephfs.csi.ceph.com-reg.sock  <nil> 0 <nil>}. ...
Apr 06 18:00:15 master1 kubelet[13723]: W0406 18:00:15.473849   13723 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/rook-ceph.rbd.csi.ceph.com-reg.sock  <nil> 0 <nil>}. Err...
Apr 06 18:00:17 master1 kubelet[13723]: W0406 18:00:17.583410   13723 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/rook-ceph.cephfs.csi.ceph.com-reg.sock  <nil> 0 <nil>}. ...
Apr 06 18:00:18 master1 kubelet[13723]: W0406 18:00:18.305165   13723 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/rook-ceph.rbd.csi.ceph.com-reg.sock  <nil> 0 <nil>}. Err...
Hint: Some lines were ellipsized, use -l to show in full.

At this point the cause surfaced: the kubelet could not connect to the CSI drivers (rook-ceph.rbd.csi.ceph.com and rook-ceph.cephfs.csi.ceph.com), which is what was blocking volume provisioning.
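A quick way to confirm what the kubelet is complaining about is to look for the CSI registration sockets it is trying to dial; a sketch to run on the affected node:

ls -l /var/lib/kubelet/plugins_registry/
# rook-ceph.rbd.csi.ceph.com-reg.sock and rook-ceph.cephfs.csi.ceph.com-reg.sock are created by the
# csi-rbdplugin / csi-cephfsplugin DaemonSet pods; if those pods are gone, the sockets are missing
# or stale and the kubelet cannot reach the driver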

So the rook-ceph pods were inspected, and the CSI plugin pods (csi-rbdplugin / csi-cephfsplugin) turned out to be missing, which is why volumes could not be provisioned:

[root@master1 mysql]# kubectl get po -n rook-ceph
NAME                                                READY   STATUS      RESTARTS   AGE
rook-ceph-crashcollector-master1-9ff9c4b7f-92zqn    1/1     Running     0          8d
rook-ceph-crashcollector-master2-6fd8fd857d-m4ngp   1/1     Running     0          8d
rook-ceph-crashcollector-master3-78869fc5b5-9rsrs   1/1     Running     0          5d
rook-ceph-crashcollector-node1-765b49998c-25wjd     1/1     Running     0          4d20h
rook-ceph-crashcollector-node2-5c9bf65fcd-rpv26     1/1     Running     0          7h31m
rook-ceph-mds-myfs-a-765f596697-cp7zs               1/1     Running     43         4d20h
rook-ceph-mds-myfs-b-5488556c97-kdr9m               1/1     Running     0          8d
rook-ceph-mgr-a-77b889cb6d-rqktg                    1/1     Running     0          8d
rook-ceph-mon-b-5d747c4957-mgv2t                    1/1     Running     0          8d
rook-ceph-mon-c-55c86765c7-7clf6                    1/1     Running     0          5d
rook-ceph-mon-d-85d9bcd45b-lkjvs                    1/1     Running     0          7d20h
rook-ceph-operator-6f9fc8c7dd-bk68g                 1/1     Running     0          3d16h
rook-ceph-osd-0-65d88658cb-gkthq                    1/1     Running     0          5d
rook-ceph-osd-1-7dc95f7cd7-46ppp                    1/1     Running     0          8d
rook-ceph-osd-2-5894c6b9c8-88q98                    1/1     Running     0          4d20h
rook-ceph-osd-3-8565f8b8cc-l4llh                    1/1     Running     0          8d
rook-ceph-osd-4-6cf8449f54-qh2lq                    1/1     Running     0          5d
rook-ceph-osd-5-c7b84d7b7-qqv5p                     1/1     Running     0          5d
rook-ceph-osd-6-5485fdcfc5-hwdl2                    1/1     Running     0          8d
rook-ceph-osd-7-b54c78b68-m88kh                     1/1     Running     0          4d20h
rook-ceph-osd-8-74b4575bd8-cx2fb                    1/1     Running     0          8d
rook-ceph-osd-9-6b8d6bb87f-7tgcj                    1/1     Running     0          5d
rook-ceph-osd-prepare-master1-cbgzb                 0/1     Completed   0          3d16h
rook-ceph-osd-prepare-master2-lmglb                 0/1     Completed   0          3d16h
rook-ceph-osd-prepare-master3-j9fx5                 0/1     Completed   0          3d16h
rook-ceph-osd-prepare-node1-cdmcc                   0/1     Completed   0          3d16h
rook-ceph-tools-6b4889fdfd-86dp5                    1/1     Running     0          4d20h
rook-discover-5grcs                                 1/1     Running     0          3d16h
rook-discover-7ltj8                                 1/1     Running     0          3d16h
rook-discover-bnrrw                                 1/1     Running     0          3d16h
rook-discover-m8lbb                                 1/1     Running     0          7h31m
rook-discover-zkdb5                                 1/1     Running     0          3d16h

Since the rook-ceph-operator is responsible for creating and reconciling all the other rook-ceph pods (including the CSI plugins), the operator pod was restarted:

[root@master1 mysql]# kubectl delete po -n rook-ceph rook-ceph-operator-6f9fc8c7dd-bk68g
pod "rook-ceph-operator-6f9fc8c7dd-bk68g" deleted

Check rook-ceph again after the restart:

[root@master1 mysql]# kubectl get po -n rook-ceph
NAME                                                READY   STATUS              RESTARTS   AGE
csi-cephfsplugin-6hntb                              0/3     ContainerCreating   0          1s
csi-cephfsplugin-6zdm8                              0/3     ContainerCreating   0          1s
csi-cephfsplugin-84dhz                              0/3     Pending             0          1s
csi-cephfsplugin-twkbn                              0/3     ContainerCreating   0          1s
csi-cephfsplugin-xg4mg                              0/3     ContainerCreating   0          1s
csi-rbdplugin-48zk8                                 0/3     ContainerCreating   0          1s
csi-rbdplugin-4tn8s                                 0/3     ContainerCreating   0          1s
csi-rbdplugin-6vrwq                                 0/3     ContainerCreating   0          1s
csi-rbdplugin-provisioner-b4d4bc45d-s2sfx           0/6     ContainerCreating   0          1s
csi-rbdplugin-provisioner-b4d4bc45d-shz27           0/6     ContainerCreating   0          1s
csi-rbdplugin-s4jlv                                 0/3     ContainerCreating   0          1s
csi-rbdplugin-sdvjt                                 0/3     ContainerCreating   0          2s
rook-ceph-crashcollector-master1-9ff9c4b7f-92zqn    1/1     Running             0          8d
rook-ceph-crashcollector-master2-6fd8fd857d-m4ngp   1/1     Running             0          8d
rook-ceph-crashcollector-master3-78869fc5b5-9rsrs   1/1     Running             0          5d
rook-ceph-crashcollector-node1-765b49998c-25wjd     1/1     Running             0          4d20h
rook-ceph-crashcollector-node2-5c9bf65fcd-rpv26     1/1     Running             0          7h32m
rook-ceph-detect-version-wkcpw                      0/1     Terminating         0          6s
rook-ceph-mds-myfs-a-765f596697-cp7zs               1/1     Running             43         4d20h
rook-ceph-mds-myfs-b-5488556c97-kdr9m               1/1     Running             0          8d
rook-ceph-mgr-a-77b889cb6d-rqktg                    1/1     Running             0          8d
rook-ceph-mon-b-5d747c4957-mgv2t                    1/1     Running             0          8d
rook-ceph-mon-c-55c86765c7-7clf6                    1/1     Running             0          5d
rook-ceph-mon-d-85d9bcd45b-lkjvs                    1/1     Running             0          7d20h
rook-ceph-operator-6f9fc8c7dd-ktkw5                 1/1     Running             0          12s
rook-ceph-osd-0-65d88658cb-gkthq                    1/1     Running             0          5d
rook-ceph-osd-1-7dc95f7cd7-46ppp                    1/1     Running             0          8d
rook-ceph-osd-2-5894c6b9c8-88q98                    1/1     Running             0          4d20h
rook-ceph-osd-3-8565f8b8cc-l4llh                    1/1     Running             0          8d
rook-ceph-osd-4-6cf8449f54-qh2lq                    1/1     Running             0          5d
rook-ceph-osd-5-c7b84d7b7-qqv5p                     1/1     Running             0          5d
rook-ceph-osd-6-5485fdcfc5-hwdl2                    1/1     Running             0          8d
rook-ceph-osd-7-b54c78b68-m88kh                     1/1     Running             0          4d20h
rook-ceph-osd-8-74b4575bd8-cx2fb                    1/1     Running             0          8d
rook-ceph-osd-9-6b8d6bb87f-7tgcj                    1/1     Running             0          5d
rook-ceph-osd-prepare-master1-cbgzb                 0/1     Completed           0          3d16h
rook-ceph-osd-prepare-master2-lmglb                 0/1     Completed           0          3d16h
rook-ceph-osd-prepare-master3-j9fx5                 0/1     Completed           0          3d16h
rook-ceph-osd-prepare-node1-cdmcc                   0/1     Completed           0          3d16h
rook-ceph-tools-6b4889fdfd-86dp5                    1/1     Running             0          4d20h
rook-discover-5grcs                                 1/1     Running             0          3d16h
rook-discover-7ltj8                                 1/1     Running             0          3d16h
rook-discover-bnrrw                                 1/1     Running             0          3d16h
rook-discover-m8lbb                                 1/1     Running             0          7h32m
rook-discover-zkdb5                                 1/1     Running             0          3d16h

After waiting a short while, the re-created CSI plugin pods (and the rest of the rook-ceph pods) all reached Running;

Then check the originally created StatefulSet pods again:

[root@master1 mysql]# kubectl get po -n ztw
NAME      READY   STATUS    RESTARTS   AGE
mysql-0   2/2     Running   0          54m
mysql-1   2/2     Running   0          27m
mysql-2   2/2     Running   0          27m

The pods now started normally.
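As a final check, the claims and volumes can be confirmed as Bound; a sketch, with PVC names depending on the StatefulSet's volumeClaimTemplates and therefore only assumed here:

kubectl get pvc -n ztw   # every PVC should now be Bound
kubectl get pv           # matching PVs provisioned by rook-ceph.rbd.csi.ceph.com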
