Troubleshooting the error "MountVolume.SetUp failed for volume pvc"
Contents
- Fault description
- Troubleshooting
- 1. Try restarting the Pod
- 2. Check the Pod events
- 3. Check the kubelet logs
- 4. Check the PVC and PV objects
- 5. Check the disk mount
- Solution
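Fault description
An internal environment raised a Pod anomaly alert:
[Alerting] Pod status alert
Pods in the cluster have been in an abnormal state for more than 1 minute
1. ti-inf/etcd-1 (Pending): 1.000
Details: http://xx.xx.xx.xx/grafana/d/default/alert-dashboard?tab=alert&viewPanel=19&orgId=1
Checking the abnormal Pods in the k8s cluster showed that they belong to the data components.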
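Troubleshooting
1. Try restarting the Pod
~]# kubectl delete pod etcd-1 -nti-inf
The recreated Pod was still in an abnormal state.
2. Check the Pod events
Describe one of the affected Pods. redis-server-2 is used here; it fails in the same way as etcd-1.
~]# kubectl describe pod redis-server-2 -nti-inf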
Events:
  Type     Reason       Age                     From     Message
  ----     ------       ----                    ----     -------
  Normal   Scheduled    28m                     volcano  Successfully assigned ti-inf/redis-server-2 to x.x.x.x
  Warning  FailedMount  3m17s (x3599 over 28m)  kubelet  MountVolume.SetUp failed for volume "pvc-9d1c0e76-6d56-439d-8070-741d8846d569" : rpc error: code = Internal desc = stat /csi-data-dir/ti-database/pv: input/output error
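The events show that kubelet fails at the MountVolume.SetUp step. The underlying error is an "input/output error" returned when the CSI driver runs stat on /csi-data-dir/ti-database/pv, the directory backing the PVC.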
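When several Pods are failing, it can help to reduce the event text to the distinct underlying errors. The following is a minimal sketch, not part of the original procedure: the function name `mount_error_causes` is hypothetical, and it only does text processing on whatever event text is piped into it.

```shell
# Hypothetical helper: read event text on stdin and print each distinct
# CSI error description that appears in FailedMount events.
mount_error_causes() {
  grep 'FailedMount' | sed -n 's/.*desc = //p' | sort -u
}

# Example usage against a live cluster (not run here):
#   kubectl describe pod redis-server-2 -nti-inf | mount_error_causes
```

If every failing Pod reports the same description, the problem most likely sits in the shared storage path rather than in any single Pod.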
3. Check the kubelet logs
[root@VM-2-29-centos prometheus-db]# grep -i error /var/log/messages| tail -n 5
Jun 28 20:14:13 VM-2-29-centos kubelet: E0628 20:14:13.819828 793997 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-668750fa-cc0a-4105-96f3-7fa184db4ada podName: nodeName:}" failed. No retries permitted until 2022-06-28 20:14:14.319804053 +0800 CST m=+11760883.388055363 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"pvc-668750fa-cc0a-4105-96f3-7fa184db4ada\" (UniqueName: \"kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-668750fa-cc0a-4105-96f3-7fa184db4ada\") pod \"etcd-1\" (UID: \"1c99773c-3845-4141-ac30-1c3d26f1f30a\") : rpc error: code = Internal desc = stat /csi-data-dir/ti-database/pv: input/output error"
Jun 28 20:14:13 VM-2-29-centos kubelet: E0628 20:14:13.901519 793997 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-668750fa-cc0a-4105-96f3-7fa184db4ada podName:4c5d9bdf-498a-4456-9c6c-e6f7b456e693 nodeName:}" failed. No retries permitted until 2022-06-28 20:14:14.401482582 +0800 CST m=+11760883.469733942 (durationBeforeRetry 500ms). Error: "UnmountVolume.TearDown failed for volume \"data\" (UniqueName: \"kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-668750fa-cc0a-4105-96f3-7fa184db4ada\") pod \"4c5d9bdf-498a-4456-9c6c-e6f7b456e693\" (UID: \"4c5d9bdf-498a-4456-9c6c-e6f7b456e693\") : kubernetes.io/csi: mounter.TearDownAt failed: rpc error: code = Internal desc = stat /var/lib/kubelet/pods/4c5d9bdf-498a-4456-9c6c-e6f7b456e693/volumes/kubernetes.io~csi/pvc-668750fa-cc0a-4105-96f3-7fa184db4ada/mount: input/output error"
Jun 28 20:14:14 VM-2-29-centos kubelet: E0628 20:14:14.018249 793997 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-9d1c0e76-6d56-439d-8070-741d8846d569 podName: nodeName:}" failed. No retries permitted until 2022-06-28 20:14:14.518217097 +0800 CST m=+11760883.586468437 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"pvc-9d1c0e76-6d56-439d-8070-741d8846d569\" (UniqueName: \"kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-9d1c0e76-6d56-439d-8070-741d8846d569\") pod \"redis-server-2\" (UID: \"5550e257-2245-4401-bd9a-cf275ff94675\") : rpc error: code = Internal desc = stat /csi-data-dir/ti-database/pv: input/output error"
Jun 28 20:14:14 VM-2-29-centos kubelet: E0628 20:14:14.102735 793997 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-9d1c0e76-6d56-439d-8070-741d8846d569 podName:daea4ba4-b97c-46c6-866b-aa7cc29af0a8 nodeName:}" failed. No retries permitted until 2022-06-28 20:14:14.602692068 +0800 CST m=+11760883.670943428 (durationBeforeRetry 500ms). Error: "UnmountVolume.TearDown failed for volume \"data\" (UniqueName: \"kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-9d1c0e76-6d56-439d-8070-741d8846d569\") pod \"daea4ba4-b97c-46c6-866b-aa7cc29af0a8\" (UID: \"daea4ba4-b97c-46c6-866b-aa7cc29af0a8\") : kubernetes.io/csi: mounter.TearDownAt failed: rpc error: code = Internal desc = stat /var/lib/kubelet/pods/daea4ba4-b97c-46c6-866b-aa7cc29af0a8/volumes/kubernetes.io~csi/pvc-9d1c0e76-6d56-439d-8070-741d8846d569/mount: input/output error"
The logs show that I/O on the disk is partially blocked: both mounting (MountVolume.SetUp) and unmounting (UnmountVolume.TearDown) fail with input/output errors, which produces the flood of messages above.
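The kubelet lines are long, so a small filter can make the affected Pods and volumes easier to see. This is a sketch under assumptions: the helper name `failed_mounts` is hypothetical, and the pattern relies on the escaped-quote layout (`\"pvc-...\"`, `pod \"name\"`) seen in the log lines above.

```shell
# Hypothetical helper: read kubelet log lines on stdin and print
# "<pod> <volume>" once for every MountVolume.SetUp failure.
failed_mounts() {
  grep 'MountVolume.SetUp failed' |
    sed -n 's/.*failed for volume \\"\(pvc-[0-9a-f-]*\)\\".* pod \\"\([^\\]*\)\\".*/\2 \1/p' |
    sort -u
}

# Example usage on the node (not run here):
#   grep -i error /var/log/messages | failed_mounts
```

The UnmountVolume.TearDown lines are skipped on purpose, because they name the old Pod only by UID.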
4. Check the PVC and PV objects
[root@VM-2-29-centos ~]# kubectl get pvc -nti-inf |grep redis
data-redis-server-0 Bound pvc-59fde781-e03e-4b26-b07c-7de93f608395 10Gi RWO csi-localpv-tidb 136d
data-redis-server-1 Bound pvc-6bf28ec2-40e1-4b52-8d54-b4ab0aa9f67a 10Gi RWO csi-localpv-tidb 136d
data-redis-server-2 Bound pvc-9d1c0e76-6d56-439d-8070-741d8846d569 10Gi RWO csi-localpv-tidb 136d
[root@VM-2-29-centos ~]#
[root@VM-2-29-centos ~]# kubectl get pv |grep redis
pvc-59fde781-e03e-4b26-b07c-7de93f608395 10Gi RWO Delete Bound ti-inf/data-redis-server-0 csi-localpv-tidb 136d
pvc-6bf28ec2-40e1-4b52-8d54-b4ab0aa9f67a 10Gi RWO Delete Bound ti-inf/data-redis-server-1 csi-localpv-tidb 136d
pvc-9d1c0e76-6d56-439d-8070-741d8846d569 10Gi RWO Delete Bound ti-inf/data-redis-server-2 csi-localpv-tidb 136d
Both the PVC and PV objects are in a normal state (Bound).
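With many claims in a namespace, scanning the STATUS column by eye is error-prone. A minimal sketch of an automated check follows; the function name `unbound_pvcs` is hypothetical, and it assumes the header row has been suppressed, for example with kubectl's `--no-headers` flag.

```shell
# Hypothetical helper: read "kubectl get pvc --no-headers" output on stdin
# and print the name of every claim whose STATUS column is not "Bound".
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 }'
}

# Example usage against a live cluster (not run here):
#   kubectl get pvc -nti-inf --no-headers | unbound_pvcs
```

Empty output only means the API objects are bound. As this incident shows, it says nothing about the health of the storage behind them.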
5. Check the disk mount
dmesg (commonly expanded as "display message" or "diagnostic message") prints the kernel ring buffer, that is, kernel-level messages:
[Tue Jun 28 20:22:47 2022] buffer_io_error: 6 callbacks suppressed
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971392, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971393, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971394, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971395, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971396, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971397, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971398, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971399, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop4, logical block 20971392, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop4, logical block 20971393, async page read
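The PVCs are backed by the disk /dev/vdc, on which an LVM logical volume was created, so the logical volume is evidently what has failed.
Listing the directory from a terminal on the node confirms that it can no longer be accessed:
~]# ls /data/ti-database
ls: cannot access /data/ti-database: Input/output error
Note: this is a buffer I/O error; for example, the asynchronous page read of logical block 20971393 failed.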
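To see quickly which block devices are affected, the kernel messages can be tallied per device. A minimal sketch, assuming the "Buffer I/O error on dev <name>," message format shown above; the function name `io_errors_by_device` is hypothetical.

```shell
# Hypothetical helper: read kernel messages on stdin and print
# "<device> <count>" for every device with Buffer I/O errors.
io_errors_by_device() {
  sed -n 's/.*Buffer I\/O error on dev \([^,]*\),.*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Example usage on the node (not run here):
#   dmesg -T | io_errors_by_device
```

For the excerpt above this prints `loop0 8` and `loop4 2`, so more than one loop device is hit, which points at the storage they share rather than at a single volume.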
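Solution
The platform's data components (etcd/redis/es) each run 3 replicas, so they tolerate a single point of failure. The logical volume was also planned from the start for the data components only, so no other services are affected. Recreating the LVM logical volume is therefore enough.
Detailed procedure:
1. Back up the mysql/etcd/es data
2. Unmount the logical volume
3. Remove the logical volume (LV) with lvremove
4. Remove the volume group (VG) with vgremove
5. Remove the physical volume with pvremove
Once these steps are complete, lvdisplay, vgdisplay and pvdisplay should no longer show any LVM information for the removed volume.
6. Delete the PVs and PVCs that belong to this node
7. Recreate the LVM logical volume and mount it
8. Create the PV and PVC objects and bind them to the Pods
9. Verify the Pod status
10. Check the cluster health of the redis and etcd components and verify data consistency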
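The LVM part of the procedure (steps 2 to 5 and step 7) can be sketched as a dry run that only prints the commands for review. Several things here are assumptions rather than facts from the incident: the volume group name `vg_data`, the logical volume name `lv_data`, the choice of XFS, and the use of all free space for the new LV. Only /dev/vdc and /data/ti-database come from the text above.

```shell
# Print (do not execute) the LVM teardown and rebuild commands.
# Arguments: disk, volume group, logical volume, mount point.
lvm_rebuild_plan() {
  disk=$1 vg=$2 lv=$3 mnt=$4
  cat <<EOF
umount $mnt
lvremove -y /dev/$vg/$lv
vgremove $vg
pvremove $disk
pvcreate $disk
vgcreate $vg $disk
lvcreate -n $lv -l 100%FREE $vg
mkfs.xfs /dev/$vg/$lv
mount /dev/$vg/$lv $mnt
EOF
}

# vg_data and lv_data are placeholder names, not the real ones.
lvm_rebuild_plan /dev/vdc vg_data lv_data /data/ti-database
```

Printing the plan first leaves room to check the device and volume names before anything destructive is run on the node.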