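On one node of a k8s cluster, newly created pods never start. They stay in the ContainerCreating state, and describing the pod shows that the pod sandbox can never be created.

kubectl describe pod xxxxx -n xxx

The events look like this: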

 Normal   SandboxChanged          22m (x90 over 32m)  kubelet, node4     Pod sandbox changed, it will be killed and re-created.
 Warning  FailedCreatePodSandBox  32m (x7 over 32m)   kubelet, node4     Failed create pod sandbox.
 Warning  FailedSync              27m (x48 over 32m)  kubelet, node4     Error syncing pod
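Before digging into the node itself, it can help to confirm that the stuck pods are all on the same node. The following is a minimal sketch, assuming the node name node4 from the events above; the helper name stuck_pods is hypothetical and only filters text, so it can be tried without a cluster.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print namespace/name for every pod whose STATUS
# column (4th field) is ContainerCreating.
stuck_pods() {
    awk '$4 == "ContainerCreating" { print $1 "/" $2 }'
}

# Usage on a live cluster (node name taken from the events above):
# kubectl get pods --all-namespaces --no-headers \
#     --field-selector spec.nodeName=node4 | stuck_pods
```

If every pod listed is on one node while the same workloads run fine elsewhere, the cause is more likely on the node than in the pod spec or image.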

Log in to the node and check the system log:

tail -f /var/log/messages

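The log output is below. The kernel reports a page allocation failure (order:6) while runc is setting up the network namespace for the new container, and kmem_cache_create for nf_conntrack fails with error -12 (ENOMEM). From this it can be seen that the node's memory was filled up by buffer/cache, so the sandbox could not be created.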

Jan 25 15:12:52 node4 kubelet: W0125 15:12:52.446651   20224 cni.go:265] CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "a722b111fe3a8e78d1d7ee49280ab743d80c6e6ba955195b65bcfe60d5cf3264"
Jan 25 15:12:53 node4 docker: time="2019-01-25T15:12:53.102069754+08:00" level=error msg="Handler for POST /v1.26/containers/a722b111fe3a8e78d1d7ee49280ab743d80c6e6ba955195b65bcfe60d5cf3264/stop returned error: Container a722b111fe3a8e78d1d7ee49280ab743d80c6e6ba955195b65bcfe60d5cf3264 is already stopped"
Jan 25 15:12:53 node4 kernel: runc:[1:CHILD]: page allocation failure: order:6, mode:0x10c0d0
Jan 25 15:12:53 node4 kernel: CPU: 2 PID: 26598 Comm: runc:[1:CHILD] Tainted: G               ------------ T 3.10.0-693.5.2.el7.x86_64 #1
Jan 25 15:12:53 node4 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Jan 25 15:12:53 node4 kernel: 000000000010c0d0 000000008bf84052 ffff88003d21fa20 ffffffff816a3e51
Jan 25 15:12:53 node4 kernel: ffff88003d21fab0 ffffffff81188820 0000000000000000 ffff88023ffd9000
Jan 25 15:12:53 node4 kernel: 0000000000000006 000000000010c0d0 ffff88003d21fab0 000000008bf84052
Jan 25 15:12:53 node4 kernel: Call Trace:
Jan 25 15:12:53 node4 kernel: [<ffffffff816a3e51>] dump_stack+0x19/0x1b
Jan 25 15:12:53 node4 kernel: [<ffffffff81188820>] warn_alloc_failed+0x110/0x180
Jan 25 15:12:53 node4 kernel: [<ffffffff8169fe2a>] __alloc_pages_slowpath+0x6b6/0x724
Jan 25 15:12:53 node4 kernel: [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
Jan 25 15:12:53 node4 kernel: [<ffffffff811d1078>] alloc_pages_current+0x98/0x110
Jan 25 15:12:53 node4 kernel: [<ffffffff8118761e>] __get_free_pages+0xe/0x40
Jan 25 15:12:53 node4 kernel: [<ffffffff811dca2e>] kmalloc_order_trace+0x2e/0xa0
Jan 25 15:12:53 node4 kernel: [<ffffffff811e05c1>] __kmalloc+0x211/0x230
Jan 25 15:12:53 node4 kernel: [<ffffffff811f5df9>] memcg_register_cache+0xb9/0xe0
Jan 25 15:12:53 node4 kernel: [<ffffffff811a6ca0>] kmem_cache_create_memcg+0x110/0x230
Jan 25 15:12:53 node4 kernel: [<ffffffff811a6deb>] kmem_cache_create+0x2b/0x30
Jan 25 15:12:53 node4 kernel: [<ffffffffc03559d1>] nf_conntrack_init_net+0x101/0x250 [nf_conntrack]
Jan 25 15:12:53 node4 kernel: [<ffffffffc03562a4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack]
Jan 25 15:12:53 node4 kernel: [<ffffffff8157cbe1>] ops_init+0x41/0x150
Jan 25 15:12:53 node4 kernel: [<ffffffff8157cd93>] setup_net+0xa3/0x160
Jan 25 15:12:53 node4 kernel: [<ffffffff8157d6f5>] copy_net_ns+0xb5/0x180
Jan 25 15:12:53 node4 kernel: [<ffffffff810b5989>] create_new_namespaces+0xf9/0x180
Jan 25 15:12:53 node4 kernel: [<ffffffff810b5bca>] unshare_nsproxy_namespaces+0x5a/0xc0
Jan 25 15:12:53 node4 kernel: [<ffffffff81086f13>] SyS_unshare+0x193/0x300
Jan 25 15:12:53 node4 kernel: [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
Jan 25 15:12:53 node4 kernel: Mem-Info:
Jan 25 15:12:53 node4 kernel: active_anon:422434 inactive_anon:282 isolated_anon:0#012 active_file:389880 inactive_file:422271 isolated_file:0#012 unevictable:0 dirty:746 writeback:0 unstable:0#012 slab_reclaimable:404737 slab_unreclaimable:255217#012 mapped:62492 shmem:1127 pagetables:6817 bounce:0#012 free:51212 free_pcp:6 free_cma:0
Jan 25 15:12:53 node4 kernel: Node 0 DMA free:15900kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 25 15:12:53 node4 kernel: lowmem_reserve[]: 0 2814 7805 7805
Jan 25 15:12:53 node4 kernel: Node 0 DMA32 free:73680kB min:24324kB low:30404kB high:36484kB active_anon:547328kB inactive_anon:340kB active_file:538904kB inactive_file:565980kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2884232kB mlocked:0kB dirty:520kB writeback:0kB mapped:88968kB shmem:1488kB slab_reclaimable:754588kB slab_unreclaimable:332604kB kernel_stack:12608kB pagetables:7456kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 25 15:12:53 node4 kernel: lowmem_reserve[]: 0 0 4990 4990
Jan 25 15:12:53 node4 kernel: Node 0 Normal free:115268kB min:43124kB low:53904kB high:64684kB active_anon:1142408kB inactive_anon:788kB active_file:1020616kB inactive_file:1123104kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5242880kB managed:5110056kB mlocked:0kB dirty:2464kB writeback:0kB mapped:161000kB shmem:3020kB slab_reclaimable:864360kB slab_unreclaimable:688256kB kernel_stack:16928kB pagetables:19812kB unstable:0kB bounce:0kB free_pcp:24kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 25 15:12:53 node4 kernel: lowmem_reserve[]: 0 0 0 0
Jan 25 15:12:53 node4 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
Jan 25 15:12:53 node4 kernel: Node 0 DMA32: 2019*4kB (UEM) 781*8kB (UEM) 1373*16kB (UEM) 897*32kB (UEM) 139*64kB (UEM) 1*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 74020kB
Jan 25 15:12:53 node4 kernel: Node 0 Normal: 12464*4kB (UEM) 2962*8kB (UEM) 1229*16kB (UEM) 356*32kB (UEM) 129*64kB (UEM) 16*128kB (UEM) 2*256kB (E) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 115424kB
Jan 25 15:12:53 node4 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 25 15:12:53 node4 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 25 15:12:53 node4 kernel: 813284 total pagecache pages
Jan 25 15:12:53 node4 kernel: 0 pages in swap cache
Jan 25 15:12:53 node4 kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 25 15:12:53 node4 kernel: Free swap  = 0kB
Jan 25 15:12:53 node4 kernel: Total swap = 0kB
Jan 25 15:12:53 node4 kernel: 2097022 pages RAM
Jan 25 15:12:53 node4 kernel: 0 pages HighMem/MovableOnly
Jan 25 15:12:53 node4 kernel: 94473 pages reserved
Jan 25 15:12:53 node4 kernel: kmem_cache_create(nf_conntrack_ffff88018f2fbcc0) failed with error -12
Jan 25 15:12:53 node4 kernel: CPU: 2 PID: 26598 Comm: runc:[1:CHILD] Tainted: G               ------------ T 3.10.0-693.5.2.el7.x86_64 #1
Jan 25 15:12:53 node4 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
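A trace like the one above is easy to miss in a busy syslog. As a rough sketch, the helper below counts the kernel's "page allocation failure" lines per allocation order; the function name alloc_failures is hypothetical, and the log path is the one used in this article.

```shell
# Hypothetical helper: scan a log file for kernel "page allocation failure"
# lines and print how many times each allocation order failed
# (output format is that of `uniq -c`: count, then order).
alloc_failures() {
    grep 'page allocation failure' "$1" \
        | sed -n 's/.*order:\([0-9]*\).*/\1/p' \
        | sort -n | uniq -c
}

# Usage on the node:
# alloc_failures /var/log/messages
# free -m              # the buff/cache column shows how much memory the page cache holds
# cat /proc/buddyinfo  # free blocks per order and zone; the columns run from order 0 upward
```

Repeated failures at a high order alongside a large buff/cache figure are consistent with the diagnosis above, although they do not prove it on their own.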

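It can be fixed as follows.

Add this line to /etc/sysctl.conf:
vm.zone_reclaim_mode = 1
Then run sysctl -p.
This parameter tells the kernel to reclaim memory within a zone, starting with easily reusable page cache (buffer/cache), when that zone runs out of memory, rather than only falling back to other zones or nodes.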
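Editing the file by hand is fine for a single node. When the same change has to be repeated, it can be scripted; the sketch below is one possible way, not part of the original fix. The helper name set_sysctl_conf and its arguments are hypothetical, and it rewrites the file it is given, so try it on a copy first.

```shell
# Hypothetical helper: make sure "key = value" appears exactly once in a
# sysctl-style file.  $1 = file, $2 = key, $3 = value
set_sysctl_conf() {
    file=$1 key=$2 value=$3
    touch "$file"
    # Copy every line except an existing setting for this exact key
    # (commented-out lines are kept), then append the new setting.
    awk -v k="$key" -F= '{ n = $1; gsub(/[ \t]/, "", n); if (n != k) print }' \
        "$file" > "$file.tmp"
    printf '%s = %s\n' "$key" "$value" >> "$file.tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# Usage on the node (as root):
# set_sysctl_conf /etc/sysctl.conf vm.zone_reclaim_mode 1
# sysctl -p                      # reload /etc/sysctl.conf
# sysctl vm.zone_reclaim_mode    # prints the value currently in effect
```

Running the helper twice leaves a single entry, which avoids conflicting duplicate settings accumulating in /etc/sysctl.conf.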

After that the problem was resolved: running kubectl get pod again showed that the pod status had changed to Running.
