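Symptom

A host suddenly became impossible to log in to. After it was rebooted, the system log /var/log/messages contained entries similar to the following: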

Jan  5 15:50:01 hanginx01 systemd: Started Session 196 of user root.
Jan  5 15:50:01 hanginx01 systemd: Starting Session 196 of user root.
Jan  5 15:50:11 hanginx01 dockerd: time="2020-01-05T15:50:11.479595119+08:00" level=info msg="Container d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd failed to exit within 10 seconds of signal 15 - using the force"
Jan  5 15:50:11 hanginx01 containerd: time="2020-01-05T15:50:11.670173843+08:00" level=info msg="shim reaped" id=d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd
Jan  5 15:50:11 hanginx01 dockerd: time="2020-01-05T15:50:11.679960565+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan  5 15:50:11 hanginx01 kernel: docker0: port 1(vethb1da919) entered disabled state
Jan  5 15:50:11 hanginx01 kernel: docker0: port 1(vethb1da919) entered disabled state
Jan  5 15:50:11 hanginx01 kernel: device vethb1da919 left promiscuous mode
Jan  5 15:50:11 hanginx01 kernel: docker0: port 1(vethb1da919) entered disabled state
Jan  5 15:50:11 hanginx01 NetworkManager[1028]: <info>  [1578210611.8637] device (vethf33af93): driver 'veth' does not support carrier detection.
Jan  5 15:50:11 hanginx01 NetworkManager[1028]: <info>  [1578210611.8652] manager: (vethf33af93): new Veth device (/org/freedesktop/NetworkManager/Devices/445)
Jan  5 15:50:11 hanginx01 NetworkManager[1028]: <info>  [1578210611.8680] device (vethb1da919): released from master device docker0
Jan  5 15:50:11 hanginx01 dockerd: time="2020-01-05T15:50:11.890774286+08:00" level=warning msg="d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd cleanup: failed to unmount IPC: umount /home/docker/containers/d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd/mounts/shm, flags: 0x2: no such file or directory"
Jan  5 15:50:11 hanginx01 kernel: docker0: port 1(veth0d967d8) entered blocking state
Jan  5 15:50:11 hanginx01 kernel: docker0: port 1(veth0d967d8) entered disabled state
Jan  5 15:50:11 hanginx01 kernel: device veth0d967d8 entered promiscuous mode
Jan  5 15:50:11 hanginx01 kernel: IPv6: ADDRCONF(NETDEV_UP): veth0d967d8: link is not ready
Jan  5 15:50:11 hanginx01 kernel: docker0: port 1(veth0d967d8) entered blocking state
Jan  5 15:50:11 hanginx01 kernel: docker0: port 1(veth0d967d8) entered forwarding state
Jan  5 15:50:11 hanginx01 NetworkManager[1028]: <info>  [1578210611.9005] manager: (veth4feb836): new Veth device (/org/freedesktop/NetworkManager/Devices/446)
Jan  5 15:50:11 hanginx01 NetworkManager[1028]: <info>  [1578210611.9018] manager: (veth0d967d8): new Veth device (/org/freedesktop/NetworkManager/Devices/447)
Jan  5 15:50:11 hanginx01 containerd: time="2020-01-05T15:50:11.918219527+08:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/d1c1b175808a9e91137eda25b18ffc6c0c48a416fddd29ffc14905e0c1de2cbd/shim.sock" debug=false pid=28696
Jan  5 15:50:11 hanginx01 kernel: IPVS: Creating netns size=2040 id=150
Jan  5 15:50:12 hanginx01 NetworkManager[1028]: <info>  [1578210612.0821] device (veth0d967d8): link connected
Jan  5 15:50:12 hanginx01 NetworkManager[1028]: <info>  [1578210612.0822] device (docker0): link connected
Jan  5 15:50:12 hanginx01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth0d967d8: link becomes ready
Jan  5 15:50:14 hanginx01 ntpd[1046]: Listen normally on 156 veth0d967d8 fe80::847:2cff:feb9:1a90 UDP 123
Jan  5 15:50:14 hanginx01 ntpd[1046]: Deleting interface #155 vethb1da919, fe80::b81f:e0ff:fe16:22bb#123, interface stats: received=0, sent=0, dropped=0, active_time=605 secs
Jan  5 15:55:25 hanginx01 kernel: microcode: microcode updated early to revision 0xb000021, date = 2017-03-01
Jan  5 15:55:25 hanginx01 kernel: Initializing cgroup subsys cpuset
Jan  5 15:55:25 hanginx01 kernel: Initializing cgroup subsys cpu
Jan  5 15:55:25 hanginx01 kernel: Initializing cgroup subsys cpuacct
Jan  5 15:55:25 hanginx01 kernel: Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017
Jan  5 15:55:25 hanginx01 kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
Jan  5 15:55:25 hanginx01 kernel: e820: BIOS-provided physical RAM map:
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009afff] usable
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x000000000009b000-0x000000000009ffff] reserved
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000078888fff] usable
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000078889000-0x0000000079a3afff] reserved
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000079a3b000-0x0000000079a9efff] ACPI data
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000079a9f000-0x0000000079ff9fff] ACPI NVS
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000079ffa000-0x000000008fffffff] reserved
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed44fff] reserved
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
Jan  5 15:55:25 hanginx01 kernel: BIOS-e820: [mem 0x0000000100000000-0x000000407fffffff] usable
Jan  5 15:55:25 hanginx01 kernel: NX (Execute Disable) protection: active
Jan  5 15:55:25 hanginx01 kernel: SMBIOS 3.0 present.
Jan  5 15:55:25 hanginx01 kernel: e820: last_pfn = 0x4080000 max_arch_pfn = 0x400000000
Jan  5 15:55:25 hanginx01 kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Jan  5 15:55:25 hanginx01 kernel: e820: last_pfn = 0x78889 max_arch_pfn = 0x400000000
Jan  5 15:55:25 hanginx01 kernel: found SMP MP-table at [mem 0x000fdcb0-0x000fdcbf] mapped at [ffff8800000fdcb0]
Jan  5 15:55:25 hanginx01 kernel: Using GB pages for direct mapping
Jan  5 15:55:25 hanginx01 kernel: RAMDISK: [mem 0x357ea000-0x36becfff]
Jan  5 15:55:25 hanginx01 kernel: Early table checksum verification disabled
Jan  5 15:55:25 hanginx01 kernel: ACPI: RSDP 00000000000f0530 00024 (v02 ALASKA)
Jan  5 15:55:25 hanginx01 kernel: ACPI: XSDT 0000000079a4f098 000B4 (v01 ALASKA   A M I  01072009 AMI  00010013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: FACP 0000000079a838d0 0010C (v05 ALASKA   A M I  01072009 AMI  00010013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: DSDT 0000000079a4f1e0 346EE (v02 ALASKA   A M I  01072009 INTL 20091013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: FACS 0000000079ff8f80 00040
Jan  5 15:55:25 hanginx01 kernel: ACPI: APIC 0000000079a839e0 00224 (v03 ALASKA   A M I  01072009 AMI  00010013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: FPDT 0000000079a83c08 00044 (v01 ALASKA   A M I  01072009 AMI  00010013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: FIDT 0000000079a83c50 0009C (v01 ALASKA   A M I  01072009 AMI  00010013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: SPMI 0000000079a83cf0 00041 (v05 ALASKA   A M I  00000000 AMI. 00000000)
Jan  5 15:55:25 hanginx01 kernel: ACPI: MCFG 0000000079a83d38 0003C (v01 ALASKA    A M I 01072009 MSFT 00000097)
Jan  5 15:55:25 hanginx01 kernel: ACPI: UEFI 0000000079a83d78 00042 (v01 ALASKA   A M I  01072009      00000000)
Jan  5 15:55:25 hanginx01 kernel: ACPI: HPET 0000000079a83dc0 00038 (v01 ALASKA   A M I  00000001 INTL 20091013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: MSCT 0000000079a83df8 00090 (v01 ALASKA   A M I  00000001 INTL 20091013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: SLIT 0000000079a83e88 00030 (v01 ALASKA   A M I  00000001 INTL 20091013)
Jan  5 15:55:25 hanginx01 kernel: ACPI: SRAT 0000000079a83eb8 01158 (v03 ALASKA   A M I  00000001 INTL 20091013)
Jan  5 15:55:25 hanginx01 kernel: PM: Registered nosave memory: [mem 0x90000000-0xfed1bfff]
Jan  5 15:55:25 hanginx01 kernel: PM: Registered nosave memory: [mem 0xfed1c000-0xfed44fff]
Jan  5 15:55:25 hanginx01 kernel: PM: Registered nosave memory: [mem 0xfed45000-0xfeffffff]
Jan  5 15:55:25 hanginx01 kernel: PM: Registered nosave memory: [mem 0xff000000-0xffffffff]
Jan  5 15:55:25 hanginx01 kernel: e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
Jan  5 15:55:25 hanginx01 kernel: Booting paravirtualized kernel on bare hardware
Jan  5 15:55:25 hanginx01 kernel: setup_percpu: NR_CPUS:5120 nr_cpumask_bits:32 nr_cpu_ids:32 nr_node_ids:2
Jan  5 15:55:25 hanginx01 kernel: PERCPU: Embedded 33 pages/cpu @ffff881fffa00000 s97048 r8192 d29928 u262144
Jan  5 15:55:25 hanginx01 kernel: Built 2 zonelists in Zone order, mobility grouping on.  Total pages: 66030059
Jan  5 15:55:25 hanginx01 kernel: Policy zone: Normal
Jan  5 15:55:25 hanginx01 kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
Jan  5 15:55:25 hanginx01 kernel: PID hash table entries: 4096 (order: 3, 32768 bytes)
Jan  5 15:55:25 hanginx01 kernel: x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
Jan  5 15:55:25 hanginx01 kernel: xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form
Jan  5 15:55:25 hanginx01 kernel: Memory: 5886008k/270532608k available (6886k kernel code, 2219892k absent, 4482536k reserved, 4545k data, 1764k init)
Jan  5 15:55:25 hanginx01 kernel: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=2
Jan  5 15:55:25 hanginx01 kernel: Hierarchical RCU implementation.
Jan  5 15:55:25 hanginx01 kernel: #011RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=32.
Jan  5 15:55:25 hanginx01 kernel: NR_IRQS:327936 nr_irqs:1496 0
Jan  5 15:55:25 hanginx01 kernel: Console: colour VGA+ 80x25
Jan  5 15:55:25 hanginx01 kernel: console [tty0] enabled
Jan  5 15:55:25 hanginx01 kernel: allocated 1073741824 bytes of page_cgroup
Jan  5 15:55:25 hanginx01 kernel: please try 'cgroup_disable=memory' option if you don't want memory cgroups
Jan  5 15:55:25 hanginx01 kernel: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
Jan  5 15:55:25 hanginx01 kernel: tsc: Fast TSC calibration using PIT
Jan  5 15:55:25 hanginx01 kernel: tsc: Detected 2095.256 MHz processor
Jan  5 15:55:25 hanginx01 kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 4190.51 BogoMIPS (lpj=2095256)
Jan  5 15:55:25 hanginx01 kernel: pid_max: default: 32768 minimum: 301
Jan  5 15:55:25 hanginx01 kernel: Security Framework initialized
Jan  5 15:55:25 hanginx01 kernel: SELinux:  Initializing.
Jan  5 15:55:25 hanginx01 kernel: Yama: becoming mindful.
Jan  5 15:55:25 hanginx01 kernel: Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes)
Jan  5 15:55:25 hanginx01 kernel: Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes)
Jan  5 15:55:25 hanginx01 kernel: random: fast init done
Jan  5 15:55:25 hanginx01 kernel: Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Jan  5 15:55:25 hanginx01 kernel: Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Jan  5 15:55:25 hanginx01 kernel: Initializing cgroup subsys memory
Jan  5 15:55:25 hanginx01 kernel: Initializing cgroup subsys devices
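
When a host reboots unexpectedly, the useful part of /var/log/messages is whatever was written just before the new boot began. The following is a minimal sketch of pulling out those lines, under two assumptions drawn from the excerpt above: a kernel line containing "Linux version" marks a fresh boot, and the early boot lines share the same 15-character syslog timestamp. The helper name lines_before_boot is hypothetical.

```python
import re

# Assumption: this kernel banner marks the start of a new boot, as in the excerpt above.
BOOT_MARKER = re.compile(r"kernel: Linux version ")


def lines_before_boot(log_lines, context=5):
    """For each boot marker, return the `context` lines logged just before that boot."""
    results = []
    for index, line in enumerate(log_lines):
        if not BOOT_MARKER.search(line):
            continue
        stamp = line[:15]  # classic syslog timestamp, e.g. "Jan  5 15:55:25"
        end = index
        # Walk back over earlier boot lines that carry the same timestamp
        # (microcode update, cgroup subsystem initialization, ...).
        while end > 0 and log_lines[end - 1][:15] == stamp:
            end -= 1
        results.append(log_lines[max(0, end - context):end])
    return results
```

On the host, the list passed in would typically come from reading /var/log/messages with splitlines(), which usually requires root. For the excerpt above, the returned lines are ordinary container stop/start and ntpd messages, which is consistent with the host going down without logging any error.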

Solutions

Method 1. Use systemd as the cgroup driver

Change /etc/docker/daemon.json to:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}

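If /etc/docker/daemon.json already has content, add this entry to the existing JSON object and keep the file valid JSON. Then restart the docker service and run docker info|grep Cgroup. If the result shows systemd (the default is cgroupfs), the change has taken effect.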
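
Editing an existing daemon.json by hand makes it easy to leave a stray comma or a duplicate option behind. As a rough sketch, the merge can be done with Python's json module. The function name with_systemd_cgroup_driver is hypothetical, and the sketch assumes the file holds a single JSON object.

```python
import json


def with_systemd_cgroup_driver(config_text):
    """Return daemon.json text with native.cgroupdriver=systemd set, keeping other keys."""
    # An empty or missing file is treated as an empty configuration object.
    config = json.loads(config_text) if config_text.strip() else {}
    # Drop any earlier native.cgroupdriver entry so the option is not listed twice.
    exec_opts = [
        opt for opt in config.get("exec-opts", [])
        if not opt.startswith("native.cgroupdriver=")
    ]
    exec_opts.append("native.cgroupdriver=systemd")
    config["exec-opts"] = exec_opts
    return json.dumps(config, indent=2) + "\n"
```

The returned text would be written back to /etc/docker/daemon.json (after taking a backup) before the docker service is restarted.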

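Note: In a k8s environment, changing the cgroup driver of a node that has already joined the cluster is a sensitive operation. If the kubelet has created pods using the semantics of one cgroup driver, switching the container runtime to a different cgroup driver can cause errors when the PodSandbox is re-created for existing pods. Restarting the kubelet may not solve such problems. If you have workable automation, either replace the node with another one that uses the updated configuration or reinstall it through that automation.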

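The k8s documentation explains cgroup drivers as follows:

Control groups are used to constrain the resources that are allocated to processes.

When a Linux distribution uses systemd as its init system, the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager. systemd is tightly integrated with cgroups and allocates a cgroup for each systemd unit. You can also configure the container runtime and the kubelet to use cgroupfs. Using cgroupfs alongside systemd means that there will be two different cgroup managers.

A single cgroup manager simplifies the view of what resources are being allocated and, by default, has a more consistent view of the available and in-use resources. When two managers coexist on one system, you end up with two views of those resources. In the field, people have reported cases where nodes configured to use cgroupfs for the kubelet and docker, with systemd for the rest of the processes on the node, become unstable under resource pressure.

Changing the settings so that the container runtime and the kubelet use systemd as the cgroup driver makes the system more stable. For Docker, set the native.cgroupdriver=systemd option.
For the original text, see: https://kubernetes.io/zh/docs/setup/production-environment/container-runtimes/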
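
Since the point of the passage above is that the container runtime and the kubelet should agree on one cgroup driver, a small consistency check can help before and after the change. The sketch below compares the "Cgroup Driver:" line printed by docker info with the cgroupDriver field of a kubelet configuration file. The function names are hypothetical. Where the kubelet configuration lives depends on the installation (kubeadm-based clusters commonly use /var/lib/kubelet/config.yaml, which is an assumption here), and a kubelet configured only through command-line flags will not be detected by this check.

```python
import re


def docker_cgroup_driver(docker_info_text):
    """Extract the driver name from the 'Cgroup Driver:' line of `docker info` output."""
    match = re.search(r"^\s*Cgroup Driver:\s*(\S+)", docker_info_text, re.MULTILINE)
    return match.group(1) if match else None


def kubelet_cgroup_driver(kubelet_config_text):
    """Extract the top-level cgroupDriver value from kubelet configuration YAML text."""
    match = re.search(r"^cgroupDriver:\s*(\S+)", kubelet_config_text, re.MULTILINE)
    # The YAML value may be quoted, so strip surrounding quotes if present.
    return match.group(1).strip("\"'") if match else None


def drivers_match(docker_info_text, kubelet_config_text):
    """True only when both drivers were found and they are identical."""
    docker_driver = docker_cgroup_driver(docker_info_text)
    kubelet_driver = kubelet_cgroup_driver(kubelet_config_text)
    return docker_driver is not None and docker_driver == kubelet_driver
```

The first argument would be the captured output of docker info and the second the contents of the kubelet configuration file. A False result means either a mismatch or a value that could not be found, so it is a prompt to look closer rather than a verdict.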

Method 2. Upgrade the Docker version or the system kernel version

# Upgrade the Docker version
yum remove docker docker-engine docker-common \
docker-client docker-client-latest docker-latest docker-latest-logrotate \
docker-logrotate docker-selinux docker-engine-selinux -y
yum install yum-utils lvm2 device-mapper-persistent-data -y
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --disable docker-ce-edge docker-ce-test
yum install docker-ce.x86_64 -y
yum update containerd.io -y

# Upgrade the kernel version. Be careful in production: a reboot is required after the upgrade.
# See https://www.cnblogs.com/clsn/p/10925653.html

Note: After a kernel upgrade has been installed, the server must be rebooted. Running uname -a afterwards shows the new kernel version.
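
To confirm that the reboot actually picked up a newer kernel, the running release can be compared with the one recorded in the log above (3.10.0-693.el7.x86_64). The following is only a sketch with hypothetical helper names. It compares the numeric parts of the release string and says nothing about which kernel versions contain a fix.

```python
import platform
import re


def release_key(release):
    """Turn a release such as '3.10.0-693.el7.x86_64' into (3, 10, 0, 693)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing build number (e.g. plain '5.4.0') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_newer(running, previous="3.10.0-693.el7.x86_64"):
    """True when `running` sorts after `previous`; the default is the kernel from the log."""
    return release_key(running) > release_key(previous)


def running_kernel_is_newer():
    # platform.release() returns the same release string that `uname -r` prints.
    return is_newer(platform.release())
```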

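References

  1. Record of resolving a host reboot failure caused by docker
  2. Container runtimes
  3. CentOS kernel version upgrade
  4. Is there any difference between setting docker's cgroup driver to cgroupfs and to systemd?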
