Tracing the cause of "fork: Resource temporarily unavailable"

1) Use SystemTap to trace the kernel execution flow of fork:

linux-d4xo-2:~/temp/stap_test # cat fork_monitor.stp
probe kernel.statement("copy_process@…/kernel/fork.c:*")
{
    printf("%s\n", pp());
}

Start the monitoring script:

stap fork_monitor.stp -o fork_monitor.log

2) Open another shell and create a large number of processes:

for ((i=1; i<=13000; i++))
do
sleep 300 &
done

Once the count passes roughly 12000, the following output appears quickly. At that point, force a reboot of the system.

-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: No child processes
-bash: fork: retry: No child processes
-bash: fork: retry: No child processes
-bash: fork: retry: No child processes

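3) After the reboot, open fork_monitor.log in vim and locate the execution flow at the moment of the failure.

Comparing the trace with the source code shows that execution jumps from line 1557 to line 1662, then runs sequentially to line 1701, where the error is returned.
The cgroup_can_fork function on line 1556 therefore returned a non-zero value.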

kernel.statement("copy_process@…/kernel/fork.c:1547")
kernel.statement("copy_process@…/kernel/fork.c:1556")
kernel.statement("copy_process@…/kernel/fork.c:1557")
kernel.statement("copy_process@…/kernel/fork.c:1662")
kernel.statement("copy_process@…/kernel/fork.c:1663")
kernel.statement("copy_process@…/kernel/fork.c:1665")
kernel.statement("copy_process@…/kernel/fork.c:1667")
kernel.statement("copy_process@…/kernel/fork.c:1670")
kernel.statement("copy_process@…/kernel/fork.c:1672")
kernel.statement("copy_process@…/kernel/fork.c:1673")
kernel.statement("copy_process@…/kernel/fork.c:1675")
kernel.statement("copy_process@…/kernel/fork.c:1676")
kernel.statement("copy_process@…/kernel/fork.c:1678")
kernel.statement("copy_process@…/kernel/fork.c:1680")
kernel.statement("copy_process@…/kernel/fork.c:1682")
kernel.statement("copy_process@…/kernel/fork.c:1684")
kernel.statement("copy_process@…/kernel/fork.c:1688")
kernel.statement("copy_process@…/kernel/fork.c:1691")
kernel.statement("copy_process@…/kernel/fork.c:1696")
kernel.statement("copy_process@…/kernel/fork.c:1697")
kernel.statement("copy_process@…/kernel/fork.c:1699")
kernel.statement("copy_process@…/kernel/fork.c:1701")
kernel.statement("copy_process@…/kernel/fork.c:1702")
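Spotting the jump by eye in a long log is tedious, so the comparison can be roughly automated. The following is a minimal sketch, not part of the original investigation: find_jumps is a hypothetical helper name, and the default threshold of 50 lines is an arbitrary guess at what counts as a suspicious forward jump. It also assumes the probe lines of a single fork are not interleaved with those of forks running on other CPUs.

```shell
# find_jumps LOGFILE [GAP]
# Print every place where the traced fork.c line number jumps forward by
# more than GAP lines (default 50, an arbitrary threshold).
find_jumps() {
    awk -v gap="${2:-50}" '
        /fork\.c:[0-9]+/ {
            n = $0
            sub(/.*fork\.c:/, "", n)   # drop everything up to the line number
            sub(/[^0-9].*/, "", n)     # drop the closing quote and parenthesis
            n += 0
            if (seen && n - prev > gap)
                print prev " -> " n
            prev = n
            seen = 1
        }
    ' "$1"
}
```

Running `find_jumps fork_monitor.log` on a trace like the one above would report the transition from 1557 to 1662, which is the goto taken after cgroup_can_fork fails.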

1550         /*
1551          * Ensure that the cgroup subsystem policies allow the new process to be
1552          * forked. It should be noted the the new process's css_set can be changed
1553          * between here and cgroup_post_fork() if an organisation operation is in
1554          * progress.
1555          */
1556         retval = cgroup_can_fork(p, cgrp_ss_priv);
1557         if (retval)
1558                 goto bad_fork_free_pid;
1559

1660 bad_fork_free_pid:
1661         threadgroup_change_end(current);
1662         if (pid != &init_struct_pid)
1663                 free_pid(pid);
1664 bad_fork_cleanup_thread:
1665         exit_thread(p);
1666 bad_fork_cleanup_io:
1667         if (p->io_context)
1668                 exit_io_context(p);

1695 bad_fork_cleanup_count:
1696         atomic_dec(&p->cred->user->processes);
1697         exit_creds(p);
1698 bad_fork_free:
1699         free_task(p);
1700 fork_out:
1701         return ERR_PTR(retval);
1702 }

4) cgroup_can_fork calls pids_can_fork, and the corresponding error messages can be found in /var/log/messages:

2021-01-07T20:02:18.654251+08:00 linux-d4xo-2 kernel: [ 1150.704919] cgroup: fork rejected by pids controller in /user.slice/user-0.slice/session-4.scope
2021-01-07T20:02:55.145436+08:00 linux-d4xo-2 kernel: [ 1187.198910] cgroup: fork rejected by pids controller in /user.slice/user-0.slice/session-1.scope
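To see which cgroups have hit the limit without reading the whole log, the path at the end of each message can be extracted. This is a small sketch that relies only on the message text shown above; rejected_cgroups is a hypothetical helper name.

```shell
# rejected_cgroups [FILE...]
# List the distinct cgroup paths named in "fork rejected by pids controller"
# messages. Reads standard input when no file is given.
rejected_cgroups() {
    sed -n 's/.*fork rejected by pids controller in \(.*\)$/\1/p' "$@" | sort -u
}
```

For example, `rejected_cgroups /var/log/messages` would print one line per affected cgroup. Because the kernel only logs the first rejection per cgroup, this shows where the limit was hit, not how often.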

static int pids_can_fork(struct task_struct *task, void **priv_p)
{
        struct cgroup_subsys_state *css;
        struct pids_cgroup *pids;
        int err;

        css = task_css_check(current, pids_cgrp_id, true);
        pids = css_pids(css);
        err = pids_try_charge(pids, 1);
        if (err) {
                /* Only log the first time events_limit is incremented. */
                if (atomic64_inc_return(&pids->events_limit) == 1) {
                        pr_info("cgroup: fork rejected by pids controller in ");
                        pr_cont_cgroup_path(css->cgroup);
                        pr_cont("\n");
                }
                cgroup_file_notify(&pids->events_file);
        }
        return err;
}

5) Locate the corresponding file. The limit is 12288.

linux-d4xo-2:/sys/fs/cgroup/pids/user.slice/user-0.slice # cat /sys/fs/cgroup/pids/user.slice/user-0.slice/pids.max
12288
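The limit is more useful when compared with the current task count. As a minimal sketch, the hypothetical helper pids_headroom below reads pids.max and pids.current (the file that, according to the pids controller documentation, holds the number of tasks currently in the cgroup) from a given cgroup directory and prints how many more tasks may be created.

```shell
# pids_headroom CGROUP_DIR
# Print how many more tasks the cgroup may create before reaching pids.max,
# or "unlimited" when pids.max is set to "max".
pids_headroom() {
    dir=$1
    max=$(cat "$dir/pids.max")
    cur=$(cat "$dir/pids.current")
    if [ "$max" = "max" ]; then
        echo "unlimited"
    else
        echo $((max - cur))
    fi
}
```

On the machine above this would be called as `pids_headroom /sys/fs/cgroup/pids/user.slice/user-0.slice`. A value near zero means the next fork in that slice is likely to fail.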
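After searching online for information about pids.max, I found the following systemd default setting on this machine. To adjust the limit, edit this file and reboot the system for the change to take effect.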
/etc/systemd/logind.conf:35:#UserTasksMax=12288

man logind.conf
UserTasksMax=
Sets the maximum number of OS tasks each user may run concurrently. This controls the TasksMax= setting of the per-user slice unit, see
systemd.resource-control(5) for details. Defaults to 12288 (12K).
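Since the line in logind.conf is commented out by default, it helps to check which value is actually configured. The sketch below uses a hypothetical helper, user_tasks_max, that prints the uncommented UserTasksMax= value from a logind.conf file and falls back to the documented default of 12288. It is an approximation: it looks at a single file only and ignores any drop-in configuration directories.

```shell
# user_tasks_max [CONF_FILE]
# Print the UserTasksMax= value set in the given logind.conf
# (default: /etc/systemd/logind.conf). Commented-out lines are ignored;
# when nothing is set, print the documented default of 12288.
user_tasks_max() {
    conf=${1:-/etc/systemd/logind.conf}
    v=$(sed -n 's/^UserTasksMax=//p' "$conf" | tail -n 1)
    echo "${v:-12288}"
}
```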

The kernel source describes pids.max as follows. For more examples, see the kernel documentation in Documentation/cgroups/pids.txt.

/*
 * Process number limiting controller for cgroups.
 *
 * Used to allow a cgroup hierarchy to stop any new processes from fork()ing
 * after a certain limit is reached.
 *
 * Since it is trivial to hit the task limit without hitting any kmemcg limits
 * in place, PIDs are a fundamental resource. As such, PID exhaustion must be
 * preventable in the scope of a cgroup hierarchy by allowing resource limiting
 * of the number of tasks in a cgroup.
 *
 * In order to use the `pids` controller, set the maximum number of tasks in
 * pids.max (this is not available in the root cgroup for obvious reasons). The
 * number of processes currently in the cgroup is given by pids.current.
 * Organisational operations are not blocked by cgroup policies, so it is
 * possible to have pids.current > pids.max. However, it is not possible to
 * violate a cgroup policy through fork(). fork() will return -EAGAIN if forking
 * would cause a cgroup policy to be violated.
 *
 * To set a cgroup to have no limit, set pids.max to "max". This is the default
 * for all new cgroups (N.B. that PID limits are hierarchical, so the most
 * stringent limit in the hierarchy is followed).
 *
 * pids.current tracks all child cgroup hierarchies, so parent/pids.current is
 * a superset of parent/child/pids.current.
 *
 * Copyright (C) 2015 Aleksa Sarai <cyphar@cyphar.com>
 *
 * This file is subject to the terms and conditions of version 2 of the GNU
 * General Public License.  See the file COPYING in the main directory of the
 * Linux distribution for more details.
 */
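Because PID limits are hierarchical, the limit that actually applies to a session scope can come from any ancestor, which is why the rejection was logged for session-4.scope while the 12288 value lives in user-0.slice. The sketch below illustrates that rule with a hypothetical helper, effective_pids_max, which walks from a cgroup directory up to the top of the pids hierarchy and reports the smallest numeric pids.max it finds. It assumes the second argument is an ancestor of the first.

```shell
# effective_pids_max CGROUP_DIR ROOT_DIR
# Walk from CGROUP_DIR up to ROOT_DIR and print the most stringent pids.max
# seen on the way, or "max" when no level sets a numeric limit.
effective_pids_max() {
    dir=$1
    root=$2
    best=max
    while :; do
        # The root cgroup has no pids.max, so a missing file is skipped.
        if [ -r "$dir/pids.max" ]; then
            v=$(cat "$dir/pids.max")
            if [ "$v" != "max" ]; then
                if [ "$best" = "max" ] || [ "$v" -lt "$best" ]; then
                    best=$v
                fi
            fi
        fi
        if [ "$dir" = "$root" ]; then
            break
        fi
        parent=$(dirname "$dir")
        # Stop at the filesystem root if ROOT_DIR was never reached.
        if [ "$parent" = "$dir" ]; then
            break
        fi
        dir=$parent
    done
    echo "$best"
}
```

A call such as `effective_pids_max /sys/fs/cgroup/pids/user.slice/user-0.slice/session-4.scope /sys/fs/cgroup/pids` would be expected to print 12288 on the machine described here, assuming the scope itself is left at "max".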
