The Genesis of the Linux Kernel's First Processes

At the birth of the Linux universe, three crucial processes are created, with PIDs 0, 1 and 2. To use a loose analogy, think of assembling a startup team: first you need a CEO, a CTO and a CFO; with management, engineering and money in place, everything else can grow from there. These three roles correspond to the kernel's three founding processes. When control first jumps in from GRUB or U-Boot no process exists yet, so the bring-up code "hand-crafts" the swapper process with PID 0; its task_struct is static: init_task. Once that first process exists the rest is easy: like Nüwa molding humans in the creation myth, every later process is cloned in the image of init_task (through the kernel's fork machinery). swapper eventually becomes the kernel idle process. It is so special that no tool, not even the /proc filesystem, will show it to you, yet it genuinely exists.

Having done its job of creating the kernel threads for init and kthreadd, swapper retires and leaves the remaining work to its successors. The kernel thread that becomes init performs the initialization of the kernel's drivers: most of the driver-init calls registered through module_init() are issued on its behalf. In its final act, after finishing its kernel-side duties, init breaks out into user space as the very first user process, becoming the creator of another universe. kthreadd, by contrast, stays in the kernel forever, spawning one unsung, diligent kernel worker thread after another: the workqueue workers, softirq threads, RCU threads and so on that you meet when writing drivers all have kthreadd as their parent. "Sons beget grandsons, grandsons beget sons; sons have sons, and sons have grandsons: sons upon sons without end."

The kernel's genesis is remarkable: with just three processes it builds a rich, interactive world for us.

Every process has a unique identifier, the PID. As mentioned above, the swapper process "hand-built" at boot has PID 0; each newly created process then takes the previous PID plus one. The PID is just a number, and its maximum value can be configured through the /proc/sys/kernel/pid_max file.

In a multithreaded operating system a process owns one or more threads: the process has a process ID and its threads have distinct thread IDs. POSIX requires that all threads of one process share a single process ID, so that a signal sent to that PID reaches the whole process. In Linux, however, every schedulable entity is a struct task_struct: creating a thread also creates a task_struct. task_struct thus plays a double role, serving both as the process control block and as the thread object, which means that even threads of the same process have different PIDs. To reconcile this with POSIX, the TGID was introduced: the TGID is the real process ID, and when a process is created its PID and TGID coincide. That task is the thread group leader (PID == TGID), and all threads created within it join its thread group and share its TGID. So the TGID is the true process ID, while what the kernel calls PID is really the TID (thread ID).

Process creation

Process 0 needs no further comment: it is hand-crafted during bring-up and has no parent. User-space processes are created through the clone system call, which also needs no elaboration here. Only the many descendants of the kernel thread kthreadd are created through several related interfaces; the commonly used ones, kthread_run and kthread_create among them, are essentially layers over one another: implementation and implemented, caller and callee, wrapper and wrapped.

Process 0 has two children: the init process with PID 1 and the kthreadd process with PID 2.

Process states and the process state machine

Combining task_struct->state with task_struct->exit_state, kernel tasks move through the following main states.

A task in TASK_UNINTERRUPTIBLE cannot receive and handle signals: it is in uninterruptible sleep, signals go unprocessed, and kill cannot terminate a process stuck in this state. Interrupting such a process would be inappropriate anyway, because it may be in the middle of something critical; when the event it is waiting for occurs, it will be woken explicitly.

Note that msleep in the kernel comes in two flavours, and the default one, msleep() itself, sleeps UNINTERRUPTIBLE (the interruptible variant is msleep_interruptible()); this is worth keeping in mind.

A process that goes to sleep via sleep() sits in TASK_INTERRUPTIBLE, a shallow sleep: it responds to signals and can be killed by SIGKILL.

TASK_UNINTERRUPTIBLE is powerful precisely because signals cannot wake it, but adding one extra flag turns it into a state that can still be woken for a single purpose. The combined state is called TASK_KILLABLE, meaning the task can be killed by SIGKILL; the added flag is TASK_WAKEKILL.

How does that work? The wakeup test is an OR over state bits, so a TASK_KILLABLE task, while unreachable through the TASK_UNINTERRUPTIBLE path alone, can still be woken through the TASK_WAKEKILL bit. Looking at the wakeup logic again: once the resume flag is set, a task carrying TASK_WAKEKILL becomes wakeable.

A typical wakeup path for this intermediate-depth sleep is shown in the figure below:

Summary of sleep depths

A thread sleeping in any of these states can be woken with wake_up_process(); it can even wake a thread in deep (uninterruptible) sleep.

State transitions

The transitions between states, and the conditions that trigger them, are as follows:

Process groups and sessions

Process groups

What is a process group?

  • A process group is a set of cooperating or related processes; each group has an ID (the PGID).
  • Every process belongs to exactly one process group, and each group has a leader whose PID equals the group's PGID.
  • A signal can be delivered to every process in a group, terminating, stopping or resuming all of them at once.

Sessions

What is a session?

A session is a collection of one or more process groups.

  • When a user logs in, the login process creates a new session for that user.
  • The shell (e.g. bash), as the first process of the session, is the session leader.
  • The session ID (SID) equals the PID of the session leader.
  • The session is assigned one controlling terminal (at most one) for the user's input and output.
  • A session covers all of that login user's activity.
  • The processes in a session form one foreground process group and N background process groups.

kernel hack module code:

source file:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/stat.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>
#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/seq_file.h>
#include <linux/sched/signal.h>
#include <linux/proc_fs.h>
#include <linux/pid.h>

MODULE_AUTHOR("zlcao");
MODULE_LICENSE("GPL");

int seqfile_debug_mode = 0;
module_param(seqfile_debug_mode, int, 0664);

// Start of the task-list dump.
// The return value of my_seq_ops_start() is passed to my_seq_ops_next() as @v.
static void *my_seq_ops_start(struct seq_file *m, loff_t *pos)
{
	loff_t index = *pos;
	struct task_struct *task;

	printk("%s line %d, index %lld, count %ld, size %ld here.\n",
	       __func__, __LINE__, index, m->count, m->size);

	if (seqfile_debug_mode == 0) {
		// If the buffer was too small, seq_file may call start() again
		// with a pos that was already reached, so recompute the
		// starting task from pos.
		for_each_process(task) {
			if (index-- == 0)
				return task;
		}
		return NULL;
	}

	// The other modes dump everything in one show() call:
	// return a non-NULL token on the first pass only.
	return NULL + (*pos == 0);
}

// Keep iterating until my_seq_ops_next() returns NULL or an error.
static void *my_seq_ops_next(struct seq_file *m, void *v, loff_t *pos)
{
	struct task_struct *task = NULL;

	if (seqfile_debug_mode == 0) {
		task = next_task((struct task_struct *)v);
		++*pos;
		// Back at init_task: the circular list has been walked.
		if (task == &init_task)
			return NULL;
	} else {
		++*pos;
	}
	return task;
}

// seq_file calls stop() when iteration finishes or fails.
static void my_seq_ops_stop(struct seq_file *m, void *v)
{
}

// show() writes into the seq_file internal buffer; seq_file copies it
// out to user space when appropriate. @v is what start()/next() returned.
static int my_seq_ops_show(struct seq_file *m, void *v)
{
	struct task_struct *task = NULL;
	struct task_struct *p = NULL;
	struct file *file = m->private;

	if (seqfile_debug_mode == 0) {
		seq_puts(m, " file=");
		seq_file_path(m, file, "\n");
		seq_putc(m, ' ');
		task = (struct task_struct *)v;
		seq_printf(m, "PID=%u, task: %s, index=%lld, read_pos=%lld\n",
			   task->tgid, task->comm, m->index, m->read_pos);
	} else if (seqfile_debug_mode == 1) {
		struct task_struct *g;
		static int oldcount = 0;
		static int entercount = 0;
		char *str;

		printk("%s line %d here, enter %d times.\n", __func__, __LINE__, ++entercount);
		seq_printf(m, "%s line %d here, enter %d times.\n", __func__, __LINE__, ++entercount);
		rcu_read_lock();
		for_each_process_thread(g, p) {
			str = list_empty(&p->tasks) ? "empty" : "not empty";
			seq_printf(m, "process %s(%d,cpu%d) thread %s(%d,cpu%d), threadnum %d, %d. tasks->prev = %p, tasks->next = %p, p->tasks=%p, %s.\n",
				   g->comm, task_pid_nr(g), task_cpu(g),
				   p->comm, task_pid_nr(p), task_cpu(p),
				   get_nr_threads(g), get_nr_threads(p),
				   p->tasks.prev, p->tasks.next, &p->tasks, str);
			if (oldcount == 0 || oldcount != m->size) {
				printk("%s line %d, m->count %ld, m->size %ld.\n",
				       __func__, __LINE__, m->count, m->size);
				oldcount = m->size;
			}
		}
		rcu_read_unlock();
	} else if (seqfile_debug_mode == 2) {
		for_each_process(task) {
			struct pid *pgrp = task_pgrp(task);

			seq_printf(m, "Group Header %s(%d,cpu%d):\n",
				   task->comm, task_pid_nr(task), task_cpu(task));
			do_each_pid_task(pgrp, PIDTYPE_PGID, p) {
				seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d), threadnum %d, %d.\n",
					   task->comm, task_pid_nr(task), task_cpu(task),
					   p->comm, task_pid_nr(p), task_cpu(p),
					   get_nr_threads(task), get_nr_threads(p));
			} while_each_pid_task(pgrp, PIDTYPE_PGID, p);
		}
	} else if (seqfile_debug_mode == 3) {
		for_each_process(task) {
			struct pid *session = task_session(task);

			seq_printf(m, "session header %s(%d,cpu%d):\n",
				   task->comm, task_pid_nr(task), task_cpu(task));
			do_each_pid_task(session, PIDTYPE_SID, p) {
				seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d), threadnum %d, %d.\n",
					   task->comm, task_pid_nr(task), task_cpu(task),
					   p->comm, task_pid_nr(p), task_cpu(p),
					   get_nr_threads(task), get_nr_threads(p));
			} while_each_pid_task(session, PIDTYPE_SID, p);
		}
	} else if (seqfile_debug_mode == 4) {
		struct task_struct *thread, *child;

		for_each_process(task) {
			seq_printf(m, "process %s(%d,cpu%d):\n",
				   task->comm, task_pid_nr(task), task_cpu(task));
			for_each_thread(task, thread) {
				list_for_each_entry(child, &thread->children, sibling) {
					seq_printf(m, "      thread %s(%d,cpu%d) child %s(%d,cpu%d), threadnum %d, %d.\n",
						   thread->comm, task_pid_nr(thread), task_cpu(thread),
						   child->comm, task_pid_nr(child), task_cpu(child),
						   get_nr_threads(thread), get_nr_threads(child));
				}
			}
		}
	} else {
		printk("%s line %d, can't be here, seqfile_debug_mode = %d.\n",
		       __func__, __LINE__, seqfile_debug_mode);
	}
	return 0;
}

static const struct seq_operations my_seq_ops = {
	.start = my_seq_ops_start,
	.next  = my_seq_ops_next,
	.stop  = my_seq_ops_stop,
	.show  = my_seq_ops_show,
};

static int proc_seq_open(struct inode *inode, struct file *file)
{
	int ret;
	struct seq_file *m;

	ret = seq_open(file, &my_seq_ops);
	if (!ret) {
		m = file->private_data;
		m->private = file;
	}
	return ret;
}

static ssize_t proc_seq_write(struct file *file, const char __user *buffer,
			      size_t count, loff_t *pos)
{
	char debug_string[16];
	int debug_no;

	memset(debug_string, 0x00, sizeof(debug_string));
	if (count >= sizeof(debug_string)) {
		printk("%s line %d, fatal error, write count exceeds max buffer size.\n",
		       __func__, __LINE__);
		return -EINVAL;
	}
	if (copy_from_user(debug_string, buffer, count)) {
		printk("%s line %d, fatal error, copy from user failed.\n",
		       __func__, __LINE__);
		return -EFAULT;
	}
	if (sscanf(debug_string, "%d", &debug_no) <= 0) {
		printk("%s line %d, fatal error, failed to read debug number.\n",
		       __func__, __LINE__);
		return -EFAULT;
	}
	seqfile_debug_mode = debug_no;
	return count;
}

static ssize_t proc_seq_read(struct file *file, char __user *buf,
			     size_t size, loff_t *ppos)
{
	ssize_t ret;

	printk("%s line %d enter, pos %lld size %ld.\n", __func__, __LINE__, *ppos, size);
	ret = seq_read(file, buf, size, ppos);
	printk("%s line %d exit, pos %lld size %ld, ret = %ld.\n", __func__, __LINE__, *ppos, size, ret);
	return ret;
}

static struct file_operations seq_proc_ops = {
	.owner   = THIS_MODULE,
	.open    = proc_seq_open,
	.release = seq_release,
	.read    = proc_seq_read,
	.write   = proc_seq_write,
	.llseek  = seq_lseek,
};

static struct proc_dir_entry *entry;

static int proc_hook_init(void)
{
	printk("%s line %d, init. seqfile_debug_mode = %d.\n",
	       __func__, __LINE__, seqfile_debug_mode);
	entry = proc_create("dumptask", 0644, NULL, &seq_proc_ops);
	return 0;
}

static void proc_hook_exit(void)
{
	proc_remove(entry);
	printk("%s line %d, exit.\n", __func__, __LINE__);
}

module_init(proc_hook_init);
module_exit(proc_hook_exit);

Makefile

ifneq ($(KERNELRELEASE),)
obj-m:=seqfile.o
else
KERNELDIR:=/lib/modules/$(shell uname -r)/build
PWD:=$(shell pwd)
default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
clean:
	rm -rf *.o *.mod.c *.mod.o *.ko *.symvers *.mod .*.cmd *.order
endif

Install the module; it supports several dump modes, selected through the seqfile_debug_mode parameter.

sudo insmod seqfile.ko seqfile_debug_mode=0

cat /proc/dumptask

sudo insmod seqfile.ko seqfile_debug_mode=1

sudo insmod seqfile.ko seqfile_debug_mode=2

conclusion:

All processes in the system are linked on init_task's tasks list, and the threads of each process are linked on that process's task_struct->signal list, as shown below.

After adding the child processes' real_parent relationships, the picture becomes:

The related implementation in code:

The task_struct member tasks links all processes in the system together, starting from init_task. For threads (non-group-leaders), however, ->tasks is not used and ought to look empty; yet if you dump the values with the module above, you will find that in threads they are not empty:

But why? Most likely the stale pointers are simply inherited from the parent during the copy and never used in the child; "not empty" does not mean the node is actually on any list. To prove this, initialize task_struct->tasks in the copy_process function:

Recompile the kernel and rerun the test: every thread's node is now empty, and only the group leaders linked to init_task as processes are non-empty, confirming that the diagram above matches reality.

How to obtain the thread list of a process:

Above we obtained a process's thread list through the struct task_struct->signal->thread_head member. There is another way; see the code:

Like the signal member, group_leader seems to fit the process/thread roles even better, and the kernel provides macros for walking the corresponding list:

Modify the code and add a new case, seqfile_debug_mode=5.

Only the new branch in my_seq_ops_show() is shown below; the rest of the module is unchanged (start()/next() treat mode 5 like the other non-zero modes):

} else if (seqfile_debug_mode == 5) {
	struct task_struct *g, *t;

	do_each_thread(g, t) {
		seq_printf(m, "Process %s(%d cpu%d), thread %s(%d cpu%d), threadnum %d.\n",
			   g->comm, task_pid_nr(g), task_cpu(g),
			   t->comm, task_pid_nr(t), task_cpu(t),
			   get_nr_threads(g));
	} while_each_thread(g, t);
}

What are pgid and tgid?

Literally, pgid is the process group ID and tgid is the thread group ID, and that is exactly the difference: a thread group is one multithreaded process (its TGID is what user space calls the PID), while a process group is a set of related processes, such as a shell pipeline, that can be signalled together.

thread_group leader:

Any job launched from an interactive shell is placed into its own process group (via setpgrp/setpgid) before the actual ELF target is exec'ed.

A demo of pgrp/session behaviour:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <pthread.h>

pthread_t pt1, pt2, pt3;

void *thr_fn(void *arg)
{
	pid_t pid;
	int counter = 600;

	pid = fork();
	if (pid == 0) {
		printf("%s line %d, child.\n", __func__, __LINE__);
	} else {
		printf("%s line %d, father.\n", __func__, __LINE__);
	}

	while (counter) {
		printf("%s line %d pid %d.\n", __func__, __LINE__, getpid());
		sleep(1);
		counter--;
	}
	return NULL;
}

void create_thread(void)
{
	pthread_create(&pt1, NULL, thr_fn, NULL);
	pthread_create(&pt2, NULL, thr_fn, NULL);
	pthread_create(&pt3, NULL, thr_fn, NULL);
}

int main(void)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == 0) {
		printf("%s line %d, child pid %d.\n", __func__, __LINE__, getpid());
		create_thread();
		pthread_join(pt1, NULL);
		pthread_join(pt2, NULL);
		pthread_join(pt3, NULL);
		return 99;
	}

	printf("%s line %d, parent.\n", __func__, __LINE__);
	create_thread();
	wait(&status);
	printf("%s line %d, report child exit status %d.\n", __func__, __LINE__, WEXITSTATUS(status));
	return 0;
}

about namespaces

Namespaces build on the kernel's PID management and can be illustrated as below:

In user space a process is uniquely identified by a positive integer (call it the pid number). With containers things get slightly more complicated: that integer only identifies a process uniquely within its container. If containers 1 and 2 both exist in the system, two processes with pid a can coexist, one in each container. Processes can of course also live outside any container, like processes x and y, which behave like processes on a traditional Linux system; equivalently, you can think of x and y as living in a system-wide top-level container 0 that holds x, y and the two containers. By the same logic, container 2 can nest another container inside it, forming a container hierarchy.

A Linux container is an OS-level virtualization method, implemented almost entirely in software: low overhead and lightweight, though with limitations of its own. Linux containers mainly rely on the kernel's cgroup and namespace isolation; those mechanisms are not the subject of this document, which concerns itself only with the pid namespace.

A process running on Linux owns many system resources: a pid, a user ID, network devices, a protocol stack, IP addresses and ports, a filesystem hierarchy. On traditional Linux these are all global: if one process umounts a mount point, changing its own view of the filesystem hierarchy, every process's view of the directory tree changes (the umount is observed by all processes). Can these resources be isolated? That is exactly what namespaces are, and the PID namespace is the one that isolates the pid address space.

A process is unaware of its pid namespace: it only knows it can learn its own ID through getpid(), not that it is actually locked inside a pid-namespace cage. Seen from that angle, user space is simple and happy; kernel space is less fortunate, and needs fairly elaborate data structures to model these hierarchical PIDs.

One last remark: the description above is phrased in terms of pid, but tid, pgid and sid work exactly the same way. Where one of these IDs used to identify an entity uniquely on its own, we now need the pair (pid namespace, ID) to do so.

Each container has a process whose PID is 1 from the container's own point of view.

For example:

$ sudo docker container top b03e403e73a0
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                16800               16777               0                   08:36               ?                   00:00:00            bash
root                16887               16800               0                   08:36               ?                   00:00:00            top
$ sudo docker exec b03e403e73a0 ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 00:36 pts/0    00:00:00 bash
root        17     1  0 00:36 pts/0    00:00:00 top
root        30     0  0 00:43 ?        00:00:00 ps -ef

dkms: automatically rebuilding and installing modules

If you ask how a Linux kernel module is released and installed, the first answer that comes to mind is make && insmod. That works in embedded scenarios, where the software ships as a whole: kernel, modules and applications together. On PCs and servers, however, the components are independent of one another; if a module was built and released against kernel A, it stops working as soon as the user changes the kernel. For this reason Dell created DKMS (Dynamic Kernel Module Support), which automatically rebuilds modules when the kernel changes so they fit the new kernel. You may have run into it when upgrading your OS with a self-compiled kernel.

Let's try dkms.

install dkms:

$ sudo apt install dkms

Create a dkms.conf file in the module directory.

The fields are fairly self-explanatory: module name and build commands. The most important one is AUTOINSTALL, which requests that the module be rebuilt whenever the kernel changes.

PACKAGE_NAME="seqfile"
PACKAGE_VERSION="1.0.0"
CLEAN="make clean"
MAKE[0]="make all"
BUILT_MODULE_NAME[0]="seqfile"
DEST_MODULE_LOCATION[0]="/updates"
AUTOINSTALL="yes"

After this, the module directory should look as follows; the layout conforms to the DKMS convention.

install dir

DKMS module sources live under /usr/src/modulename-version, for example /usr/src/seqfile-1.0.0 for version 1.0.0 of the seqfile module. So we copy the module directory into /usr/src.

Modify the Makefile; the two-stage KERNELRELEASE kbuild trick is not needed here.

#ifneq ($(KERNELRELEASE),)
obj-m:=seqfile.o
#else
KERNELDIR:=/lib/modules/$(shell uname -r)/build
PWD:=$(shell pwd)
all:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
clean:
	rm -rf *.o *.mod.c *.mod.o *.ko *.symvers *.mod .*.cmd *.order
#endif

Add seqfile to DKMS management:

$ sudo dkms add -m seqfile -v 1.0.0
Creating symlink /var/lib/dkms/seqfile/1.0.0/source -> /usr/src/seqfile-1.0.0
DKMS: add completed.
$ dkms status
seqfile, 1.0.0: added
virtualbox, 5.2.42, 5.4.0-131-generic, x86_64: installed
virtualbox, 5.2.42, 5.4.0-132-generic, x86_64: installed

compile:

The default target is "all", so we run:

$ sudo dkms build -m seqfile -v 1.0.0

install:

$ sudo dkms install -m seqfile -v 1.0.0
[sudo] password for e01156:
Creating symlink /var/lib/dkms/seqfile/1.0.0/source -> /usr/src/seqfile-1.0.0
DKMS: add completed.
Kernel preparation unnecessary for this kernel.  Skipping...
Building module:
cleaning build area...
make -j8 KERNELRELEASE=5.4.0-135-generic all...
cleaning build area...
DKMS: build completed.

seqfile.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-135-generic/updates/dkms/
depmod.....
DKMS: install completed.

the result

remove

$ sudo dkms remove -m seqfile -v 1.0.0 -k 5.4.0-132-generic

The commands above build the .ko and copy it into the right place, but they do not insmod it into the running system; afterwards you load it by hand with insmod, or with modprobe. Tracing with strace confirms that what actually gets loaded is indeed the installed file under /lib/modules/5.4.0-135-generic/updates/dkms/seqfile.ko.

$ sudo strace -e trace=file modprobe seqfile
execve("/sbin/modprobe", ["modprobe", "seqfile"], 0x7ffd2bc5f1a8 /* 24 vars */) = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
stat("/etc/modprobe.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/etc/modprobe.d", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
newfstatat(3, "intel-microcode-blacklist.conf", {st_mode=S_IFREG|0644, st_size=154, ...}, 0) = 0
newfstatat(3, "blacklist-rare-network.conf", {st_mode=S_IFREG|0644, st_size=583, ...}, 0) = 0
newfstatat(3, "blacklist.conf", {st_mode=S_IFREG|0644, st_size=1667, ...}, 0) = 0
newfstatat(3, "blacklist-framebuffer.conf", {st_mode=S_IFREG|0644, st_size=697, ...}, 0) = 0
newfstatat(3, "blacklist-ath_pci.conf", {st_mode=S_IFREG|0644, st_size=325, ...}, 0) = 0
newfstatat(3, "alsa-base.conf", {st_mode=S_IFREG|0644, st_size=2507, ...}, 0) = 0
newfstatat(3, "dkms.conf", {st_mode=S_IFREG|0644, st_size=127, ...}, 0) = 0
newfstatat(3, "amd64-microcode-blacklist.conf", {st_mode=S_IFREG|0644, st_size=154, ...}, 0) = 0
newfstatat(3, "blacklist-modem.conf", {st_mode=S_IFREG|0644, st_size=156, ...}, 0) = 0
newfstatat(3, "blacklist-oss.conf", {st_mode=S_IFREG|0644, st_size=1059, ...}, 0) = 0
newfstatat(3, "iwlwifi.conf", {st_mode=S_IFREG|0644, st_size=347, ...}, 0) = 0
newfstatat(3, "qemu-system-x86.conf", {st_mode=S_IFREG|0644, st_size=27, ...}, 0) = 0
newfstatat(3, "blacklist-firewire.conf", {st_mode=S_IFREG|0644, st_size=210, ...}, 0) = 0
stat("/run/modprobe.d", 0x7ffeeaf43ae0) = -1 ENOENT (No such file or directory)
stat("/lib/modprobe.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/lib/modprobe.d", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
newfstatat(3, "blacklist_linux-hwe-5.4_5.4.0-135-generic.conf", {st_mode=S_IFREG|0644, st_size=1551, ...}, 0) = 0
newfstatat(3, "blacklist_linux-hwe-5.4_5.4.0-132-generic.conf", {st_mode=S_IFREG|0644, st_size=1551, ...}, 0) = 0
newfstatat(3, "systemd.conf", {st_mode=S_IFREG|0644, st_size=765, ...}, 0) = 0
newfstatat(3, "aliases.conf", {st_mode=S_IFREG|0644, st_size=655, ...}, 0) = 0
newfstatat(3, "fbdev-blacklist.conf", {st_mode=S_IFREG|0644, st_size=390, ...}, 0) = 0
openat(AT_FDCWD, "/lib/modprobe.d/aliases.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/alsa-base.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/amd64-microcode-blacklist.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/blacklist-ath_pci.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/blacklist-firewire.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/blacklist-framebuffer.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/blacklist-modem.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/blacklist-oss.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/blacklist-rare-network.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/blacklist.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modprobe.d/blacklist_linux-hwe-5.4_5.4.0-132-generic.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modprobe.d/blacklist_linux-hwe-5.4_5.4.0-135-generic.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/dkms.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modprobe.d/fbdev-blacklist.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/intel-microcode-blacklist.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/iwlwifi.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modules/5.4.0-135-generic/modules.softdep", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/modprobe.d/qemu-system-x86.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modprobe.d/systemd.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/proc/cmdline", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modules/5.4.0-135-generic/modules.dep.bin", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modules/5.4.0-135-generic/modules.alias.bin", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modules/5.4.0-135-generic/modules.symbols.bin", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/modules/5.4.0-135-generic/modules.builtin.bin", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/sys/module/seqfile/initstate", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/module/seqfile", 0x7ffeeaf43a60) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/module/seqfile/initstate", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/module/seqfile", 0x7ffeeaf43a60) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/modules/5.4.0-135-generic/updates/dkms/seqfile.ko", O_RDONLY|O_CLOEXEC) = 3
+++ exited with 0 +++

generate the deb package

sudo dkms mkdeb -m seqfile -v 1.0.0

$ sudo dkms mkdeb -m seqfile -v 1.0.0
Using /etc/dkms/template-dkms-mkdeb
copying template...
modifying debian/changelog...
modifying debian/compat...
modifying debian/control...
modifying debian/copyright...
modifying debian/dirs...
modifying debian/postinst...
modifying debian/prerm...
modifying debian/README.Debian...
modifying debian/rules...
copying legacy postinstall template...
Copying source tree...
Gathering binaries...
Marking modules for 5.4.0-135-generic (x86_64) for archiving...
Creating tarball structure to specifically accomodate binaries.
Tarball location: /var/lib/dkms/seqfile/1.0.0/tarball//seqfile-1.0.0.dkms.tar.gz
DKMS: mktarball completed.
Copying DKMS tarball into DKMS tree...
Building binary package...
dpkg-buildpackage: warning: running with superuser privileges
dpkg-source --before-build seqfile-dkms-1.0.0
fakeroot debian/rules clean
dh_clean: Compatibility levels before 9 are deprecated (level 7 in use)
debian/rules build
fakeroot debian/rules binary
dh_installdirs: Compatibility levels before 9 are deprecated (level 7 in use)
dh_strip: Compatibility levels before 9 are deprecated (level 7 in use)
dh_compress: Compatibility levels before 9 are deprecated (level 7 in use)
dh_installdeb: Compatibility levels before 9 are deprecated (level 7 in use)
dh_shlibdeps: Compatibility levels before 9 are deprecated (level 7 in use)
dpkg-genbuildinfo --build=binary
dpkg-genchanges --build=binary >../seqfile-dkms_1.0.0_amd64.changes
dpkg-genchanges: info: binary-only upload (no source code included)
dpkg-source --after-build seqfile-dkms-1.0.0
DKMS: mkdeb completed.
Moving built files to /var/lib/dkms/seqfile/1.0.0/deb...
Cleaning up temporary files...

You can list the files included in the deb package with:

dpkg-deb -c /var/lib/dkms/seqfile/1.0.0/deb/seqfile-dkms_1.0.0_amd64.deb

For a real-world application of DKMS, see the NVIDIA driver.

Verification

1. The session ID and parent of a process and its children

#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/stat.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>
#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/seq_file.h>
#include <linux/sched/signal.h>
#include <linux/proc_fs.h>
#include <linux/pid.h>MODULE_AUTHOR("zlcao");
MODULE_LICENSE("GPL");int seqfile_debug_mode = 0;
module_param(seqfile_debug_mode, int, 0664);// 开始输出任务列表
// my_seq_ops_start()的返回值,会传递给my_seq_ops_next()的v参数
static void *my_seq_ops_start(struct seq_file *m, loff_t *pos)
{loff_t index = *pos;struct task_struct *task;printk("%s line %d, index %lld.count %ld, size %ld here.\n", __func__, __LINE__, index, m->count, m->size);if(seqfile_debug_mode == 0) {// 如果缓冲区不足, seq_file可能会重新调用start()函数,// 并且传入的pos是之前已经遍历到的位置,// 这里需要根据pos重新计算开始的位置for_each_process(task) {if (index-- == 0) {return task;}}} else {return NULL + (*pos == 0);}return NULL;
}// 继续遍历, 直到my_seq_ops_next()放回NULL或者错误
static void *my_seq_ops_next(struct seq_file *m, void *v, loff_t *pos)
{struct task_struct *task = NULL;if(seqfile_debug_mode == 0) {task = next_task((struct task_struct *)v);// 这里加不加好像都没有作用++ *pos;// 返回NULL, 遍历结束if(task == &init_task) {return NULL;}} else {++ *pos;}return task;
}// 遍历完成/出错时seq_file会调用stop()函数
static void my_seq_ops_stop(struct seq_file *m, void *v)
{}// 此函数将数据写入`seq_file`内部的缓冲区
// `seq_file`会在合适的时候把缓冲区的数据拷贝到应用层
// 参数@V是start/next函数的返回值
static int my_seq_ops_show(struct seq_file *m, void *v)
{struct task_struct * task = NULL;struct task_struct * p = NULL;struct file *file = m->private;if(seqfile_debug_mode == 0) {seq_puts(m, " file=");seq_file_path(m, file, "\n");seq_putc(m, ' ');task = (struct task_struct *)v;struct pid *session = task_session(task);struct task_struct *tsk = pid_task(session, PIDTYPE_PID);if(task->flags & PF_KTHREAD) {seq_printf(m, "Kernel thread: PID=%u, task: %s, index=%lld, read_pos=%lld, %s.\n", task->tgid, task->comm, m->index, m->read_pos, tsk? "has session" : "no session");} else {seq_printf(m, "User thread: PID=%u, task: %s, index=%lld, read_pos=%lld %s.\n", task->tgid, task->comm, m->index, m->read_pos, tsk? "has session" : "no session");}} else if(seqfile_debug_mode == 1) {struct task_struct *g, *p;static int oldcount = 0;static int entercount = 0;char *str;printk("%s line %d here enter %d times.\n", __func__, __LINE__, ++ entercount);seq_printf(m, "%s line %d here enter %d times.\n", __func__, __LINE__, ++ entercount);rcu_read_lock();for_each_process_thread(g, p) {struct task_struct *session = pid_task(task_session(g), PIDTYPE_PID);struct task_struct *thread = pid_task(task_session(p), PIDTYPE_PID);struct task_struct *ggroup = pid_task(task_pgrp(g), PIDTYPE_PID);struct task_struct *pgroup = pid_task(task_pgrp(p), PIDTYPE_PID);struct pid * pid = task_session(g);if(list_empty(&p->tasks)) {str = "empty";} else {str = "not empty";}seq_printf(m, "process %s(pid %d tgid %d,cpu%d) thread %s(pid %d tgid %d,cpu%d),threadnum %d, %d. 
tasks->prev = %p, tasks->next = %p, p->tasks=%p, %s, process parent %s(pid %d tgid %d), thread parent%s(pid %d, tgid %d)",g->comm, task_pid_nr(g), task_tgid_nr(g), task_cpu(g), \p->comm, task_pid_nr(p), task_tgid_nr(p), task_cpu(p), \get_nr_threads(g), get_nr_threads(p), p->tasks.prev, p->tasks.next, &p->tasks, str, g->real_parent->comm, \task_pid_nr(g->real_parent),task_tgid_nr(g->real_parent), p->real_parent->comm, task_pid_nr(p->real_parent), task_tgid_nr(p->real_parent));if(ggroup) {seq_printf(m, "ggroup(pid %d tgid %d).", task_pid_nr(ggroup),task_tgid_nr(ggroup));}if(pgroup) {seq_printf(m, "pgroup(pid %d tgid %d).", task_pid_nr(pgroup),task_tgid_nr(pgroup));}seq_printf(m, "current smp processor id %d.", smp_processor_id());if(thread) {seq_printf(m, "thread session %s(%d).", thread->comm, task_pid_nr(thread));}if(session) {seq_printf(m, "process session %s(%d).", session->comm, task_pid_nr(session));}if(oldcount == 0 || oldcount != m->size) {printk("%s line %d, m->count %ld, m->size %ld.", __func__, __LINE__, m->count, m->size);oldcount = m->size;}if(pid){seq_printf(m, "pid task %p,pgid task %p, psid_task %p", pid_task(pid, PIDTYPE_PID), pid_task(pid, PIDTYPE_PGID), pid_task(pid, PIDTYPE_SID));seq_printf(m, "pid task %s,pgid task %s, psid_task %s", pid_task(pid, PIDTYPE_PID)->comm, pid_task(pid, PIDTYPE_PGID)->comm, pid_task(pid, PIDTYPE_SID)->comm);}seq_printf(m, "\n");}rcu_read_unlock();} else if(seqfile_debug_mode == 2) {for_each_process(task) {struct pid *pgrp = task_pgrp(task);seq_printf(m, "Group Header %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));do_each_pid_task(pgrp, PIDTYPE_PGID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d.\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p));} while_each_pid_task(pgrp, PIDTYPE_PGID, p);}} else if (seqfile_debug_mode == 3) {for_each_process(task) {struct pid *session = 
task_session(task);struct task_struct *tsk = pid_task(session, PIDTYPE_PID);if(tsk){seq_printf(m, "session task %s(%d,cpu%d):", tsk->comm, task_pid_nr(tsk), task_cpu(tsk));}else {seq_printf(m, "process %s(%d,cpu%d) has no session task.", task->comm, task_pid_nr(task), task_cpu(task));}seq_printf(m, "session header %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));do_each_pid_task(session, PIDTYPE_SID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d, spidtask %s(%d,%d).\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), pid_task(session, PIDTYPE_SID)->comm, pid_task(session, PIDTYPE_SID)->tgid, pid_task(session, PIDTYPE_SID)->pid);if(pid_task(session, PIDTYPE_PID)) {seq_printf(m, "pidtask %s(%d,%d).\n", pid_task(session, PIDTYPE_PID)->comm, pid_task(session, PIDTYPE_PID)->tgid, pid_task(session, PIDTYPE_PID)->pid);}} while_each_pid_task(session, PIDTYPE_SID, p);}} else if(seqfile_debug_mode == 4) {struct task_struct *thread, *child;for_each_process(task) {seq_printf(m, "process %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));for_each_thread(task, thread) {list_for_each_entry(child, &thread->children, sibling) {seq_printf(m, "      thread %s(%d,cpu%d) child %s(%d,cpu%d),threadnum %d, %d.\n",thread->comm, task_pid_nr(thread), task_cpu(thread), \child->comm, task_pid_nr(child), task_cpu(child), \get_nr_threads(thread), get_nr_threads(child));}}}} else if(seqfile_debug_mode == 5) { struct task_struct *g, *t;do_each_thread (g, t) {seq_printf(m, "Process %s(%d cpu%d), thread %s(%d cpu%d), threadnum %d.\n", g->comm, task_pid_nr(g), task_cpu(g), t->comm, task_pid_nr(t), task_cpu(t), get_nr_threads(g));} while_each_thread (g, t);} else if(seqfile_debug_mode == 6) {for_each_process(task) {struct pid *pid = task_pid(task);seq_printf(m, "Process %s(%d,cpu%d) pid %d, tgid %d:\n", task->comm, task_pid_nr(task), task_cpu(task), 
task_pid_vnr(task), task_tgid_vnr(task));do_each_pid_task(pid, PIDTYPE_TGID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d. pid %d, tgid %d\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), task_pid_vnr(p), task_tgid_vnr(p));} while_each_pid_task(pid, PIDTYPE_TGID, p);}} else if(seqfile_debug_mode == 7) {for_each_process(task) {struct pid *pid = task_pid(task);seq_printf(m, "Process %s(%d,cpu%d) pid %d, tgid %d:\n", task->comm, task_pid_nr(task), task_cpu(task), task_pid_vnr(task), task_tgid_vnr(task));do_each_pid_task(pid, PIDTYPE_PID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d. pid %d, tgid %d\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), task_pid_vnr(p), task_tgid_vnr(p));} while_each_pid_task(pid, PIDTYPE_PID, p);}} else {printk("%s line %d,cant be here, seqfile_debug_mode = %d.\n", __func__, __LINE__, seqfile_debug_mode);}return 0;
}static const struct seq_operations my_seq_ops = {.start  = my_seq_ops_start,.next   = my_seq_ops_next,.stop   = my_seq_ops_stop,.show   = my_seq_ops_show,
};static int proc_seq_open(struct inode *inode, struct file *file)
{int ret;struct seq_file *m;ret = seq_open(file, &my_seq_ops);if(!ret) {m = file->private_data; m->private = file;}return ret;
}static ssize_t proc_seq_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos)
{char debug_string[16];int debug_no;memset(debug_string, 0x00, sizeof(debug_string));if (count >= sizeof(debug_string)) {printk("%s line %d, fatal error, write count exceeds max buffer size.\n", __func__, __LINE__);return -EINVAL;}if (copy_from_user(debug_string, buffer, count)) {printk("%s line %d, fatal error, copy from user failure.\n", __func__, __LINE__);return -EFAULT;}if (sscanf(debug_string, "%d", &debug_no) <= 0) {printk("%s line %d, fatal error, read debugno failure.\n", __func__, __LINE__);return -EINVAL;}seqfile_debug_mode = debug_no;//printk("%s line %d, debug_no %d.\n", __func__, __LINE__, debug_no);return count;
}static ssize_t proc_seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
{ssize_t ret;printk("%s line %d enter, ppos %lld, size %ld.\n", __func__, __LINE__, *ppos, size);ret = seq_read(file, buf, size, ppos);printk("%s line %d exit, ppos %lld, size %ld, ret = %ld.\n", __func__, __LINE__, *ppos, size, ret);return ret;
}static struct file_operations seq_proc_ops = {.owner      = THIS_MODULE,.open       = proc_seq_open,.release    = seq_release,.read       = proc_seq_read,.write      = proc_seq_write,.llseek     = seq_lseek,.unlocked_ioctl = NULL,
};static struct proc_dir_entry * entry;
static int proc_hook_init(void)
{printk("%s line %d, init. seqfile_debug_mode = %d.\n", __func__, __LINE__, seqfile_debug_mode);entry = proc_create("dumptask", 0644, NULL, &seq_proc_ops);//entry = proc_create_seq("dumptask", 0644, NULL, &my_seq_ops);return 0;
}static void proc_hook_exit(void)
{proc_remove(entry);printk("%s line %d, exit.\n", __func__, __LINE__);return;
}module_init(proc_hook_init);
module_exit(proc_hook_exit);

All threads within a process share the same task_struct->files structure.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/stat.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>
#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/seq_file.h>
#include <linux/sched/signal.h>
#include <linux/proc_fs.h>
#include <linux/pid.h>
#include <linux/pci.h>MODULE_AUTHOR("zlcao");
MODULE_LICENSE("GPL");int seqfile_debug_mode = 0;
module_param(seqfile_debug_mode, int, 0664);// 开始输出任务列表
// my_seq_ops_start()的返回值,会传递给my_seq_ops_next()的v参数
static void *my_seq_ops_start(struct seq_file *m, loff_t *pos)
{loff_t index = *pos;struct task_struct *task;printk("%s line %d, index %lld.count %ld, size %ld here.\n", __func__, __LINE__, index, m->count, m->size);if(seqfile_debug_mode == 0) {// 如果缓冲区不足, seq_file可能会重新调用start()函数,// 并且传入的pos是之前已经遍历到的位置,// 这里需要根据pos重新计算开始的位置for_each_process(task) {if (index-- == 0) {return task;}}} else {return NULL + (*pos == 0);}return NULL;
}// 继续遍历, 直到my_seq_ops_next()返回NULL或者错误
static void *my_seq_ops_next(struct seq_file *m, void *v, loff_t *pos)
{struct task_struct *task = NULL;if(seqfile_debug_mode == 0) {task = next_task((struct task_struct *)v);// 这里加不加好像都没有作用++ *pos;// 返回NULL, 遍历结束if(task == &init_task) {return NULL;}} else {++ *pos;}return task;
}// 遍历完成/出错时seq_file会调用stop()函数
static void my_seq_ops_stop(struct seq_file *m, void *v)
{}static int lookup_pci_devices(struct device *dev, void *data)
{struct seq_file *m = (struct seq_file *)data;struct pci_dev *pdev = to_pci_dev(dev);seq_printf(m, "vendor id 0x%x, device id 0x%x, devname %s.\n", pdev->vendor, pdev->device, dev_name(&pdev->dev));return 0;
}static int lookup_pci_drivers(struct device_driver *drv, void *data)
{struct seq_file *m = (struct seq_file *)data;seq_printf(m, "driver name %s.\n", drv->name);return 0;
}static int list_device_belongs_todriver(struct device *dev, void *p)
{struct seq_file *m = (struct seq_file *)p;struct pci_dev *pdev = to_pci_dev(dev);seq_printf(m, "vendor id 0x%x, device id 0x%x, devname %s.\n", pdev->vendor, pdev->device, dev_name(&pdev->dev));return 0;
}// 此函数将数据写入`seq_file`内部的缓冲区
// `seq_file`会在合适的时候把缓冲区的数据拷贝到应用层
// 参数@V是start/next函数的返回值
static int my_seq_ops_show(struct seq_file *m, void *v)
{struct task_struct * task = NULL;struct task_struct * p = NULL;struct file *file = m->private;if(seqfile_debug_mode == 0) {seq_puts(m, " file=");seq_file_path(m, file, "\n");seq_putc(m, ' ');task = (struct task_struct *)v;struct pid *session = task_session(task);struct task_struct *tsk = pid_task(session, PIDTYPE_PID);if(task->flags & PF_KTHREAD) {seq_printf(m, "Kernel thread: PID=%u, task: %s, index=%lld, read_pos=%lld, %s.\n", task->tgid, task->comm, m->index, m->read_pos, tsk? "has session" : "no session");} else {seq_printf(m, "User thread: PID=%u, task: %s, index=%lld, read_pos=%lld %s.\n", task->tgid, task->comm, m->index, m->read_pos, tsk? "has session" : "no session");}} else if(seqfile_debug_mode == 1) {struct task_struct *g, *p;static int oldcount = 0;static int entercount = 0;char *str;printk("%s line %d here enter %d times.\n", __func__, __LINE__, ++ entercount);seq_printf(m, "%s line %d here enter %d times.\n", __func__, __LINE__, ++ entercount);rcu_read_lock();for_each_process_thread(g, p) {struct task_struct *session = pid_task(task_session(g), PIDTYPE_PID);struct task_struct *thread = pid_task(task_session(p), PIDTYPE_PID);struct task_struct *ggroup = pid_task(task_pgrp(g), PIDTYPE_PID);struct task_struct *pgroup = pid_task(task_pgrp(p), PIDTYPE_PID);struct pid * pid = task_session(g);if(list_empty(&p->tasks)) {str = "empty";} else {str = "not empty";}seq_printf(m, "process %s(pid %d tgid %d,cpu%d) thread %s(pid %d tgid %d,cpu%d),threadnum %d, %d. 
tasks->prev = %p, tasks->next = %p, p->tasks=%p, %s, process parent %s(pid %d tgid %d), thread parent%s(pid %d, tgid %d, files %p\n)",g->comm, task_pid_nr(g), task_tgid_nr(g), task_cpu(g), \p->comm, task_pid_nr(p), task_tgid_nr(p), task_cpu(p), \get_nr_threads(g), get_nr_threads(p), p->tasks.prev, p->tasks.next, &p->tasks, str, g->real_parent->comm, \task_pid_nr(g->real_parent),task_tgid_nr(g->real_parent), p->real_parent->comm, task_pid_nr(p->real_parent), task_tgid_nr(p->real_parent), p->files);if(ggroup) {seq_printf(m, "ggroup(pid %d tgid %d).", task_pid_nr(ggroup),task_tgid_nr(ggroup));}if(pgroup) {seq_printf(m, "pgroup(pid %d tgid %d).", task_pid_nr(pgroup),task_tgid_nr(pgroup));}seq_printf(m, "current smp processor id %d.", smp_processor_id());if(thread) {seq_printf(m, "thread session %s(%d).", thread->comm, task_pid_nr(thread));}if(session) {seq_printf(m, "process session %s(%d).", session->comm, task_pid_nr(session));}if(oldcount == 0 || oldcount != m->size) {printk("%s line %d, m->count %ld, m->size %ld.", __func__, __LINE__, m->count, m->size);oldcount = m->size;}if(pid){seq_printf(m, "pid task %p,pgid task %p, psid_task %p", pid_task(pid, PIDTYPE_PID), pid_task(pid, PIDTYPE_PGID), pid_task(pid, PIDTYPE_SID));seq_printf(m, "pid task %s,pgid task %s, psid_task %s", pid_task(pid, PIDTYPE_PID)->comm, pid_task(pid, PIDTYPE_PGID)->comm, pid_task(pid, PIDTYPE_SID)->comm);}seq_printf(m, "\n");}rcu_read_unlock();} else if(seqfile_debug_mode == 2) {for_each_process(task) {struct pid *pgrp = task_pgrp(task);seq_printf(m, "Group Header %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));do_each_pid_task(pgrp, PIDTYPE_PGID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d.\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p));} while_each_pid_task(pgrp, PIDTYPE_PGID, p);}} else if (seqfile_debug_mode == 3) {for_each_process(task) {struct pid 
*session = task_session(task);struct task_struct *tsk = pid_task(session, PIDTYPE_PID);if(tsk) {seq_printf(m, "session task %s(%d,cpu%d):", tsk->comm, task_pid_nr(tsk), task_cpu(tsk));} else {seq_printf(m, "process %s(%d,cpu%d) has no session task.", task->comm, task_pid_nr(task), task_cpu(task));}seq_printf(m, "session header %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));do_each_pid_task(session, PIDTYPE_SID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d, spidtask %s(%d,%d).\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), pid_task(session, PIDTYPE_SID)->comm, pid_task(session, PIDTYPE_SID)->tgid, pid_task(session, PIDTYPE_SID)->pid);if(pid_task(session, PIDTYPE_PID)) {seq_printf(m, "pidtask %s(%d,%d).\n", pid_task(session, PIDTYPE_PID)->comm, pid_task(session, PIDTYPE_PID)->tgid, pid_task(session, PIDTYPE_PID)->pid);}} while_each_pid_task(session, PIDTYPE_SID, p);}} else if(seqfile_debug_mode == 4) {struct task_struct *thread, *child;for_each_process(task) {seq_printf(m, "process %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));for_each_thread(task, thread) {list_for_each_entry(child, &thread->children, sibling) {seq_printf(m, "      thread %s(%d,cpu%d) child %s(%d,cpu%d),threadnum %d, %d.\n",thread->comm, task_pid_nr(thread), task_cpu(thread), \child->comm, task_pid_nr(child), task_cpu(child), \get_nr_threads(thread), get_nr_threads(child));}}}} else if(seqfile_debug_mode == 5) { struct task_struct *g, *t;do_each_thread (g, t) {seq_printf(m, "Process %s(%d cpu%d), thread %s(%d cpu%d), threadnum %d.\n", g->comm, task_pid_nr(g), task_cpu(g), t->comm, task_pid_nr(t), task_cpu(t), get_nr_threads(g));} while_each_thread (g, t);} else if(seqfile_debug_mode == 6) {for_each_process(task) {struct pid *pid = task_pid(task);seq_printf(m, "Process %s(%d,cpu%d) pid %d, tgid %d:\n", task->comm, task_pid_nr(task), 
task_cpu(task), task_pid_vnr(task), task_tgid_vnr(task));do_each_pid_task(pid, PIDTYPE_TGID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d. pid %d, tgid %d\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), task_pid_vnr(p), task_tgid_vnr(p));} while_each_pid_task(pid, PIDTYPE_TGID, p);}} else if(seqfile_debug_mode == 7) {for_each_process(task) {struct pid *pid = task_pid(task);seq_printf(m, "Process %s(%d,cpu%d) pid %d, tgid %d:\n", task->comm, task_pid_nr(task), task_cpu(task), task_pid_vnr(task), task_tgid_vnr(task));do_each_pid_task(pid, PIDTYPE_PID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d. pid %d, tgid %d\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), task_pid_vnr(p), task_tgid_vnr(p));} while_each_pid_task(pid, PIDTYPE_PID, p);}} else if(seqfile_debug_mode == 8) {bus_for_each_dev(&pci_bus_type, NULL, (void*)m, lookup_pci_devices);bus_for_each_drv(&pci_bus_type, NULL, (void*)m, lookup_pci_drivers);} else if(seqfile_debug_mode == 9) {struct device_driver *drv; drv = driver_find("pcieport", &pci_bus_type);driver_for_each_device(drv, NULL, (void*)m, list_device_belongs_todriver);} else {printk("%s line %d,cant be here, seqfile_debug_mode = %d.\n", __func__, __LINE__, seqfile_debug_mode);}return 0;
}static const struct seq_operations my_seq_ops = {.start  = my_seq_ops_start,.next   = my_seq_ops_next,.stop   = my_seq_ops_stop,.show   = my_seq_ops_show,
};static int proc_seq_open(struct inode *inode, struct file *file)
{int ret;struct seq_file *m;ret = seq_open(file, &my_seq_ops);if(!ret) {m = file->private_data; m->private = file;}return ret;
}static ssize_t proc_seq_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos)
{char debug_string[16];int debug_no;memset(debug_string, 0x00, sizeof(debug_string));if (count >= sizeof(debug_string)) {printk("%s line %d, fatal error, write count exceeds max buffer size.\n", __func__, __LINE__);return -EINVAL;}if (copy_from_user(debug_string, buffer, count)) {printk("%s line %d, fatal error, copy from user failure.\n", __func__, __LINE__);return -EFAULT;}if (sscanf(debug_string, "%d", &debug_no) <= 0) {printk("%s line %d, fatal error, read debugno failure.\n", __func__, __LINE__);return -EINVAL;}seqfile_debug_mode = debug_no;//printk("%s line %d, debug_no %d.\n", __func__, __LINE__, debug_no);return count;
}static ssize_t proc_seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
{ssize_t ret;printk("%s line %d enter, ppos %lld, size %ld.\n", __func__, __LINE__, *ppos, size);ret = seq_read(file, buf, size, ppos);printk("%s line %d exit, ppos %lld, size %ld, ret = %ld.\n", __func__, __LINE__, *ppos, size, ret);return ret;
}static struct file_operations seq_proc_ops = {.owner      = THIS_MODULE,.open       = proc_seq_open,.release    = seq_release,.read       = proc_seq_read,.write      = proc_seq_write,.llseek     = seq_lseek,.unlocked_ioctl = NULL,
};static struct proc_dir_entry * entry;
static int proc_hook_init(void)
{printk("%s line %d, init. seqfile_debug_mode = %d.\n", __func__, __LINE__, seqfile_debug_mode);entry = proc_create("dumptask", 0644, NULL, &seq_proc_ops);//entry = proc_create_seq("dumptask", 0644, NULL, &my_seq_ops);return 0;
}static void proc_hook_exit(void)
{proc_remove(entry);printk("%s line %d, exit.\n", __func__, __LINE__);return;
}module_init(proc_hook_init);
module_exit(proc_hook_exit);

All threads of a vim process have exactly the same task_struct->files pointer.

How do you view the CPU usage of each thread in a process?

$ top -Hp $PID

Kernel threads have no session

Kernel threads have no session ID, which means they have no controlling terminal.

Kernel threads cannot handle signals; any attempt to send a signal to a kernel thread comes to nothing:

Every thread with the PF_KTHREAD flag set is a kernel thread, and every kernel thread has the PF_KTHREAD flag set.

Signaling a process group

1. Killing all processes in a process group

$ killall -9 $programname

Another way to kill a process group, which sends SIGTERM instead:

$ killall -g $programname

killall's principle: the program traverses the /proc/$PID/stat files, finds the processes whose name matches the target, then sends the signal to each of them with the kill syscall.

A program can get a target process's name from its /proc/$PID/stat file.

Actually, "killall -g $programname" differs from "killall -9 $programname": the former obtains the process group ID of the target (a.out in this example) and issues a single kill with SIGTERM against the process group, killing every process belonging to that group. You can see a getpgid syscall traced before the signal is sent.

In the kernel, the code that kills a whole process group is at line 1581 in the figure below.

Tips:

kill -9 -1 sends the signal to PID -1, which kills every user-space process except PID 1. The effect is somewhat like restarting Explorer from the Windows Task Manager. Looking at how this works: when the PID received is -1, the kernel sends the kill signal to all user-space processes.

The call chain is kill -> kill_something_info -> group_send_sig_info.

2. Killing a process by PID

If the killed process has children, those children are orphaned and re-parented.

$ kill -9 $PID

Process states

1. Viewing zombie processes

Method 1: top.

Method 2:

$ ps x -A -o stat,ppid,pid,cmd | grep -e '^[Zz]'

Creating a zombie process

A zombie process can only exist when several conditions hold:

1. The parent has not set SIGCHLD to SIG_IGN.

2. The parent is still alive while the child exits; if the parent no longer exists, the child is re-parented and reaped by its new parent.

A program that satisfies the conditions above will produce a process in the ZOMBIE state.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, const char *argv[])
{
    pid_t pid = fork();
    if (pid > 0) {
        while (1) {
            printf("this is parent ppid:%d child:%d\n", getpid(), pid);
            sleep(1);
        }
    } else if (pid == 0) {
        printf("this is child ppid:%d child:%d\n", getppid(), getpid());
        sleep(1);
        return 0;
    } else {
        perror("fork");
    }
    return 0;
}

Can a zombie process still be found by walking the process list?

The answer is yes.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/stat.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>
#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/seq_file.h>
#include <linux/sched/signal.h>
#include <linux/proc_fs.h>
#include <linux/pid.h>
#include <linux/pci.h>MODULE_AUTHOR("zlcao");
MODULE_LICENSE("GPL");int seqfile_debug_mode = 0;
module_param(seqfile_debug_mode, int, 0664);// 开始输出任务列表
// my_seq_ops_start()的返回值,会传递给my_seq_ops_next()的v参数
static void *my_seq_ops_start(struct seq_file *m, loff_t *pos)
{loff_t index = *pos;struct task_struct *task;printk("%s line %d, index %lld.count %ld, size %ld here.\n", __func__, __LINE__, index, m->count, m->size);if(seqfile_debug_mode == 0) {// 如果缓冲区不足, seq_file可能会重新调用start()函数,// 并且传入的pos是之前已经遍历到的位置,// 这里需要根据pos重新计算开始的位置for_each_process(task) {if (index-- == 0) {return task;}}} else {return NULL + (*pos == 0);}return NULL;
}// 继续遍历, 直到my_seq_ops_next()返回NULL或者错误
static void *my_seq_ops_next(struct seq_file *m, void *v, loff_t *pos)
{struct task_struct *task = NULL;if(seqfile_debug_mode == 0) {task = next_task((struct task_struct *)v);// 这里加不加好像都没有作用++ *pos;// 返回NULL, 遍历结束if(task == &init_task) {return NULL;}} else {++ *pos;}return task;
}// 遍历完成/出错时seq_file会调用stop()函数
static void my_seq_ops_stop(struct seq_file *m, void *v)
{}static int lookup_pci_devices(struct device *dev, void *data)
{struct seq_file *m = (struct seq_file *)data;struct pci_dev *pdev = to_pci_dev(dev);seq_printf(m, "vendor id 0x%x, device id 0x%x, devname %s.\n", pdev->vendor, pdev->device, dev_name(&pdev->dev));return 0;
}static int lookup_pci_drivers(struct device_driver *drv, void *data)
{struct seq_file *m = (struct seq_file *)data;seq_printf(m, "driver name %s.\n", drv->name);return 0;
}static int list_device_belongs_todriver(struct device *dev, void *p)
{struct seq_file *m = (struct seq_file *)p;struct pci_dev *pdev = to_pci_dev(dev);seq_printf(m, "vendor id 0x%x, device id 0x%x, devname %s.\n", pdev->vendor, pdev->device, dev_name(&pdev->dev));return 0;
}// 此函数将数据写入`seq_file`内部的缓冲区
// `seq_file`会在合适的时候把缓冲区的数据拷贝到应用层
// 参数@V是start/next函数的返回值
static int my_seq_ops_show(struct seq_file *m, void *v)
{struct task_struct * task = NULL;struct task_struct * p = NULL;struct file *file = m->private;if(seqfile_debug_mode == 0) {seq_puts(m, " file=");seq_file_path(m, file, "\n");seq_putc(m, ' ');task = (struct task_struct *)v;struct pid *session = task_session(task);struct task_struct *tsk = pid_task(session, PIDTYPE_PID);if(task->flags & PF_KTHREAD) {seq_printf(m, "Kernel thread: PID=%u, task: %s, index=%lld, read_pos=%lld, %s.\n", task->tgid, task->comm, m->index, m->read_pos, tsk? "has session" : "no session");} else {seq_printf(m, "User thread: PID=%u, task: %s, index=%lld, read_pos=%lld %s.\n", task->tgid, task->comm, m->index, m->read_pos, tsk? "has session" : "no session");}} else if(seqfile_debug_mode == 1) {struct task_struct *g, *p;static int oldcount = 0;static int entercount = 0;char *str;printk("%s line %d here enter %d times.\n", __func__, __LINE__, ++ entercount);seq_printf(m, "%s line %d here enter %d times.\n", __func__, __LINE__, ++ entercount);rcu_read_lock();for_each_process_thread(g, p) {struct task_struct *session = pid_task(task_session(g), PIDTYPE_PID);struct task_struct *thread = pid_task(task_session(p), PIDTYPE_PID);struct task_struct *ggroup = pid_task(task_pgrp(g), PIDTYPE_PID);struct task_struct *pgroup = pid_task(task_pgrp(p), PIDTYPE_PID);struct pid * pid = task_session(g);if(list_empty(&p->tasks)) {str = "empty";} else {str = "not empty";}seq_printf(m, "process %s(pid %d tgid %d,cpu%d) thread %s(pid %d tgid %d,cpu%d),threadnum %d, %d. 
tasks->prev = %p, tasks->next = %p, p->tasks=%p, %s, process parent %s(pid %d tgid %d), thread parent%s(pid %d, tgid %d, files %p\n)",g->comm, task_pid_nr(g), task_tgid_nr(g), task_cpu(g), \p->comm, task_pid_nr(p), task_tgid_nr(p), task_cpu(p), \get_nr_threads(g), get_nr_threads(p), p->tasks.prev, p->tasks.next, &p->tasks, str, g->real_parent->comm, \task_pid_nr(g->real_parent),task_tgid_nr(g->real_parent), p->real_parent->comm, task_pid_nr(p->real_parent), task_tgid_nr(p->real_parent), p->files);if(ggroup) {seq_printf(m, "ggroup(pid %d tgid %d).", task_pid_nr(ggroup),task_tgid_nr(ggroup));}if(pgroup) {seq_printf(m, "pgroup(pid %d tgid %d).", task_pid_nr(pgroup),task_tgid_nr(pgroup));}seq_printf(m, "current smp processor id %d.", smp_processor_id());if(thread) {seq_printf(m, "thread session %s(%d).", thread->comm, task_pid_nr(thread));}if(session) {seq_printf(m, "process session %s(%d).", session->comm, task_pid_nr(session));}if(oldcount == 0 || oldcount != m->size) {printk("%s line %d, m->count %ld, m->size %ld.", __func__, __LINE__, m->count, m->size);oldcount = m->size;}if(pid){seq_printf(m, "pid task %p,pgid task %p, psid_task %p", pid_task(pid, PIDTYPE_PID), pid_task(pid, PIDTYPE_PGID), pid_task(pid, PIDTYPE_SID));seq_printf(m, "pid task %s,pgid task %s, psid_task %s", pid_task(pid, PIDTYPE_PID)->comm, pid_task(pid, PIDTYPE_PGID)->comm, pid_task(pid, PIDTYPE_SID)->comm);}seq_printf(m, "\n");}rcu_read_unlock();} else if(seqfile_debug_mode == 2) {for_each_process(task) {struct pid *pgrp = task_pgrp(task);seq_printf(m, "Group Header %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));do_each_pid_task(pgrp, PIDTYPE_PGID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d.\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p));} while_each_pid_task(pgrp, PIDTYPE_PGID, p);}} else if (seqfile_debug_mode == 3) {for_each_process(task) {struct pid 
*session = task_session(task);struct task_struct *tsk = pid_task(session, PIDTYPE_PID);if(tsk) {seq_printf(m, "session task %s(%d,cpu%d):", tsk->comm, task_pid_nr(tsk), task_cpu(tsk));} else {seq_printf(m, "process %s(%d,cpu%d) has no session task.", task->comm, task_pid_nr(task), task_cpu(task));}seq_printf(m, "session header %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));do_each_pid_task(session, PIDTYPE_SID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d, spidtask %s(%d,%d).\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), pid_task(session, PIDTYPE_SID)->comm, pid_task(session, PIDTYPE_SID)->tgid, pid_task(session, PIDTYPE_SID)->pid);if(pid_task(session, PIDTYPE_PID)) {seq_printf(m, "pidtask %s(%d,%d).\n", pid_task(session, PIDTYPE_PID)->comm, pid_task(session, PIDTYPE_PID)->tgid, pid_task(session, PIDTYPE_PID)->pid);}} while_each_pid_task(session, PIDTYPE_SID, p);}} else if(seqfile_debug_mode == 4) {struct task_struct *thread, *child;for_each_process(task) {seq_printf(m, "process %s(%d,cpu%d):\n", task->comm, task_pid_nr(task), task_cpu(task));for_each_thread(task, thread) {list_for_each_entry(child, &thread->children, sibling) {seq_printf(m, "      thread %s(%d,cpu%d) child %s(%d,cpu%d),threadnum %d, %d.\n",thread->comm, task_pid_nr(thread), task_cpu(thread), \child->comm, task_pid_nr(child), task_cpu(child), \get_nr_threads(thread), get_nr_threads(child));}}}} else if(seqfile_debug_mode == 5) { struct task_struct *g, *t;do_each_thread (g, t) {seq_printf(m, "Process %s(%d cpu%d), thread %s(%d cpu%d), threadnum %d.\n", g->comm, task_pid_nr(g), task_cpu(g), t->comm, task_pid_nr(t), task_cpu(t), get_nr_threads(g));} while_each_thread (g, t);} else if(seqfile_debug_mode == 6) {for_each_process(task) {struct pid *pid = task_pid(task);seq_printf(m, "Process %s(%d,cpu%d) pid %d, tgid %d:\n", task->comm, task_pid_nr(task), 
task_cpu(task), task_pid_vnr(task), task_tgid_vnr(task));do_each_pid_task(pid, PIDTYPE_TGID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d. pid %d, tgid %d\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), task_pid_vnr(p), task_tgid_vnr(p));} while_each_pid_task(pid, PIDTYPE_TGID, p);}} else if(seqfile_debug_mode == 7) {for_each_process(task) {struct pid *pid = task_pid(task);seq_printf(m, "Process %s(%d,cpu%d) pid %d, tgid %d:\n", task->comm, task_pid_nr(task), task_cpu(task), task_pid_vnr(task), task_tgid_vnr(task));do_each_pid_task(pid, PIDTYPE_PID, p) {seq_printf(m, "      process %s(%d,cpu%d) thread %s(%d,cpu%d),threadnum %d, %d. pid %d, tgid %d\n",task->comm, task_pid_nr(task), task_cpu(task), \p->comm, task_pid_nr(p), task_cpu(p), \get_nr_threads(task), get_nr_threads(p), task_pid_vnr(p), task_tgid_vnr(p));} while_each_pid_task(pid, PIDTYPE_PID, p);}} else if(seqfile_debug_mode == 8) {bus_for_each_dev(&pci_bus_type, NULL, (void*)m, lookup_pci_devices);bus_for_each_drv(&pci_bus_type, NULL, (void*)m, lookup_pci_drivers);} else if(seqfile_debug_mode == 9) {struct device_driver *drv; drv = driver_find("pcieport", &pci_bus_type);driver_for_each_device(drv, NULL, (void*)m, list_device_belongs_todriver);} else if(seqfile_debug_mode == 10) {for_each_process(task) {seq_printf(m, "Process %s(%d),state 0x%08lx, exit_state 0x%08x, refcount %d, usage %d rcucount %d.\n", \task->comm, task->tgid, task->state, task->exit_state, refcount_read(&task->stack_refcount), refcount_read(&task->usage), refcount_read(&task->rcu_users));}} else {printk("%s line %d,cant be here, seqfile_debug_mode = %d.\n", __func__, __LINE__, seqfile_debug_mode);}return 0;
}static const struct seq_operations my_seq_ops = {.start  = my_seq_ops_start,.next   = my_seq_ops_next,.stop   = my_seq_ops_stop,.show   = my_seq_ops_show,
};static int proc_seq_open(struct inode *inode, struct file *file)
{int ret;struct seq_file *m;ret = seq_open(file, &my_seq_ops);if(!ret) {m = file->private_data; m->private = file;}return ret;
}static ssize_t proc_seq_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos)
{char debug_string[16];int debug_no;memset(debug_string, 0x00, sizeof(debug_string));if (count >= sizeof(debug_string)) {printk("%s line %d, fatal error, write count exceeds max buffer size.\n", __func__, __LINE__);return -EINVAL;}if (copy_from_user(debug_string, buffer, count)) {printk("%s line %d, fatal error, copy from user failure.\n", __func__, __LINE__);return -EFAULT;}if (sscanf(debug_string, "%d", &debug_no) <= 0) {printk("%s line %d, fatal error, read debugno failure.\n", __func__, __LINE__);return -EINVAL;}seqfile_debug_mode = debug_no;//printk("%s line %d, debug_no %d.\n", __func__, __LINE__, debug_no);return count;
}static ssize_t proc_seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
{ssize_t ret;printk("%s line %d enter, ppos %lld, size %ld.\n", __func__, __LINE__, *ppos, size);ret = seq_read(file, buf, size, ppos);printk("%s line %d exit, ppos %lld, size %ld, ret = %ld.\n", __func__, __LINE__, *ppos, size, ret);return ret;
}static struct file_operations seq_proc_ops = {.owner      = THIS_MODULE,.open       = proc_seq_open,.release    = seq_release,.read       = proc_seq_read,.write      = proc_seq_write,.llseek     = seq_lseek,.unlocked_ioctl = NULL,
};static struct proc_dir_entry * entry;
static int proc_hook_init(void)
{printk("%s line %d, init. seqfile_debug_mode = %d.\n", __func__, __LINE__, seqfile_debug_mode);entry = proc_create("dumptask", 0644, NULL, &seq_proc_ops);//entry = proc_create_seq("dumptask", 0644, NULL, &my_seq_ops);return 0;
}static void proc_hook_exit(void)
{proc_remove(entry);printk("%s line %d, exit.\n", __func__, __LINE__);return;
}module_init(proc_hook_init);
module_exit(proc_hook_exit);

As the dump shows, the parent is still in TASK_INTERRUPTIBLE, i.e. sleeping. The child's state is TASK_DEAD and its exit_state is EXIT_ZOMBIE, so the child is a zombie process.

How do you kill a zombie process?

A zombie generally cannot be killed directly: it can no longer be scheduled, because its kernel stack and execution context are already gone. What you can do is kill its parent. Once the parent dies, the zombie becomes an orphan and is reparented to the init process (PID 1), which always reaps its zombie children, so the zombies it inherits disappear as well.

When a child dies, it sends SIGCHLD to its parent; on receiving the signal, the parent calls waitpid() to reap the child. The mechanism rests on this: even if the parent never calls wait, the kernel still delivers SIGCHLD to it. The default disposition is to ignore the signal, but a parent that wants to react can install a handler for it.

How do you avoid zombie processes?

Handling SIGCHLD is not mandatory, but it matters for certain programs, especially servers that fork a child to handle each incoming request.

If the parent does not wait for its children, the dead children become zombies and tie up system resources. If the parent does wait, that adds load to the parent and hurts the server's concurrency.

Under Linux you can simply set the disposition of SIGCHLD to SIG_IGN, as follows:

signal(SIGCHLD, SIG_IGN);

#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, const char *argv[])
{
	signal(SIGCHLD, SIG_IGN);

	pid_t pid = fork();
	if (pid > 0) {
		while (1) {
			printf("this is parent ppid:%d child:%d\n", getpid(), pid);
			sleep(1);
		}
	} else if (pid == 0) {
		printf("this is child ppid:%d child:%d\n", getppid(), getpid());
		sleep(1);
		return 0;
	} else {
		perror("fork");
	}

	return 0;
}

How a process becomes a zombie

After finishing exit, the dying task voluntarily calls __schedule one last time; after this context switch, the task will never be scheduled again.

Following that final schedule, the kernel stack is released along this call chain:

schedule -> __schedule -> context_switch -> finish_task_switch -> put_task_stack(prev) -> release_task_stack(tsk) -> free_thread_stack(tsk); tsk->stack = NULL;

Releasing the task_struct:

schedule -> __schedule -> context_switch -> finish_task_switch -> put_task_struct_rcu_user(prev) -> if (refcount_dec_and_test(&task->rcu_users)) call_rcu(&task->rcu, delayed_put_task_struct);

Because rcu_users is initialized to 2, this first decrement does not reach zero, so the task_struct is not freed here and still exists.

At this point the process's kernel stack has been released, but its task_struct has not. The process has become a zombie: this is how zombies are produced.

The corresponding call stacks:

[   52.992966] release_task_stack line 441, pid 2892.
[   52.992971] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.224+ #40
[   52.992973] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[   52.992975] Call Trace:
[   52.992985]  dump_stack+0x57/0x6d
[   52.992991]  put_task_stack+0x1ae/0x1e0
[   52.992995]  finish_task_switch+0x1a2/0x240
[   52.993001]  __schedule+0x27f/0x710
[   52.993007]  schedule_idle+0x22/0x40
[   52.993012]  do_idle+0x181/0x2a0
[   52.993017]  cpu_startup_entry+0x1d/0x20
[   52.993021]  rest_init+0xae/0xb0
[   52.993027]  arch_call_rest_init+0xe/0x1b
[   52.993031]  start_kernel+0x517/0x539
[   52.993037]  x86_64_start_reservations+0x24/0x26
[   52.993041]  x86_64_start_kernel+0x8e/0x91
[   52.993046]  secondary_startup_64+0xa4/0xb0
[   67.497746] release_task_stack line 441, pid 2891.
[   67.497751] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.224+ #40
[   67.497753] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[   67.497754] Call Trace:
[   67.497765]  dump_stack+0x57/0x6d
[   67.497770]  put_task_stack+0x1ae/0x1e0
[   67.497775]  finish_task_switch+0x1a2/0x240
[   67.497781]  __schedule+0x27f/0x710
[   67.497787]  schedule_idle+0x22/0x40
[   67.497792]  do_idle+0x181/0x2a0
[   67.497798]  cpu_startup_entry+0x1d/0x20
[   67.497801]  start_secondary+0x159/0x1b0
[   67.497806]  secondary_startup_64+0xa4/0xb0
[   67.497820] put_task_struct_rcu_user line 189, pid 2891.
[   67.497822] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.224+ #40
[   67.497824] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[   67.497825] Call Trace:
[   67.497828]  dump_stack+0x57/0x6d
[   67.497834]  put_task_struct_rcu_user+0x90/0xa0
[   67.497837]  finish_task_switch+0x1aa/0x240
[   67.497842]  __schedule+0x27f/0x710
[   67.497848]  schedule_idle+0x22/0x40
[   67.497852]  do_idle+0x181/0x2a0
[   67.497858]  cpu_startup_entry+0x1d/0x20
[   67.497861]  start_secondary+0x159/0x1b0
[   67.497864]  secondary_startup_64+0xa4/0xb0
[   67.497990] put_task_struct_rcu_user line 189, pid 2892.
[   67.497995] CPU: 3 PID: 2390 Comm: systemd Not tainted 5.4.224+ #40
[   67.497997] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[   67.497998] Call Trace:
[   67.498006]  dump_stack+0x57/0x6d
[   67.498012]  put_task_struct_rcu_user+0x90/0xa0
[   67.498017]  release_task+0x3c0/0x450
[   67.498023]  wait_consider_task+0x991/0xa70
[   67.498029]  do_wait+0x11e/0x230
[   67.498033]  kernel_waitid+0x13d/0x210
[   67.498039]  ? proc_cgroup_show+0x1fe/0x2a0
[   67.498044]  ? task_stopped_code+0x50/0x50
[   67.498048]  __do_sys_waitid+0x118/0x140
[   67.498053]  ? call_rcu+0x10/0x20
[   67.498058]  ? __fput+0x162/0x260
[   67.498064]  ? _cond_resched+0x19/0x40
[   67.498068]  ? task_work_run+0x46/0xc0
[   67.498072]  __x64_sys_waitid+0x24/0x30
[   67.498075]  ? __x64_sys_waitid+0x24/0x30
[   67.498079]  do_syscall_64+0x51/0x180
[   67.498083]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[   67.498087] RIP: 0033:0x7fba8550245a
[   67.498091] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 91 c5 30 00 41 89 ca 8b 00 85 c0 75 18 45 31 c0 b8 f7 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5e f3 c3 0f 1f 40 00 41 55 41 54 41 89 cd 55
[   67.498093] RSP: 002b:00007ffdc93c4e58 EFLAGS: 00000246 ORIG_RAX: 00000000000000f7
[   67.498096] RAX: ffffffffffffffda RBX: 00007ffdc93c4e70 RCX: 00007fba8550245a
[   67.498098] RDX: 00007ffdc93c4e70 RSI: 0000000000000b4c RDI: 0000000000000001
[   67.498100] RBP: 0000565556befee0 R08: 0000000000000000 R09: 0000000000000004
[   67.498101] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
[   67.498103] R13: 0000000000000000 R14: 0000565556ccbec0 R15: 00005655550571b8
[   67.517694] free_task_struct line 175, pid 2891.
[   67.517700] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.224+ #40
[   67.517702] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[   67.517703] Call Trace:
[   67.517706]  <IRQ>
[   67.517716]  dump_stack+0x57/0x6d
[   67.517721]  free_task+0x94/0xa0
[   67.517725]  __put_task_struct+0xb1/0x170
[   67.517730]  delayed_put_task_struct+0x8c/0xb0
[   67.517736]  rcu_core+0x196/0x480
[   67.517742]  rcu_core_si+0xe/0x10
[   67.517747]  __do_softirq+0xde/0x2ce
[   67.517752]  irq_exit+0xa8/0xb0
[   67.517757]  smp_apic_timer_interrupt+0x79/0x130
[   67.517762]  apic_timer_interrupt+0xf/0x20
[   67.517764]  </IRQ>
[   67.517768] RIP: 0010:cpuidle_enter_state+0xb4/0x430
[   67.517772] Code: 89 c7 0f 1f 44 00 00 31 ff e8 f8 4f 82 ff 80 7d d3 00 74 12 9c 58 f6 c4 02 0f 85 4e 03 00 00 31 ff e8 c0 dd 88 ff fb 45 85 ed <0f> 88 1a 03 00 00 4c 2b 7d c8 48 ba cf f7 53 e3 a5 9b c4 20 49 63
[   67.517774] RSP: 0018:ffffaf00c00c7e50 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[   67.517778] RAX: ffff8c9726aaec00 RBX: ffffffffb8961600 RCX: 000000000000001f
[   67.517780] RDX: 0000000fb85dd4cb RSI: 000000002aaaab99 RDI: 0000000000000000
[   67.517781] RBP: ffffaf00c00c7e90 R08: 0000000000000002 R09: 000000000002e480
[   67.517783] R10: ffffaf00c00c7e20 R11: 00000000000002bd R12: ffff8c9726ab9900
[   67.517785] R13: 0000000000000004 R14: ffffffffb8961798 R15: 0000000fb85dd4cb
[   67.517793]  cpuidle_enter+0x2e/0x40
[   67.517798]  do_idle+0x220/0x2a0
[   67.517804]  cpu_startup_entry+0x1d/0x20
[   67.517808]  start_secondary+0x159/0x1b0
[   67.517813]  secondary_startup_64+0xa4/0xb0
[   67.525696] free_task_struct line 175, pid 2892.
[   67.525703] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.4.224+ #40
[   67.525705] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[   67.525707] Call Trace:
[   67.525710]  <IRQ>
[   67.525721]  dump_stack+0x57/0x6d
[   67.525727]  free_task+0x94/0xa0
[   67.525731]  __put_task_struct+0xb1/0x170
[   67.525736]  delayed_put_task_struct+0x8c/0xb0
[   67.525742]  rcu_core+0x196/0x480
[   67.525748]  rcu_core_si+0xe/0x10
[   67.525754]  __do_softirq+0xde/0x2ce
[   67.525759]  irq_exit+0xa8/0xb0
[   67.525763]  smp_apic_timer_interrupt+0x79/0x130
[   67.525768]  apic_timer_interrupt+0xf/0x20
[   67.525770]  </IRQ>
[   67.525775] RIP: 0010:cpuidle_enter_state+0xb4/0x430
[   67.525779] Code: 89 c7 0f 1f 44 00 00 31 ff e8 f8 4f 82 ff 80 7d d3 00 74 12 9c 58 f6 c4 02 0f 85 4e 03 00 00 31 ff e8 c0 dd 88 ff fb 45 85 ed <0f> 88 1a 03 00 00 4c 2b 7d c8 48 ba cf f7 53 e3 a5 9b c4 20 49 63
[   67.525781] RSP: 0018:ffffaf00c00d7e50 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[   67.525785] RAX: ffff8c9726baec00 RBX: ffffffffb8961600 RCX: 000000000000001f
[   67.525786] RDX: 0000000fb8d7d5c0 RSI: 000000002aaaab99 RDI: 0000000000000000
[   67.525788] RBP: ffffaf00c00d7e90 R08: 0000000000000004 R09: 000000000002e480
[   67.525790] R10: ffffaf00c00d7e20 R11: 00000000000000b9 R12: ffff8c9726bb9900
[   67.525791] R13: 0000000000000004 R14: ffffffffb8961798 R15: 0000000fb8d7d5c0
[   67.525800]  cpuidle_enter+0x2e/0x40
[   67.525805]  do_idle+0x220/0x2a0
[   67.525811]  cpu_startup_entry+0x1d/0x20
[   67.525815]  start_secondary+0x159/0x1b0
[   67.525821]  secondary_startup_64+0xa4/0xb0

Based on the log above, we can conclude that:

1. The child's (2892) kernel stack is freed after switching to the idle task, in the finish_task_switch context.

2. The parent's (2891) kernel stack is freed in the same context.

3. The child's task_struct is released only after the parent exits: the parent's exit reparents the child to systemd (2390), which then frees it through its wait syscall.

4. Both the parent's and the child's task_struct are finally freed in RCU (softirq) context.

Process management structures

The Linux kernel describes every task with a struct task_struct plus a struct thread_info, and the two can be laid out in either of two ways. In the first layout they are independent objects: struct thread_info sits at the bottom of the kernel stack, and the two structures point at each other:

In the second layout, struct thread_info is embedded inside struct task_struct itself.

The kernel saves register state in two places. On a switch from user mode to kernel mode, the user-mode registers are saved in a struct pt_regs at the base of the kernel stack. When the scheduler switches a task out, its registers are saved in task_struct.thread. Register sets differ between architectures, so each architecture defines its own struct pt_regs and struct thread_struct and implements copy_thread_tls.

Kernel threads never run in user mode, so they have no use for struct pt_regs; the corresponding area is zeroed.

The kernel provides a config option, CONFIG_THREAD_INFO_IN_TASK: when selected, the second layout is used. x86 uses the second layout by default.

How current is implemented on x86

The kernel defines a per-CPU pointer to struct task_struct, current_task, recording the task currently running on each CPU.

arch/x86/include/asm/current.h:

On every task switch, the current CPU's current_task is updated along this call chain:

schedule -> __schedule -> context_switch -> switch_to -> __switch_to -> this_cpu_write(current_task, next_p);

Getting the thread_info is then trivial: since the two structures share one body, a cast of the task_struct pointer is all it takes.

Notes on forked children

1. A child shares its parent's open files: each inherited fd refers to the same struct file object.

2. fork does not pass CLONE_FILES; thread creation does. CLONE_FILES goes further and shares the whole task_struct->files (a struct files_struct), so the fds, struct file objects, and the struct fdtable are literally the same ones.

Verification:

The implementation of copy_files shows that although fork does not set CLONE_FILES, it still calls dup_fd.

When forking a child, dup_fd increments the reference count of every struct file, accounting for the child's new references.

How to inspect a process's children:

For example, to see which kernel threads the kthreadd process has spawned, run:

zlcao@zlcao-RedmiBook-14:~/workspace/dumpstack$ pstree -n -p 2
kthreadd(2)─┬─rcu_gp(3)├─rcu_par_gp(4)├─slub_flushwq(5)├─netns(6)├─kworker/0:0H-events_highpri(8)├─mm_percpu_wq(10)├─rcu_tasks_rude_(11)├─rcu_tasks_trace(12)├─ksoftirqd/0(13)├─rcu_sched(14)├─migration/0(15)├─idle_inject/0(16)├─cpuhp/0(18)├─cpuhp/1(19)├─idle_inject/1(20)├─migration/1(21)├─ksoftirqd/1(22)├─kworker/1:0H-events_highpri(24)├─cpuhp/2(25)├─idle_inject/2(26)├─migration/2(27)├─ksoftirqd/2(28)├─kworker/2:0H-events_highpri(30)├─cpuhp/3(31)├─idle_inject/3(32)├─migration/3(33)├─ksoftirqd/3(34)├─kworker/3:0H-kblockd(36)├─cpuhp/4(37)├─idle_inject/4(38)├─migration/4(39)├─ksoftirqd/4(40)├─kworker/4:0H-events_highpri(42)├─cpuhp/5(43)├─idle_inject/5(44)├─migration/5(45)├─ksoftirqd/5(46)├─kworker/5:0H-kblockd(48)├─cpuhp/6(49)├─idle_inject/6(50)├─migration/6(51)├─ksoftirqd/6(52)├─kworker/6:0H(54)├─cpuhp/7(55)├─idle_inject/7(56)├─migration/7(57)├─ksoftirqd/7(58)├─kworker/7:0H-events_highpri(60)├─kdevtmpfs(61)├─inet_frag_wq(62)├─kauditd(63)├─khungtaskd(64)├─oom_reaper(65)├─writeback(66)├─kcompactd0(67)├─ksmd(68)├─khugepaged(69)├─kintegrityd(116)├─kblockd(117)├─blkcg_punt_bio(118)├─tpm_dev_wq(119)├─ata_sff(120)├─md(121)├─edac-poller(122)├─devfreq_wq(123)├─watchdogd(124)├─kworker/0:1H-events_highpri(126)├─kswapd0(128)├─ecryptfs-kthrea(129)├─kthrotld(132)├─irq/123-aerdrv(133)├─irq/123-pcie-dp(134)├─acpi_thermal_pm(140)├─vfio-irqfd-clea(142)├─mld(143)├─ipv6_addrconf(144)├─kstrp(155)├─zswap-shrink(158)├─charger_manager(166)├─kworker/1:1H-kblockd(211)├─scsi_eh_0(224)├─scsi_tmf_0(225)├─scsi_eh_1(226)├─scsi_tmf_1(227)├─scsi_eh_2(228)├─scsi_tmf_2(229)├─kworker/7:1H-events_highpri(232)├─irq/126-ELAN230(235)├─kworker/5:1H-events_highpri(238)├─kworker/6:1H-events_highpri(241)├─kworker/3:1H-events_highpri(242)├─kworker/2:1H-events_highpri(243)├─kworker/4:1H-kblockd(244)├─jbd2/sda8-8(272)├─ext4-rsv-conver(273)├─cfg80211(426)├─irq/128-iwlwifi(439)├─irq/129-iwlwifi(440)├─irq/130-iwlwifi(441)├─irq/131-iwlwifi(442)├─irq/132-iwlwifi(443)├─irq/133-iwlwifi(444)├─cryptd(495)├─card0-crtc0(499)├
─card0-crtc1(500)├─card0-crtc2(501)├─nv_queue(576)├─nv_queue(577)├─nvidia-modeset/(668)├─nvidia-modeset/(669)├─irq/135-nvidia(678)├─nv_queue(680)├─irq/136-AudioDS(690)├─UVM global queu(721)├─UVM deferred re(722)├─UVM Tools Event(723)├─krfcommd(1712)├─irq/127-mei_me(34170)├─nvidia(34202)├─kworker/0:0-events(37274)├─kworker/2:3-events(37740)├─kworker/1:2-events(37938)├─kworker/u17:0-rb_allocator(38202)├─kworker/5:1-events(38228)├─kworker/0:2-events(38232)├─kworker/7:2-events(38234)├─kworker/u17:2-i915_flip(38244)├─kworker/4:1-events(38245)├─kworker/6:2-events(38344)├─kworker/1:0-events(38394)├─kworker/5:2-events(38411)├─kworker/3:2-events(38433)├─kworker/7:1-events(38529)├─kworker/6:1-events(38557)├─kworker/u16:3-events_unbound(38578)├─kworker/u16:0-events_unbound(38688)├─kworker/u16:1-i915(38713)├─kworker/4:2-mm_percpu_wq(38834)├─kworker/3:1-events(38835)├─kworker/2:0-events(38900)├─kworker/7:0-mm_percpu_wq(38967)├─kworker/5:0-events(38972)├─kworker/u17:1(38981)├─kworker/6:0-events(39035)├─kworker/0:1(39036)└─kworker/3:0-events(39054)

Other technical notes

1. How do you tell whether a process is single-threaded? In the kernel, current_is_single_threaded() answers this for the current task.

Viewing another task's call stack

sched_show_task is exported via EXPORT_SYMBOL_GPL, so modules as well as in-kernel code can call it to dump the call stack of any task.

Summary:

Combining the figure at the beginning with the code, we can conclude:

1. Kernel threads spawned by kthreadd have no session ID; in plain terms, no kernel thread has a controlling terminal, so they cannot be driven from a console.



End!
