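About pthread and kthread

pthread is the POSIX threading interface that the OS exposes to applications; it is not a Linux-specific concept. User-space programs call the pthread API to create, run, and destroy user-space threads. In what follows, pthread denotes a user-space thread and kthread denotes a kernel thread. In Linux, both user-space threads and kernel threads are represented by struct task_struct.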
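
As a minimal user-space sketch of that interface, the following creates two pthreads and waits for both. The names worker and run_two_pthreads are made up for this illustration; pthread_create, pthread_join and the mutex calls are the standard POSIX ones. Older toolchains may need the -pthread option to build it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared state for the illustration (hypothetical names). */
static long shared_sum;
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: add the argument to the shared sum under the lock. */
static void *worker(void *arg)
{
	long value = *(long *)arg;

	pthread_mutex_lock(&sum_lock);
	shared_sum += value;
	pthread_mutex_unlock(&sum_lock);
	return NULL;
}

/* Create two user-space threads, join them, and return the sum. */
long run_two_pthreads(void)
{
	pthread_t tids[2];
	long args[2] = { 1, 2 };
	int i;

	shared_sum = 0;
	for (i = 0; i < 2; i++) {
		int err = pthread_create(&tids[i], NULL, worker, &args[i]);
		assert(err == 0);
	}
	for (i = 0; i < 2; i++) {
		int err = pthread_join(tids[i], NULL);
		assert(err == 0);
	}
	/* Both threads have been joined, so reading without the lock is safe. */
	return shared_sum;
}
```

Calling run_two_pthreads() from main is enough to exercise it. On Linux, each thread created this way is backed by its own task_struct, just like a kernel thread.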

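Data structures involved in treating kthread and pthread differently

The TCB in Linux is task_struct, defined in ./include/linux/sched.h. The fields that matter for how scheduling treats kthread and pthread differently are: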

struct task_struct {
	/* ... */
	struct thread_info  thread_info;
	/* ... */
	/* Per task flags (PF_*), defined further below: */
	unsigned int   flags;
	/* ... */
	struct mm_struct  *mm;
	struct mm_struct  *active_mm;
	/* ... */
};

/*
 * Per process flags
 */
#define PF_IDLE   0x00000002 /* I am an IDLE thread */
#define PF_EXITING  0x00000004 /* Getting shut down */
#define PF_VCPU   0x00000010 /* I'm a virtual CPU */
#define PF_WQ_WORKER  0x00000020 /* I'm a workqueue worker */
#define PF_FORKNOEXEC  0x00000040 /* Forked but didn't exec */
#define PF_MCE_PROCESS  0x00000080      /* Process policy on mce errors */
#define PF_SUPERPRIV  0x00000100 /* Used super-user privileges */
#define PF_DUMPCORE  0x00000200 /* Dumped core */
#define PF_SIGNALED  0x00000400 /* Killed by a signal */
#define PF_MEMALLOC  0x00000800 /* Allocating memory */
#define PF_NPROC_EXCEEDED 0x00001000 /* set_user() noticed that RLIMIT_NPROC was exceeded */
#define PF_USED_MATH  0x00002000 /* If unset the fpu must be initialized before use */
#define PF_USED_ASYNC  0x00004000 /* Used async_schedule*(), used by module init */
#define PF_NOFREEZE  0x00008000 /* This thread should not be frozen */
#define PF_FROZEN  0x00010000 /* Frozen for system suspend */
#define PF_KSWAPD  0x00020000 /* I am kswapd */
#define PF_MEMALLOC_NOFS 0x00040000 /* All allocation requests will inherit GFP_NOFS */
#define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */
#define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
#define PF_KTHREAD  0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE  0x00400000 /* Randomize virtual address space */
#define PF_SWAPWRITE  0x00800000 /* Allowed to write to swap */
#define PF_MEMSTALL  0x01000000 /* Stalled due to lack of memory */
#define PF_UMH   0x02000000 /* I'm an Usermodehelper process */
#define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */
#define PF_MCE_EARLY  0x08000000      /* Early kill for mce process policy */
#define PF_MEMALLOC_NOCMA 0x10000000 /* All allocation request will have _GFP_MOVABLE cleared */
#define PF_IO_WORKER  0x20000000 /* Task is an IO worker */
#define PF_FREEZER_SKIP  0x40000000 /* Freezer should not count it as freezable */
#define PF_SUSPEND_TASK  0x80000000      /* This thread called freeze_processes() and should not be frozen */

The flags field holds the task's flag bits, all of which are listed above. When PF_KTHREAD is set, the task is a kthread.
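
The check itself is a plain mask test. The sketch below reproduces it in user space; struct task_model and is_kthread are hypothetical stand-ins for illustration, and only the two PF_* values are taken from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values copied from the PF_* list above. */
#define PF_EXITING 0x00000004u
#define PF_KTHREAD 0x00200000u

/* Hypothetical stand-in holding only the field used here. */
struct task_model {
	unsigned int flags;
};

/* True when the PF_KTHREAD bit is set, regardless of other bits. */
bool is_kthread(const struct task_model *t)
{
	assert(t != NULL);
	return (t->flags & PF_KTHREAD) != 0;
}
```

This is the same test that switch_fpu_prepare and __fpregs_load_activate apply to current->flags later in the article.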

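struct mm_struct describes a user virtual address space. For a kernel thread, mm is NULL. Any user process may have been running before the kernel thread was scheduled in, so the contents of the user part of the address space are effectively arbitrary and the kernel thread must never modify them; this is why its mm is set to NULL. When the task being switched out is a user process, the kernel also stores that process's mm in the incoming kernel thread's active_mm, because the kernel sometimes needs to know what the user address space currently contains. In the Linux implementation, this field has nothing to do with FPU load/store; it only matters for telling kthread and pthread apart.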
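
The hand-off of active_mm can be sketched as a small user-space model. This is a simplification under stated assumptions, not the kernel's context_switch: reference counting and TLB handling are left out, and struct mm_model, struct task_model and switch_mm_model are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct. */
struct mm_model {
	int id;
};

/* Hypothetical stand-in for the two task_struct fields discussed above. */
struct task_model {
	struct mm_model *mm;        /* NULL for a kernel thread */
	struct mm_model *active_mm; /* address space actually in use */
};

/* Simplified model of the address-space hand-off on a context switch. */
void switch_mm_model(struct task_model *prev, struct task_model *next)
{
	assert(prev != NULL && next != NULL);

	if (next->mm == NULL) {
		/* Kernel thread: borrow whatever address space prev was using. */
		next->active_mm = prev->active_mm;
	} else {
		/* User thread: it runs on its own address space. */
		next->active_mm = next->mm;
	}

	if (prev->mm == NULL) {
		/* A kernel thread gives up the borrowed mm when switched out. */
		prev->active_mm = NULL;
	}
}
```

Switching from a user task to a kernel thread leaves the kernel thread with mm == NULL and active_mm pointing at the user task's address space; switching back drops the borrowed pointer.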

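thread_info is defined in ./arch/x86/include/asm/thread_info.h as shown below. Its flags field carries up to 32 flag bits, listed below. When TIF_NEED_FPU_LOAD is set, the task's FPU state is loaded into the CPU's FPU registers after the task has been scheduled in, either before it returns to user mode or when the kernel itself needs the FPU.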

struct thread_info {
	unsigned long  flags;  /* low level flags */
	u32   status;  /* thread synchronous flags */
};

/*
 * thread information flags
 * - these are process state flags that various assembly files
 *   may need to access
 */
#define TIF_SYSCALL_TRACE 0 /* syscall trace active */
#define TIF_NOTIFY_RESUME 1 /* callback before returning to user */
#define TIF_SIGPENDING  2 /* signal pending */
#define TIF_NEED_RESCHED 3 /* rescheduling necessary */
#define TIF_SINGLESTEP  4 /* reenable singlestep on user return*/
#define TIF_SSBD  5 /* Speculative store bypass disable */
#define TIF_SYSCALL_EMU  6 /* syscall emulation active */
#define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
#define TIF_SECCOMP  8 /* secure computing */
#define TIF_SPEC_IB  9 /* Indirect branch speculation mitigation */
#define TIF_SPEC_FORCE_UPDATE 10 /* Force speculation MSR update in context switch */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
#define TIF_UPROBE  12 /* breakpointed or singlestepping */
#define TIF_PATCH_PENDING 13 /* pending live patching update */
#define TIF_NEED_FPU_LOAD 14 /* load FPU on return to userspace */
#define TIF_NOCPUID  15 /* CPUID is not accessible in userland */
#define TIF_NOTSC  16 /* TSC is not accessible in userland */
#define TIF_IA32  17 /* IA32 compatibility process */
#define TIF_NOHZ  19 /* in adaptive nohz mode */
#define TIF_MEMDIE  20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
#define TIF_IO_BITMAP  22 /* uses I/O bitmap */
#define TIF_FORCED_TF  24 /* true if TF in eflags artificially */
#define TIF_BLOCKSTEP  25 /* set when we want DEBUGCTLMSR_BTF */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
#define TIF_ADDR32  29 /* 32-bit address space on 64 bits */
#define TIF_X32   30 /* 32-bit native x86-64 binary */
#define TIF_FSCHECK  31 /* Check FS is USER_DS on return */

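Call graph

The concrete Linux code is covered in the sections below. The graph shows the FPU load/store steps that take place during scheduling and on return to user mode. In the graph, line arrows denote execution flow (inside the caller of the preceding hollow arrow), and hollow arrows denote function calls.

FPU load/store during scheduling

The FPU is handled during scheduling as follows:

schedule in ./kernel/sched/core.c is the main scheduling function.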

asmlinkage __visible void __sched schedule(void)
{
	/* ... */
	do {
		preempt_disable();
		__schedule(false);
		sched_preempt_enable_no_resched();
	} while (need_resched());
	/* ... */
}

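__schedule in ./kernel/sched/core.c. It updates the run queue's current task (rq->curr) and calls context_switch to perform the context switch. At this point neither the stack nor the per-CPU current_task pointer has been switched, so current still refers to the old task until __switch_to updates it.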

static void __sched notrace __schedule(bool preempt)
{
	struct task_struct *prev, *next;
	unsigned long *switch_count;
	struct rq_flags rf;
	struct rq *rq;
	int cpu;

	cpu = smp_processor_id();
	rq = cpu_rq(cpu);
	prev = rq->curr;
	/* ... */
	next = pick_next_task(rq, prev, &rf);
	/* ... */
	if (likely(prev != next)) {
		/* ... */
		RCU_INIT_POINTER(rq->curr, next);
		/* ... */
		rq = context_switch(rq, prev, next, &rf);
	} else {
		/* ... */
	}
	/* ... */
}

context_switch in ./kernel/sched/core.c.

/*
 * context_switch - switch to the new MM and the new thread's register state.
 */
static __always_inline struct rq *
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next, struct rq_flags *rf)
{
	/* ... */
	/* Here we just switch the register state and the stack. */
	switch_to(prev, next, prev);
	/* ... */
}

switch_to in ./arch/x86/include/asm/switch_to.h.

#define switch_to(prev, next, last)     \
do {         \
	prepare_switch_to(next);     \
         \
	((last) = __switch_to_asm((prev), (next)));   \
} while (0)

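__switch_to in ./arch/x86/kernel/process_64.c, which __switch_to_asm jumps to after switching the stack. It handles FPU store/load in two steps. First, it calls test_thread_flag(TIF_NEED_FPU_LOAD) to decide whether the old thread's FPU state has to be stored: if the flag is clear, the registers still hold the old thread's state, and switch_fpu_prepare stores it. Second, it calls switch_fpu_finish, which sets TIF_NEED_FPU_LOAD so that the FPU state is loaded when the new thread returns to user mode or needs the FPU.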

__visible __notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
	struct thread_struct *prev = &prev_p->thread;
	struct thread_struct *next = &next_p->thread;
	struct fpu *prev_fpu = &prev->fpu;
	struct fpu *next_fpu = &next->fpu;
	int cpu = smp_processor_id();
	/* ... */
	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_prepare(prev_fpu, cpu);
	/* ... */
	switch_fpu_finish(next_fpu);
	/* ... */
}

test_thread_flag in ./include/linux/thread_info.h tests whether the given bit is set; in this call chain the bit is TIF_NEED_FPU_LOAD.

#define test_thread_flag(flag) \
	test_ti_thread_flag(current_thread_info(), flag)

static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
{
	return test_bit(flag, (unsigned long *)&ti->flags);
}

#define current_thread_info() ((struct thread_info *)current)

/* in ./arch/x86/include/asm/current.h */
DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)
{
	return this_cpu_read_stable(current_task);
}

#define current get_current()

/* in ./include/asm-generic/bitops/non-atomic.h */
/**
 * test_bit - Determine whether a bit is set
 * @nr: bit number to test
 * @addr: Address to start counting from
 */
static inline int test_bit(int nr, const volatile unsigned long *addr)
{
	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
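
To make the bit arithmetic concrete, here is a user-space sketch of the same word/offset computation. The *_model function names are invented for this illustration, BITS_PER_LONG, BIT_WORD and BIT_MASK are redefined locally rather than taken from kernel headers, and unlike the kernel's set_bit and clear_bit these helpers are not atomic.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local redefinitions for the sketch; not the kernel headers. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BIT_WORD(nr)  ((nr) / BITS_PER_LONG)
#define BIT_MASK(nr)  (1UL << ((nr) % BITS_PER_LONG))

/* Bit numbers copied from the TIF_* list earlier in the article. */
#define TIF_NEED_RESCHED  3
#define TIF_NEED_FPU_LOAD 14

/* Non-atomic: set bit nr in the bitmap starting at addr. */
void set_bit_model(unsigned int nr, unsigned long *addr)
{
	assert(addr != NULL);
	addr[BIT_WORD(nr)] |= BIT_MASK(nr);
}

/* Non-atomic: clear bit nr in the bitmap starting at addr. */
void clear_bit_model(unsigned int nr, unsigned long *addr)
{
	assert(addr != NULL);
	addr[BIT_WORD(nr)] &= ~BIT_MASK(nr);
}

/* Return 1 if bit nr is set, 0 otherwise. */
int test_bit_model(unsigned int nr, const unsigned long *addr)
{
	assert(addr != NULL);
	return (int)(1UL & (addr[BIT_WORD(nr)] >> (nr % BITS_PER_LONG)));
}
```

With a single unsigned long as the flags word, setting TIF_NEED_FPU_LOAD yields the value 1UL << 14, which corresponds to the mask form _TIF_NEED_FPU_LOAD tested in prepare_exit_to_usermode.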

switch_fpu_prepare in ./arch/x86/include/asm/fpu/internal.h. It calls copy_fpregs_to_fpstate to store the FPU state.

/*
 * FPU state switching for scheduling.
 *
 * This is a two-stage process:
 *
 *  - switch_fpu_prepare() saves the old state.
 *    This is done within the context of the old process.
 *
 *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
 *    will get loaded on return to userspace, or when the kernel needs it.
 *
 * If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
 * are saved in the current thread's FPU register state.
 *
 * If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not
 * hold current()'s FPU registers. It is required to load the
 * registers before returning to userland or using the content
 * otherwise.
 *
 * The FPU context is only stored/restored for a user task and
 * PF_KTHREAD is used to distinguish between kernel and user threads.
 */
static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
{
	if (static_cpu_has(X86_FEATURE_FPU) && !(current->flags & PF_KTHREAD)) {
		if (!copy_fpregs_to_fpstate(old_fpu))
			old_fpu->last_cpu = -1;
		else
			old_fpu->last_cpu = cpu;

		/* But leave fpu_fpregs_owner_ctx! */
		trace_x86_fpu_regs_deactivated(old_fpu);
	}
}

// in ./arch/x86/include/asm/cpufeatures.h
/* Intel-defined CPU features, CPUID level 0x00000001 (EDX), word 0 */
#define X86_FEATURE_FPU   ( 0*32+ 0) /* Onboard FPU */
switch_fpu_finish in ./arch/x86/include/asm/fpu/internal.h. It sets TIF_NEED_FPU_LOAD for the new task; before returning to user mode, the kernel checks this flag and loads the FPU state if it is set.

/*
 * Load PKRU from the FPU context if available. Delay loading of the
 * complete FPU state until the return to userland.
 */
static inline void switch_fpu_finish(struct fpu *new_fpu)
{
	/* ... */
	if (!static_cpu_has(X86_FEATURE_FPU))
		return;

	set_thread_flag(TIF_NEED_FPU_LOAD);
	/* ... */
}

prepare_exit_to_usermode in ./arch/x86/entry/common.c. After scheduling has finished and before the task returns to user mode, prepare_exit_to_usermode is called; it calls switch_fpu_return to load the FPU state.

/* Called with IRQs disabled. */
__visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
{
	struct thread_info *ti = current_thread_info();
	u32 cached_flags;
	/* ... */
	cached_flags = READ_ONCE(ti->flags);
	/* ... */
	if (unlikely(cached_flags & _TIF_NEED_FPU_LOAD))
		switch_fpu_return();
	/* ... */
}

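switch_fpu_return in ./arch/x86/kernel/fpu/core.c. It calls __fpregs_load_activate, defined in ./arch/x86/include/asm/fpu/internal.h, to load the FPU state. That helper first returns without loading anything if the current thread is a kthread; otherwise it loads the FPU state, unless the registers are still valid for this task on this CPU, and finally clears TIF_NEED_FPU_LOAD. The flag has to be cleared because the scheduler uses it to decide whether to store the FPU state. For a kthread the flag is never cleared, so the FPU state is not saved when the kthread is scheduled out and not restored when it is scheduled in; switch_fpu_prepare additionally checks PF_KTHREAD itself. For a pthread the flag is cleared, so the FPU state is saved when the pthread is scheduled out and restored before it returns to user mode after being scheduled in.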

/*
 * Load FPU context before returning to userspace.
 */
void switch_fpu_return(void)
{
	if (!static_cpu_has(X86_FEATURE_FPU))
		return;

	__fpregs_load_activate();
}
EXPORT_SYMBOL_GPL(switch_fpu_return);

/*
 * Internal helper, do not use directly. Use switch_fpu_return() instead.
 */
static inline void __fpregs_load_activate(void)
{
	struct fpu *fpu = &current->thread.fpu;
	int cpu = smp_processor_id();

	if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
		return;

	if (!fpregs_state_valid(fpu, cpu)) {
		copy_kernel_to_fpregs(&fpu->state);
		fpregs_activate(fpu);
		fpu->last_cpu = cpu;
	}
	clear_thread_flag(TIF_NEED_FPU_LOAD);
}
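
The whole save-on-switch, load-on-return cycle can be modelled in user space. The sketch below is a toy under stated assumptions, not kernel code: an int stands in for the FPU registers, struct task_model, struct cpu_model and both functions are invented names, the store is assumed always to leave the registers intact (so last_cpu is never reset to -1), and the current pointer is assumed to move to the next task between the save step and the flag-setting step.

```c
#include <assert.h>

#define PF_KTHREAD        0x00200000u
#define TIF_NEED_FPU_LOAD 14
#define NEED_FPU_LOAD     (1UL << TIF_NEED_FPU_LOAD)

/* Hypothetical per-task state. */
struct task_model {
	unsigned int  pf_flags;   /* PF_* flags */
	unsigned long tif_flags;  /* TIF_* flags */
	int saved_fpu;            /* in-memory copy of the FPU state */
	int last_cpu;             /* CPU whose registers match saved_fpu, or -1 */
};

/* Hypothetical per-CPU state. */
struct cpu_model {
	int id;
	int fpu_regs;                   /* live FPU "registers" */
	const struct task_model *owner; /* task whose state is in fpu_regs */
	struct task_model *current;     /* task running on this CPU */
	int saves;                      /* number of stores performed */
	int loads;                      /* number of loads performed */
};

/* Models __switch_to: save prev if needed, then flag next for a lazy load. */
void context_switch_model(struct cpu_model *cpu, struct task_model *next)
{
	struct task_model *prev = cpu->current;

	/* Stage 1: store only if the registers really hold prev's user state. */
	if (!(prev->tif_flags & NEED_FPU_LOAD) && !(prev->pf_flags & PF_KTHREAD)) {
		prev->saved_fpu = cpu->fpu_regs;
		prev->last_cpu = cpu->id;
		cpu->saves++;
	}

	/* Assumption: current switches to next between the two stages. */
	cpu->current = next;

	/* Stage 2: defer the load until return to user mode. */
	next->tif_flags |= NEED_FPU_LOAD;
}

/* Models the check made before returning to user mode. */
void return_to_user_model(struct cpu_model *cpu)
{
	struct task_model *cur = cpu->current;

	assert(!(cur->pf_flags & PF_KTHREAD)); /* kthreads never return to user */
	if (!(cur->tif_flags & NEED_FPU_LOAD))
		return;

	/* Reload only if the registers no longer hold this task's state. */
	if (cpu->owner != cur || cur->last_cpu != cpu->id) {
		cpu->fpu_regs = cur->saved_fpu;
		cpu->owner = cur;
		cur->last_cpu = cpu->id;
		cpu->loads++;
	}
	cur->tif_flags &= ~NEED_FPU_LOAD;
}
```

In this model, switching from a user task to a kthread and back costs one store and no load, because the kthread never touches the registers and they are still valid for the user task. Switching between two user tasks costs one store on the way out and one load on the way back in.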

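FPU load/store in KVM

When a VM exit occurs, the kernel continues running the current thread, i.e. the thread that backs the vCPU. This is an ordinary thread that could in principle be a kthread or a pthread; in QEMU/KVM it is a pthread. The vCPU thread then takes part in scheduling like any other thread, and FPU load/store during scheduling is handled exactly as described above. The difference is that KVM performs a VM entry into the guest instead of returning to user mode, so it has to load the FPU state before resuming the guest, just as is done before a return to user mode.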

vcpu_enter_guest in ./arch/x86/kvm/x86.c

/*
 * Returns 1 to let vcpu_run() continue the guest execution loop without
 * exiting to the userspace.  Otherwise, the value will be returned to the
 * userspace.
 */
static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
{
	/* .... */
	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return();
	/* ... */
	kvm_x86_ops->run(vcpu);
	/* ... */
}

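Putting this together: if the thread backing the vCPU were a kthread, the OS scheduler would not store the FPU state when that kthread is scheduled out, so the FPU state would have to be stored manually at VM exit to keep the guest's FPU state from being overwritten while the thread is off the CPU. If the vCPU thread is a pthread, the OS scheduler stores the FPU state when the pthread is scheduled out, and KVM has to load the FPU state before returning to the guest.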
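
The two cases can be contrasted with a toy user-space model. Everything here is hypothetical and heavily simplified: an int stands in for the FPU registers, struct vcpu_thread_model and the four functions are invented for the illustration, and none of this is KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the thread that backs a vCPU. */
struct vcpu_thread_model {
	bool is_kthread;     /* stands in for PF_KTHREAD */
	bool need_fpu_load;  /* stands in for TIF_NEED_FPU_LOAD */
	int  saved_fpu;      /* in-memory copy of the guest FPU state */
};

/* The scheduler switches the thread out and another task uses the FPU. */
void preempt_model(struct vcpu_thread_model *t, int *fpu_regs, int other_value)
{
	assert(t != NULL && fpu_regs != NULL);
	if (!t->is_kthread && !t->need_fpu_load)
		t->saved_fpu = *fpu_regs;   /* stored only for a user thread */
	t->need_fpu_load = true;
	*fpu_regs = other_value;            /* the registers get overwritten */
}

/* Just before VM entry: reload if flagged, then report what the guest sees. */
int enter_guest_model(struct vcpu_thread_model *t, int *fpu_regs)
{
	assert(t != NULL && fpu_regs != NULL);
	if (t->need_fpu_load && !t->is_kthread) {
		*fpu_regs = t->saved_fpu;
		t->need_fpu_load = false;
	}
	return *fpu_regs;
}

/* What a kthread-backed vCPU would have to do by hand at VM exit. */
void manual_save_model(struct vcpu_thread_model *t, const int *fpu_regs)
{
	assert(t != NULL && fpu_regs != NULL);
	t->saved_fpu = *fpu_regs;
}

/* ... and by hand again before the next VM entry. */
void manual_load_model(const struct vcpu_thread_model *t, int *fpu_regs)
{
	assert(t != NULL && fpu_regs != NULL);
	*fpu_regs = t->saved_fpu;
}
```

With a pthread-backed vCPU the guest sees its own value again after a preemption. With a kthread-backed vCPU it sees the other task's value unless manual_save_model and manual_load_model bracket the preemption.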

