Priority inheritance in the kernel

https://lwn.net/Articles/178253/

[Posted April 3, 2006 by corbet]


Imagine a system with two processes running, one at high priority and the other at a much lower priority. These processes share resources which are protected by locks. At some point, the low-priority process manages to run and obtains a lock for one of those resources. If the high-priority process then attempts to obtain the same lock, it will have to wait. Essentially, the low-priority process has trumped the high-priority process, at least for as long as it holds the contended lock.

Now imagine a third process, one which uses a lot of processor time, and which has a priority between the other two. If that process starts to crank, it will push the low-priority process out of the CPU indefinitely. The low-priority process then never gets the chance to release its lock, so the third process can, in effect, keep the highest-priority process out of the CPU indefinitely as well.

This situation, called "priority inversion," tends to be followed by system failure, upset users, and unemployed engineers. There are a number of approaches to avoiding priority inversion, including lockless designs, carefully thought-out locking scenarios, and a technique known as priority inheritance. The priority inheritance method is simple in concept: when a process holds a lock, it should run at (at least) the priority of the highest-priority process waiting for the lock. When a lock is taken by a low-priority process, the priority of that process might need to be boosted until the lock is released.
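To make the scenario concrete, here is a toy model in C of a strict-priority scheduler choosing among the three processes. It is a sketch only: `struct task`, `effective_prio()`, and `pick_next()` are hypothetical names invented for illustration, higher numbers are assumed to mean higher priority, and nothing here corresponds to actual kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a higher number means a higher priority. */
struct task {
    const char *name;
    int base_prio;   /* priority the task was assigned */
    int boost;       /* priority lent by a waiter, 0 if none */
    int runnable;    /* 0 while the task is blocked on a lock */
};

/* A lock holder runs at the higher of its own and its inherited priority. */
static int effective_prio(const struct task *t)
{
    assert(t != NULL);
    return t->boost > t->base_prio ? t->boost : t->base_prio;
}

/* Pick the runnable task with the highest effective priority. */
static const struct task *pick_next(const struct task *tasks, size_t n)
{
    const struct task *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].runnable)
            continue;
        if (best == NULL ||
            effective_prio(&tasks[i]) > effective_prio(best))
            best = &tasks[i];
    }
    return best;
}
```

With priorities 1, 5, and 9, and the priority-9 task blocked on the lock, `pick_next()` selects the middle task every time, which is the inversion. Once the lock holder's `boost` is set to 9, `pick_next()` selects the lock holder instead, so it can finish its critical section and release the lock.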

There are a number of approaches to priority inheritance. In effect, the kernel performs a very simple form of it by not allowing kernel code to be preempted while holding a spinlock. In some systems, each lock has a priority associated with it; whenever a process takes a lock, its priority is raised to the lock's priority. In others, a high-priority process will have its priority "inherited" by another process which holds a needed lock. Most priority inheritance schemes have shown a tendency to complicate and slow down the locking code, and they can be used to paper over poor application designs. So they are unpopular in many circles. Linus was reasonably clear about how he felt on the subject last December:

"Friends don't let friends use priority inheritance".

Just don't do it. If you really need it, your system is broken anyway.

Faced with this sort of opposition, many developers would quietly shelve their priority inheritance designs and go back to working on accounting code. The kernel development community, however, happens to have a member who has a track record of getting code merged in spite of this sort of objection: Ingo Molnar. History may well repeat itself, as Ingo (working with Thomas Gleixner) has posted a priority-inheriting futex implementation with a request that it be merged into the mainline. This approach, says Ingo, provides useful functionality to user space (it is not meant to provide priority-inheriting kernel mutual exclusion primitives) while avoiding the pitfalls which have hit other implementations.

The PI-futex patch adds a couple of new operations to the futex() system call: FUTEX_LOCK_PI and FUTEX_UNLOCK_PI. In the uncontended case, a PI-futex can be taken without involving the kernel at all, just like an ordinary futex. When there is contention, instead, the FUTEX_LOCK_PI operation is requested from the kernel. The requesting process is put into a special queue, and, if necessary, that process lends its priority to the process actually holding the contended futex. The priority inheritance is chained, so that, if the holding process is blocked on a second futex, the boosted priority will propagate to the holder of that second futex. As soon as a futex is released, any associated priority boost is removed.
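The chained boosting described above can be sketched as a short walk along the "blocked on" links. This is a toy model under stated assumptions, not the kernel's algorithm: `struct task`, `struct lock`, `propagate_boost()`, `release_lock()`, and `MAX_CHAIN` are hypothetical names, higher numbers mean higher priority, and each task is assumed to hold at most one lock so that releasing it simply restores the base priority.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHAIN 16   /* hypothetical bound so a lock cycle cannot loop forever */

struct lock;

struct task {
    int base_prio;            /* assigned priority */
    int prio;                 /* effective (possibly boosted) priority */
    struct lock *blocked_on;  /* lock this task is waiting for, or NULL */
};

struct lock {
    struct task *owner;       /* current holder, or NULL if free */
};

/* Follow waiter -> lock -> owner -> lock -> ... and lend the waiter's
 * priority to every owner along the way that runs below it. */
static void propagate_boost(const struct task *waiter)
{
    assert(waiter != NULL);
    struct lock *l = waiter->blocked_on;

    for (int depth = 0; l != NULL && l->owner != NULL && depth < MAX_CHAIN;
         depth++) {
        struct task *owner = l->owner;

        if (owner->prio < waiter->prio)
            owner->prio = waiter->prio;
        l = owner->blocked_on;
    }
}

/* Releasing the lock drops the boost (simplification: one lock per task). */
static void release_lock(struct lock *l)
{
    assert(l != NULL && l->owner != NULL);
    l->owner->prio = l->owner->base_prio;
    l->owner = NULL;
}
```

If task C (priority 9) blocks on a lock held by B (priority 3), and B is itself blocked on a lock held by A (priority 1), `propagate_boost()` raises both B and A to 9. A real implementation also has to handle tasks holding several locks, concurrent changes to the chain, and deadlock detection, all of which this sketch ignores.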

As with regular futexes, the kernel only needs to know about a PI-futex while it is being contended. So the number of futexes in the system can become quite large without serious overhead on the kernel side.
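A user-space lock built on these operations might look roughly like the following. The article does not spell out the user-space protocol; this sketch assumes the convention documented in the futex(2) man page for later kernels (a futex word of 0 means unlocked, and the owner stores its thread ID in the word), which may differ in detail from the patch as posted. `futex_op()`, `pi_lock()`, and `pi_unlock()` are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The kernel treats the futex word as a plain 32-bit integer. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
              "futex word must be 32 bits");

/* glibc exports no futex() wrapper, so go through syscall(). */
static long futex_op(_Atomic uint32_t *uaddr, int op)
{
    return syscall(SYS_futex, (uint32_t *)uaddr, op, 0, NULL, NULL, 0);
}

/* Returns 0 on success, -1 on error (errno set by the kernel). */
static int pi_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t)syscall(SYS_gettid);

    /* Fast path: the lock is free, claim it without entering the kernel. */
    if (atomic_compare_exchange_strong(word, &expected, tid))
        return 0;

    /* Slow path: contended, so let the kernel queue us and boost the owner. */
    return futex_op(word, FUTEX_LOCK_PI) == 0 ? 0 : -1;
}

/* Returns 0 on success, -1 on error (errno set by the kernel). */
static int pi_unlock(_Atomic uint32_t *word)
{
    uint32_t expected = (uint32_t)syscall(SYS_gettid);

    /* Fast path: no waiters were recorded, so just clear the word. */
    if (atomic_compare_exchange_strong(word, &expected, 0))
        return 0;

    /* Slow path: waiters exist, so the kernel must hand the lock over. */
    return futex_op(word, FUTEX_UNLOCK_PI) == 0 ? 0 : -1;
}
```

In the uncontended case both functions complete with a single compare-and-swap, so the kernel never learns that the lock exists. Most applications would not call futex() directly and would leave this sort of code to the C library's mutex implementation.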

Within the kernel, the PI-futex type is implemented by way of a new primitive called an rt_mutex. An rt_mutex is superficially similar to a regular mutex, with the addition of the priority inheritance capability. It is, however, an entirely different type, with no code shared with the mutex implementation. The API will nonetheless be familiar to mutex users; in brief, it is:

    #include <linux/rtmutex.h>

    void rt_mutex_init(struct rt_mutex *lock);
    void rt_mutex_destroy(struct rt_mutex *lock);

    void rt_mutex_lock(struct rt_mutex *lock);
    int rt_mutex_lock_interruptible(struct rt_mutex *lock, int detect_deadlock);
    int rt_mutex_timed_lock(struct rt_mutex *lock,
                            struct hrtimer_sleeper *timeout,
                            int detect_deadlock);
    int rt_mutex_trylock(struct rt_mutex *lock);
    void rt_mutex_unlock(struct rt_mutex *lock);
    int rt_mutex_is_locked(struct rt_mutex *lock);

The alert reader may have noticed that this looks much like the realtime mutex type found in the realtime preemption patch. Ingo once said that the realtime patches would slowly trickle into the mainline, and that is what appears to be happening here. With this patch set, the PI-futex code is the only user of the new rt_mutex type, but that could certainly change over time.

The PI-futex patch also includes a new, priority-sorted list type which could find users elsewhere in the kernel.
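The idea of a priority-sorted list can be illustrated with a few lines of C. This is not the list type from the patch, whose interface the article does not describe; `struct waiter` and `sorted_insert()` are hypothetical, and the sketch assumes that higher numbers mean higher priority and that equal-priority entries should stay in arrival order.

```c
#include <assert.h>
#include <stddef.h>

struct waiter {
    int prio;
    struct waiter *next;
};

/* Insert w so the list stays in descending priority order, with
 * first-come-first-served ordering among equal priorities.
 * Returns the (possibly new) head of the list. */
static struct waiter *sorted_insert(struct waiter *head, struct waiter *w)
{
    struct waiter **link = &head;

    assert(w != NULL);
    /* Skip every entry that should stay ahead of the new one. */
    while (*link != NULL && (*link)->prio >= w->prio)
        link = &(*link)->next;

    w->next = *link;
    *link = w;
    return head;
}
```

Keeping waiters sorted this way means the highest-priority waiter is always at the head, so finding the priority to lend to a lock holder takes constant time, at the cost of a linear walk on insertion in this simple version.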

There has been relatively little discussion of this patch so far; it has been included in recent -mm trees. It is too late for 2.6.17, but, if no real opposition develops, the PI-futex code might just find its way into a subsequent kernel.

Index entries for this article
Kernel Futex
Kernel Locking mechanisms/Mutexes
Kernel Priority inheritance
Kernel Realtime
