Reposted from http://blog.chinaunix.net/uid-24774106-id-3379478.html
Linux process scheduling: the FIFO and RR policies 2012-10-19 18:16:43





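    In user space, that is, in application programming, Linux provides a set of APIs and system calls that influence the kernel scheduler or query it. A program can, for example, get or set a process's scheduling policy and priority, and find out the length of its CPU timeslice.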
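    As a minimal sketch of the query side of these APIs, the fragment below reads the policy and static priority of a process with sched_getscheduler and sched_getparam. The helper names policy_name and print_sched_info are made up for this illustration, and only the three POSIX policies are given names.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Map a policy constant to a printable name (hypothetical helper). */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "unknown";
    }
}

/*
 * Print the policy and static priority of `pid` (0 means the caller).
 * Returns 0 on success, -1 if either query fails.
 */
int print_sched_info(pid_t pid)
{
    struct sched_param param;
    int policy = sched_getscheduler(pid);

    if (policy == -1 || sched_getparam(pid, &param) == -1) {
        perror("sched query");
        return -1;
    }
    printf("policy=%s priority=%d\n", policy_name(policy), param.sched_priority);
    return 0;
}
```

    Calling print_sched_info(0) from main reports on the calling process itself; querying needs no special privileges, unlike switching to a real-time policy.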
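    1 To a first approximation, real-time processes always outrank normal processes: while a runnable real-time process exists, normal processes get no chance to run on the CPU. (As the previous post explained, since real-time group scheduling was added to the kernel, normal processes can be allowed a small amount of CPU time, depending on the configuration.)
    2 Among real-time processes, a lower-priority process cannot run while a higher-priority one is runnable. The so-called scheduling policy of a real-time process therefore only governs scheduling among processes of the same priority. A FIFO real-time process that holds the CPU keeps running until one of the following happens:
     a) The FIFO process voluntarily gives up the CPU by calling sched_yield.
     b) A higher-priority process appears and preempts the FIFO process. It may seem odd that a higher-priority process can show up at all while the FIFO process is occupying the CPU, but remember that the machine has multiple cores and CPUs: if a process with a higher priority than the FIFO process appears on another CPU, it may be pushed to the CPU the FIFO process is running on.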
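    The rule in point 2 can be modeled in a few lines. This is a toy sketch, not kernel code: struct rt_task, pick_next and requeue_tail are names invented here, and the run queue is a plain array in which earlier entries have waited longer.

```c
#include <stddef.h>

/* Hypothetical stand-in for a runnable real-time task; not a kernel type. */
struct rt_task {
    int id;
    int rt_priority; /* 1..99, larger is more urgent */
};

/* Index of the first task with the highest priority, or -1 for an empty queue. */
int pick_next(const struct rt_task *queue, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 || queue[i].rt_priority > queue[best].rt_priority)
            best = (int)i;
    }
    return best;
}

/*
 * Move queue[idx] to the back, keeping the other tasks in order.
 * Models a task that called sched_yield or an RR task whose timeslice ran out.
 */
void requeue_tail(struct rt_task *queue, size_t n, size_t idx)
{
    if (idx >= n)
        return;

    struct rt_task moved = queue[idx];
    for (size_t i = idx; i + 1 < n; i++)
        queue[i] = queue[i + 1];
    queue[n - 1] = moved;
}
```

    In this model a FIFO task is simply never requeued, so pick_next keeps returning it; a lower-priority task is only picked once every higher-priority task has left the queue.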
    #define _GNU_SOURCE   /* must precede all includes; enables CPU_SET and sched_setaffinity */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <time.h>
    #include <sched.h>

    #define COUNT   300000
    #define MILLION 1000000L

    /* Busy loop that burns CPU; volatile keeps the compiler from removing it. */
    void test_func(void)
    {
        int i;
        volatile unsigned long long result = 0;

        for (i = 0; i < 8000; i++) {
            result += 2;
        }
    }

    int main(int argc, char *argv[])
    {
        int i;
        long interval;
        struct timeval tend, tstart;
        struct sched_param param;
        cpu_set_t mask;
        int ret = 0;

        if (argc != 3) {
            fprintf(stderr, "usage:./test sched_method sched_priority\n");
            return -1;
        }

        /* Pin the process to CPU 1 so that all test instances compete for one core. */
        CPU_ZERO(&mask);
        CPU_SET(1, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
            printf("warning: could not set CPU affinity, continuing...\n");
        }

        int sched_method = atoi(argv[1]);
        int sched_priority = atoi(argv[2]);

        /* Range checks left disabled, as in the original program:
        if (sched_method > 2 || sched_method < 0) {
            fprintf(stderr, "sched_method scope [0,2]\n");
            return -2;
        }
        if (sched_priority > 99 || sched_priority < 1) {
            fprintf(stderr, "sched_priority scope [1,99]\n");
            return -3;
        }
        if (sched_method == 1 || sched_method == 2)
        */
        {
            param.sched_priority = sched_priority;
            ret = sched_setscheduler(getpid(), sched_method, &param);
            if (ret) {
                /* %m is a glibc extension that prints strerror(errno). */
                fprintf(stderr, "set scheduler to %d %d failed %m\n",
                        sched_method, sched_priority);
                return -4;
            }
        }

        int scheduler = sched_getscheduler(getpid());
        fprintf(stderr, "the scheduler of PID(%ld) is %d, priority (%d),BEGIN time is :%ld\n",
                (long)getpid(), scheduler, sched_priority, (long)time(NULL));

        /* Sleep so that the other test instances can start before the CPU-bound phase. */
        sleep(2);

        gettimeofday(&tstart, NULL);
        for (i = 0; i < COUNT; i++) {
            test_func();
        }
        gettimeofday(&tend, NULL);

        /* Wall-clock duration of the busy phase, in microseconds. */
        interval = MILLION * (tend.tv_sec - tstart.tv_sec)
                 + (tend.tv_usec - tstart.tv_usec);
        fprintf(stderr, " PID = %d\t priority: %d\tEND TIME is %ld\telapsed %ld us\n",
                (int)getpid(), sched_priority, (long)time(NULL), interval);
        return 0;
    }


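    1 To keep things simple, every test process is bound to the same core (CPU 1). The machine I ran the experiment on has four cores, as cat /proc/cpuinfo shows.
    2 The sleep(2) call gives the other processes a chance to be scheduled; without it, the scenario of several real-time processes with different priorities being runnable at the same time could not be reproduced. After the sleep there are no more blocking system calls, so the highest-priority process occupies the CPU (FIFO), and processes of equal priority take turns (RR).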
    struct sched_param {
        /* ... */
        int sched_priority;
        /* ... */
    };

    int sched_setscheduler(pid_t pid,
                           int policy,
                           const struct sched_param *sp);

The second parameter of sched_setscheduler selects the scheduling policy:
    #define SCHED_OTHER 0
    #define SCHED_FIFO 1
    #define SCHED_RR 2
    #ifdef __USE_GNU
    # define SCHED_BATCH 3
    #endif


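   SCHED_FIFO and SCHED_RR are the scheduling policies for real-time processes; for them, the sched_priority value passed through the third parameter must lie in the range [1,99].
   If the priority passed to sched_setscheduler does not match the scheduling policy, the call fails.
    /*
     * Valid priorities for SCHED_FIFO and SCHED_RR are
     * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL,
     * SCHED_BATCH and SCHED_IDLE is 0.
     */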
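   The mismatch rule is easy to see from user space. The sketch below wraps sched_setscheduler in a helper, try_set_policy (a name invented here), that returns the errno of a failed attempt on the calling process.

```c
#include <errno.h>
#include <sched.h>

/*
 * Try to apply `policy` with `priority` to the calling process.
 * Returns 0 on success, otherwise the errno value reported by the call.
 */
int try_set_policy(int policy, int priority)
{
    struct sched_param param = { .sched_priority = priority };

    if (sched_setscheduler(0, policy, &param) == -1)
        return errno;
    return 0;
}
```

   On Linux, try_set_policy(SCHED_OTHER, 10) and try_set_policy(SCHED_FIFO, 0) are both expected to fail with EINVAL, because the priority does not fit the policy. A valid real-time combination such as try_set_policy(SCHED_FIFO, 50) normally needs root or CAP_SYS_NICE and otherwise fails with EPERM.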


    #include <sched.h>
    int sched_get_priority_min(int policy);
    int sched_get_priority_max(int policy);
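    Rather than hard-coding the range, a program can ask the system for it. A small sketch using the two functions above; priority_in_range is a hypothetical helper name.

```c
#include <sched.h>

/*
 * Return 1 if `priority` lies within the range the system reports for
 * `policy`, 0 if it does not, and -1 if the policy itself is rejected.
 */
int priority_in_range(int policy, int priority)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    return priority >= lo && priority <= hi;
}
```

    Checking with priority_in_range before calling sched_setscheduler lets a program report a bad priority argument itself, which is roughly what the commented-out range checks in the test program were meant to do.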

    int sched_setparam(pid_t pid, const struct sched_param *sp);


    Below is the output of ps -C test -o pid,pri,cmd,time,psr:

      PID  PRI  CMD            TIME      PSR
     6303  139  ./test 1 99    00:00:04    1

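Although this post is mainly about real-time processes, a short digression is in order. For normal processes, priority is adjusted through the nice system call. From the kernel's point of view, normal processes have priorities in the range [100,139], where 100 is the highest and 139 the lowest; the default is 120. Priority means something different for normal processes than for real-time processes: it determines the share of CPU time a process receives. As Professional Linux Kernel Architecture (深入linux内核架构) explains, the higher the priority of a normal process (100 highest, 139 lowest), the more CPU time it gets, and one step in priority shifts that share by roughly 10%. For example, when a process with kernel priority 120 competes with one at 121, their weights of 1024 and 820 split the CPU roughly 55% to 45%, about 10 percentage points apart.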

    static const int prio_to_weight[40] = {
        /* -20 */ 88761, 71755, 56483, 46273, 36291,
        /* -15 */ 29154, 23254, 18705, 14949, 11916,
        /* -10 */  9548,  7620,  6100,  4904,  3906,
        /*  -5 */  3121,  2501,  1991,  1586,  1277,
        /*   0 */  1024,   820,   655,   526,   423,
        /*   5 */   335,   272,   215,   172,   137,
        /*  10 */   110,    87,    70,    56,    45,
        /*  15 */    36,    29,    23,    18,    15,
    };
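    1 Agree on a timeslice: each person plays on the computer for 1 hour, and the time is written down afterwards (Zhang XX: 1 hour). Whoever has the least recorded time plays next.
    2 Introduce priorities. Li Si has something urgent and needs more time on the computer. How? He plays for 1 hour but only half an hour is recorded. Other things being equal, Li Si is then picked to play more often than the others, which is exactly what priority means.
    3 Wang Wu also has something urgent, but on inspection it is less urgent than Li Si's. Fine: he plays for 1 hour and 45 minutes are recorded.
    4 The situation changes: word gets around that there is a computer here, and 10 more people show up. With a fixed 1-hour timeslice, the person at the end of the queue would lose patience long before his turn. The answer is a dynamic timeslice determined by the number of people: the more people, the shorter each turn, so that nobody waits too long.
    This bookkeeping is what prio_to_weight is for. prio_to_weight[20] is the baseline: play for 1 hour, 1 hour is recorded. The entries before index 20 are the privileged ones (play for 1 hour, 20 minutes recorded, and so on); the entries after index 20 are the unlucky ones (play for 1 hour, 1.5 hours recorded, and so on). The nice thing about CFS is that everybody gets a turn.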
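    To put numbers on the analogy, the sketch below assumes that the recorded time scales as 1024 / weight, which is how CFS scales virtual runtime relative to a nice-0 task. The helper names recorded_minutes and cpu_share are made up for this example, and nice values must stay within [-20,19].

```c
/* Weight table as listed above; index = nice + 20. */
static const int prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

#define NICE_0_WEIGHT 1024

/* Minutes "written down" for a task at `nice` that actually ran for `minutes`. */
double recorded_minutes(double minutes, int nice)
{
    return minutes * NICE_0_WEIGHT / prio_to_weight[nice + 20];
}

/* Fraction of the CPU a task at nice_a gets when it competes only with a task at nice_b. */
double cpu_share(int nice_a, int nice_b)
{
    double wa = prio_to_weight[nice_a + 20];
    double wb = prio_to_weight[nice_b + 20];

    return wa / (wa + wb);
}
```

    Under this assumption, an hour of play at nice -5 (weight 3121) is recorded as a little under 20 minutes, and an hour at nice 2 (weight 655) as a little over an hour and a half, which lines up with the figures in the analogy. cpu_share(0, 1) comes out at about 0.555, the roughly 55/45 split between adjacent priority levels.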
    [root@localhost sched]# cat comp.sh
    #!/bin/sh
    ./test $1 99 &
    usleep 1000;
    ./test $1 70 &
    usleep 1000;
    ./test $1 70 &
    usleep 1000;
    ./test $1 70 &
    usleep 1000;
    ./test $1 50 &
    usleep 1000;
    ./test $1 30 &
    usleep 1000;
    ./test $1 10 &

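Because each test process sleeps for 2 seconds, comp.sh has time to start the other test processes. The script starts one real-time process at priority 99 (the highest), three at priority 70, and one each at priorities 50, 30 and 10.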

    #define DEF_TIMESLICE        (100 * HZ / 1000)
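    DEF_TIMESLICE is expressed in jiffies, so its wall-clock length is the same whatever HZ is. The sketch below spells out the arithmetic and also shows the POSIX call sched_rr_get_interval, which asks the kernel for a process's round-robin interval. The helper names are invented for this example, and the value returned by the system call is only meaningful for a process running under SCHED_RR.

```c
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/* DEF_TIMESLICE from the definition above, with HZ passed in explicitly (result in jiffies). */
long def_timeslice_jiffies(long hz)
{
    return 100 * hz / 1000;
}

/* Convert a jiffy count to milliseconds for a given HZ. */
long jiffies_to_ms(long jiffies, long hz)
{
    return jiffies * 1000 / hz;
}

/* Round-robin interval of `pid` (0 = caller) in nanoseconds, or -1 on error. */
long long rr_interval_ns(pid_t pid)
{
    struct timespec ts;

    if (sched_rr_get_interval(pid, &ts) == -1)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

    For HZ values of 100, 250 and 1000, jiffies_to_ms(def_timeslice_jiffies(hz), hz) gives 100 in each case, so the RR timeslice defined this way is 100 ms.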
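    Now let's verify this. I wrote two monitoring scripts to observe how the real-time processes are scheduled.
    The first script is simple: it watches the CPU time each process has consumed, for which ps is enough: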
    [root@localhost sched]# cat getpsinfo.sh
    #!/bin/bash
    # The for (( ... )) loop is a bash construct, hence bash rather than sh.
    for((i = 0; i < 40; i++))
    do
        ps -C test -o pid,pri,cmd,time,psr >>psinfo.log 2>&1
        sleep 2;
    done


    [root@localhost sched]# cat cswmon_spec.stp
    global time_offset
    probe begin { time_offset = gettimeofday_us() }
    probe scheduler.ctxswitch {
        if(next_task_name == "test" ||prev_task_name == "test")
        {
            t = gettimeofday_us()
            printf(" time_off (%8d )%20s(%6d)(pri=%4d)(state=%d)->%20s(%6d)(pri=%4d)(state=%d)\n",
                   t-time_offset,
                   prev_task_name,
                   prev_pid,
                   prev_priority,
                   (prevtsk_state),
                   next_task_name,
                   next_pid,
                   next_priority,
                   (nexttsk_state))
        }
    }
    probe scheduler.process_exit
    {
        if(execname() == "test")
            printf("task :%s PID(%d) PRI(%d) EXIT\n",execname(),pid,priority);
    }
    probe timer.s($1) {
        printf("--------------------------------------------------------------\n")
        exit();
    }
A) Output under the FIFO scheduling policy:

    Terminal 1:
    stap ./cswmon_spec.stp 70
    Terminal 2:
    ./getpsinfo.sh
    Terminal 3:
    ./comp.sh 1


    time_off ( 689546 ) test( 6305)(pri= 120)(state=0)-> migration/2( 11)(pri= 0)(state=0)
    time_off ( 689977 ) stap( 5895)(pri= 120)(state=0)-> test( 6305)(pri= 120)(state=0)
    time_off ( 690067 ) test( 6305)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
    time_off ( 697899 ) test( 6303)(pri= 120)(state=0)-> migration/2( 11)(pri= 0)(state=0)
    time_off ( 698042 ) test( 6307)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
    time_off ( 699114 ) stap( 5895)(pri= 120)(state=0)-> test( 6303)(pri= 120)(state=0)
    time_off ( 699307 ) test( 6303)(pri= 0)(state=1)-> test( 6307)(pri= 120)(state=0)
    time_off ( 699371 ) test( 6307)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
    time_off ( 699392 ) test( 6309)(pri= 120)(state=0)-> migration/3( 15)(pri= 0)(state=0)
    time_off ( 699966 ) events/1( 20)(pri= 120)(state=1)-> test( 6309)(pri= 120)(state=0)
    time_off ( 700034 ) test( 6309)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
    time_off ( 707379 ) test( 6311)(pri= 120)(state=0)-> migration/3( 15)(pri= 0)(state=0)
    time_off ( 707587 ) test( 6313)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
    time_off ( 712021 ) stap( 5895)(pri= 120)(state=0)-> test( 6311)(pri= 120)(state=0)
    time_off ( 712145 ) test( 6311)(pri= 49)(state=1)-> test( 6313)(pri= 120)(state=0)
    time_off ( 712252 ) test( 6313)(pri= 69)(state=1)-> stap( 5895)(pri= 120)(state=0)
    time_off ( 727057 ) test( 6315)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
    time_off ( 727952 ) stap( 5895)(pri= 120)(state=0)-> test( 6315)(pri= 120)(state=0)
    time_off ( 728047 ) test( 6315)(pri= 89)(state=1)-> stap( 5895)(pri= 120)(state=0)
    time_off ( 2690181 ) stap( 5895)(pri= 120)(state=0)-> test( 6305)(pri= 29)(state=0)
    time_off ( 2699316 ) test( 6305)(pri= 29)(state=0)-> test( 6303)(pri= 0)(state=0)
    task :test PID(6303) PRI(0) EXIT
    time_off (13057854 ) test( 6303)(pri= 0)(state=64)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (13057864 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
    time_off (15333340 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (15333354 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
    time_off (18743409 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (18743422 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
    time_off (22154757 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (22154771 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
    task :test PID(6305) PRI(29) EXIT
    time_off (22466855 ) test( 6305)(pri= 29)(state=64)-> test( 6307)(pri= 29)(state=0)
    time_off (25563548 ) test( 6307)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (25563566 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6307)(pri= 29)(state=0)
    time_off (28973602 ) test( 6307)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (28973616 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6307)(pri= 29)(state=0)
    task :test PID(6307) PRI(29) EXIT
    time_off (31846121 ) test( 6307)(pri= 29)(state=64)-> test( 6309)(pri= 29)(state=0)
    time_off (32383671 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (32383683 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
    time_off (35793735 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (35793747 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
    time_off (39203797 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (39203809 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
    task :test PID(6309) PRI(29) EXIT
    time_off (41200440 ) test( 6309)(pri= 29)(state=64)-> test( 6311)(pri= 49)(state=0)
    time_off (42613866 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (42613898 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
    time_off (46024070 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (46024082 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
    time_off (49434004 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (49434017 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
    task :test PID(6311) PRI(49) EXIT


B) The RR case:
    Terminal 1:
    stap ./cswmon_spec.stp 70
    Terminal 2:
    ./getpsinfo.sh
    Terminal 3:
    ./comp.sh 2

    time_off ( 4188015 ) test( 6428)(pri= 0)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off ( 4188025 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6428)(pri= 0)(state=0)
    time_off ( 7612014 ) test( 6428)(pri= 0)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off ( 7612024 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6428)(pri= 0)(state=0)
    task :test PID(6428) PRI(0) EXIT
    time_off (10679062 ) test( 6428)(pri= 0)(state=64)-> test( 6430)(pri= 29)(state=0)
    time_off (10964413 ) test( 6430)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (10964422 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6430)(pri= 29)(state=0)
    time_off (11709024 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
    time_off (12736030 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
    time_off (13779022 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
    time_off (13879021 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
    time_off (13984075 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
    time_off (14084020 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
    time_off (14184023 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
    time_off (14284024 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
    time_off (14374486 ) test( 6434)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
    time_off (14374502 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6434)(pri= 29)(state=0)
    time_off (14384097 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
    time_off (14484066 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
    time_off (14584023 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
    time_off (14684020 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
    time_off (14786032 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
    time_off (14886020 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
    time_off (14986026 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
    time_off (15089023 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
    time_off (15192030 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
    time_off (15292026 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
    time_off (15396085 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
    time_off (15496022 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
    time_off (15596027 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
    time_off (15696153 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
    time_off (15796022 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
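
    The trace can be checked against the 100 ms RR timeslice with a little arithmetic. The sketch below averages the gaps between six consecutive switches among PIDs 6430, 6432 and 6434 (shown with pri= 29; presumably the three processes started at priority 70). The timestamps are copied from the RR trace above; the helper name mean_interval is made up for this example.

```c
#include <stddef.h>

/* Mean gap between consecutive timestamps (same unit as the input), or -1 if n < 2. */
long mean_interval(const long *stamps, size_t n)
{
    if (n < 2)
        return -1;
    return (stamps[n - 1] - stamps[0]) / (long)(n - 1);
}

/* time_off values in microseconds of six consecutive switches, copied from the RR trace. */
static const long rr_switch_us[] = {
    13779022, 13879021, 13984075, 14084020, 14184023, 14284024
};
```

    mean_interval(rr_switch_us, 6) evaluates to 101000 microseconds, that is, about 101 ms per turn. In the FIFO trace, by contrast, processes of equal priority run one after another, each until it exits.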


1 Professional Linux Kernel Architecture (深入linux内核架构)
2 Linux System Programming
3 SystemTap examples


