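Understanding the Linux Kernel in Depth, Timers and Time Management: Related System Calls

rtoax, March 2021

Based on the original text, with additions related to the 5.10.13 kernel source.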

Structures
-------------------------------------------------------
struct clocksource;
struct clock_event_device; `clockevents_register_device()`, `clockevents_config_and_register()`, `tick_check_new_device()` -> `tick_install_broadcast_device()`
struct tick_device; the local timer of each CPU
struct timer_list;
struct timer_base;

Global variables
-------------------------------------------------------
> jiffies = jiffies_64;
> u64 jiffies_64;
> struct bus_type clocksource_subsys ;
> struct device device_clocksource ;
> struct tick_device tick_broadcast_device;
> struct clock_event_device lapic_clockevent;

Clock sources
-------------------------------------------------------
> struct clocksource clocksource_jiffies = {.name       = "jiffies", ...};
> struct clocksource refined_jiffies;    => `i8253/i8254`
> struct clocksource clocksource_tsc = {.name           = "tsc", ...};
> struct clocksource clocksource_hpet;
> struct clocksource clocksource_acpi_pm;
> struct clocksource clocksource_tsc_early;
> struct clocksource clocksource_tsc;
> struct clocksource kvm_clock;

Function call relationships

1. Initialization
-------------------------------------------------------
x86_64_start_kernel()
x86_64_start_reservations()
...
x86_intel_mid_early_setup()
...
x86_init.timers.wallclock_init = intel_mid_rtc_init;
...
start_kernel()
...
setup_arch()
...
x86_init.timers.wallclock_init() = intel_mid_rtc_init()
...
register_refined_jiffies(CLOCK_TICK_RATE)
__clocksource_register()
clocksource_register_hz()
clocksource_register_khz()
tick_init()
tick_broadcast_init()
tick_nohz_init()
init_timers()
...
time_init()
late_time_init = x86_late_time_init;
...
if (late_time_init) late_time_init(); -> x86_late_time_init()
x86_init.irqs.intr_mode_select() -> apic_intr_mode_select()
x86_init.timers.timer_init() -> hpet_time_init()
x86_init.irqs.intr_mode_init() -> apic_intr_mode_init()
tsc_init()

2. Usage
-------------------------------------------------------
get_jiffies_64()

`human` time units:

1. To get seconds: jiffies / HZ
2. To express a point in the future:

/* one minute from now */
unsigned long later = jiffies + 60*HZ;
/* five minutes from now */
unsigned long later = jiffies + 5*60*HZ;
/* Thirty seconds from now */
jiffies + 30*HZ
/* Two minutes from now */
jiffies + 120*HZ
/* One millisecond from now */
jiffies + HZ / 1000

3. Timers
-------------------------------------------------------
__init_timer() / __TIMER_INITIALIZER => `struct timer_list`
add_timer() / del_timer()

Timer frequencies
-------------------------------------------------------
[ACPI PM] Frequency of the [ACPI] power management timer is `3.579545 MHz`.
[hpet] Frequency of the [High Precision Event Timer]  is at least `10 MHz`.
[tsc] Frequency of the [Time Stamp Counter] depends on processor.
hpet uses `read_hpet()` to read the `counter` value.

1. Time related system calls in the Linux kernel

This is the seventh and last part of the chapter that describes timers and time management in the Linux kernel. In the previous part, we discussed timers in the context of x86_64: the High Precision Event Timer and the Time Stamp Counter. Internal time management is an interesting part of the Linux kernel, but of course the kernel is not the only one that needs the concept of time. Our programs also need to know the time. In this part, we will consider the implementation of some time management related system calls. These system calls are:

  • clock_gettime;
  • gettimeofday;
  • nanosleep.

We will start from a simple userspace C program and follow the whole path from the call of the standard library function to the implementation of the system calls. As each architecture provides its own implementation of certain system calls, we will consider only the x86_64 specific implementations, as this book is related to this architecture.

Additionally, we will not consider the concept of system calls in this part, but only the implementations of these three system calls in the Linux kernel. If you are interested in what a system call is, there is a special chapter about this.

So, let’s start from the gettimeofday system call.

2. Implementation of the gettimeofday system call

As we can understand from the name gettimeofday, this function returns the current time. First of all, let’s look at the following simple example:

#include <time.h>
#include <sys/time.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char buffer[40];
    struct timeval time;

    gettimeofday(&time, NULL);

    strftime(buffer, 40, "Current date/time: %m-%d-%Y/%T",
             localtime(&time.tv_sec));
    printf("%s\n", buffer);

    return 0;
}

As you can see, here we call the gettimeofday function, which takes two parameters. The first parameter is a pointer to the timeval structure, which represents an elapsed time:

struct timeval {
    time_t      tv_sec;     /* seconds */
    suseconds_t tv_usec;    /* microseconds */
};

The second parameter of the gettimeofday function is a pointer to the timezone structure, which represents a timezone. In our example, we pass the address of the timeval structure time to the gettimeofday function; the Linux kernel fills the given timeval structure and returns it to us. Additionally, we format the time with the strftime function to get something more human readable than a raw count of elapsed seconds and microseconds. Let’s see the result:

~$ gcc date.c -o date
~$ ./date
Current date/time: 03-26-2016/16:42:02

As you may already know, a userspace application does not invoke the kernel's system call entry directly. Before the actual system call entry is reached, we call a function from the standard library. In my case it is glibc, so I will consider this case. The implementation of the gettimeofday function is located in the sysdeps/unix/sysv/linux/x86/gettimeofday.c source code file. As you may also know, gettimeofday is not a usual system call. It is located in a special area called the vDSO (you can read more about it in the part which describes this concept).

The glibc implementation of gettimeofday tries to resolve the given symbol, in our case __vdso_gettimeofday, by calling the _dl_vdso_vsym internal function. If the symbol cannot be resolved, _dl_vdso_vsym returns NULL and we fall back to the usual system call:

return (_dl_vdso_vsym ("__vdso_gettimeofday", &linux26)
        ?: (void *) (&__gettimeofday_syscall));

The gettimeofday entry is located in the arch/x86/entry/vdso/vclock_gettime.c source code file. As we can see the gettimeofday is a weak alias of the __vdso_gettimeofday:

int gettimeofday(struct timeval *, struct timezone *)
    __attribute__((weak, alias("__vdso_gettimeofday")));

The __vdso_gettimeofday is defined in the same source code file and calls the do_realtime function if the given timeval is not null:

notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
{
    if (likely(tv != NULL)) {
        if (unlikely(do_realtime((struct timespec *)tv) == VCLOCK_NONE))
            return vdso_fallback_gtod(tv, tz);
        tv->tv_usec /= 1000;
    }
    if (unlikely(tz != NULL)) {
        tz->tz_minuteswest = gtod->tz_minuteswest;
        tz->tz_dsttime = gtod->tz_dsttime;
    }

    return 0;
}

If do_realtime fails, we fall back to the real system call by executing the syscall instruction and passing it the __NR_gettimeofday system call number together with the given timeval and timezone:

notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
{
    long ret;

    asm("syscall" : "=a" (ret) :
        "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : "memory");
    return ret;
}

The do_realtime function gets the time data from the vsyscall_gtod_data structure, which is defined in the arch/x86/include/asm/vgtod.h header file and contains a mapping of the timespec structure and a couple of fields related to the current clock source in the system. This function fills the given timeval structure with values from vsyscall_gtod_data, which contains time related data that is updated by the timer interrupt.

First of all, we try to access gtod (global time of day), the vsyscall_gtod_data structure, via the call of gtod_read_begin, and we keep retrying until the read is successful:

do {
    seq = gtod_read_begin(gtod);
    mode = gtod->vclock_mode;
    ts->tv_sec = gtod->wall_time_sec;
    ns = gtod->wall_time_snsec;
    ns += vgetsns(&mode);
    ns >>= gtod->shift;
} while (unlikely(gtod_read_retry(gtod, seq)));

ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
ts->tv_nsec = ns;

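Once we have access to gtod, we fill ts->tv_sec with gtod->wall_time_sec, which stores the current time in seconds, initially obtained from the real time clock during initialization of the timekeeping subsystem in the Linux kernel. The nanosecond part is computed from gtod->wall_time_snsec plus the value returned by vgetsns, shifted right by gtod->shift. At the end of this code we fill the given timespec structure with the resulting values, carrying any whole seconds contained in ns over into ts->tv_sec. Note that __vdso_gettimeofday then divides the nanosecond field by 1000 to get the microseconds expected in a timeval.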

That’s all about the gettimeofday system call. The next system call in our list is the clock_gettime.

3. Implementation of the clock_gettime system call

The clock_gettime function gets the time of the clock specified by the first parameter and stores it in the structure pointed to by the second parameter. The clock_gettime function takes two parameters:

  • clk_id - clock identifier;
  • timespec - address of the timespec structure which represents elapsed time.

Let’s look at the following simple example:

#include <time.h>
#include <sys/time.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    struct timespec elapsed_from_boot;

    clock_gettime(CLOCK_BOOTTIME, &elapsed_from_boot);

    /* tv_sec has type time_t, so cast it to match the format specifier */
    printf("%ld - seconds elapsed from boot\n",
           (long)elapsed_from_boot.tv_sec);

    return 0;
}

which prints uptime information:

~$ gcc uptime.c -o uptime
~$ ./uptime
14180 - seconds elapsed from boot

We can easily check the result with the help of the uptime util:

~$ uptime
up  3:56

The elapsed_from_boot.tv_sec represents elapsed time in seconds, so:

>>> 14180 // 60
236
>>> 14180 // 60 // 60
3
>>> 14180 // 60 % 60
56

The clk_id may be one of the following:

  • CLOCK_REALTIME - system wide clock which measures real or wall-clock time;
  • CLOCK_REALTIME_COARSE - faster but less precise version of the CLOCK_REALTIME;
  • CLOCK_MONOTONIC - represents monotonic time since some unspecified starting point;
  • CLOCK_MONOTONIC_COARSE - faster but less precise version of the CLOCK_MONOTONIC;
  • CLOCK_MONOTONIC_RAW - the same as the CLOCK_MONOTONIC but provides non NTP adjusted time;
  • CLOCK_BOOTTIME - the same as the CLOCK_MONOTONIC but also includes the time that the system was suspended;
  • CLOCK_PROCESS_CPUTIME_ID - per-process time consumed by all threads in the process;
  • CLOCK_THREAD_CPUTIME_ID - thread-specific clock.

Like gettimeofday, clock_gettime is not a usual system call: it is also placed in the vDSO area. The entry of this system call is located in the same source code file as gettimeofday, arch/x86/entry/vdso/vclock_gettime.c.

The implementation of clock_gettime depends on the clock id. If we have passed the CLOCK_REALTIME clock id, the do_realtime function will be called:

notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
{
    switch (clock) {
    case CLOCK_REALTIME:
        if (do_realtime(ts) == VCLOCK_NONE)
            goto fallback;
        break;
    ...
    ...
    ...
fallback:
    return vdso_fallback_gettime(clock, ts);
}

In other cases, the do_{name_of_clock_id} function is called. The implementations of some of them are similar. For example, if we pass the CLOCK_MONOTONIC clock id:

...
...
...
case CLOCK_MONOTONIC:
    if (do_monotonic(ts) == VCLOCK_NONE)
        goto fallback;
    break;
...
...
...

the do_monotonic function will be called, which is very similar to the implementation of do_realtime:

notrace static int __always_inline do_monotonic(struct timespec *ts)
{
    do {
        seq = gtod_read_begin(gtod);
        mode = gtod->vclock_mode;
        ts->tv_sec = gtod->monotonic_time_sec;
        ns = gtod->monotonic_time_snsec;
        ns += vgetsns(&mode);
        ns >>= gtod->shift;
    } while (unlikely(gtod_read_retry(gtod, seq)));

    ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
    ts->tv_nsec = ns;

    return mode;
}

We already saw a little about the implementation of this function in the previous section about gettimeofday. There is only one difference here: the sec and nsec of our timespec value are based on gtod->monotonic_time_sec and gtod->monotonic_time_snsec instead of gtod->wall_time_sec and gtod->wall_time_snsec. The monotonic nanosecond field maps the value of tk->tkr_mono.xtime_nsec, the number of nanoseconds elapsed.

That’s all.

4. Implementation of the nanosleep system call

The last system call in our list is nanosleep. As you can understand from its name, this function provides the ability to sleep. Let’s look at the following simple example:

#include <time.h>
#include <stdlib.h>
#include <stdio.h>

int main (void)
{
    struct timespec ts = {5, 0};

    printf("sleep five seconds\n");
    nanosleep(&ts, NULL);
    printf("end of sleep\n");

    return 0;
}

If we compile and run it, we will see the first line immediately

~$ gcc sleep_test.c -o sleep
~$ ./sleep
sleep five seconds
end of sleep

and the second line after five seconds.

Unlike the gettimeofday and clock_gettime functions, nanosleep is not located in the vDSO area. So, let’s look at how the standard library calls the real system call located in kernel space. The nanosleep system call is invoked with the help of the syscall instruction. Before the syscall instruction is executed, the parameters of the system call must be put into processor registers in the order described in the System V Application Binary Interface, in other words:

  • rdi - first parameter;
  • rsi - second parameter;
  • rdx - third parameter;
  • r10 - fourth parameter;
  • r8 - fifth parameter;
  • r9 - sixth parameter.

The nanosleep system call has two parameters, both pointers to timespec structures. The system call suspends the calling thread until the given timeout has elapsed; it also returns early if a signal interrupts its execution. The first parameter is the timespec which represents the timeout for the sleep. The second parameter is a pointer to a timespec structure too, and it receives the remaining time if the call of nanosleep was interrupted.

The prototype of nanosleep is:

int nanosleep(const struct timespec *req, struct timespec *rem);

To make the system call, we need to put the req parameter into the rdi register and the rem parameter into the rsi register. glibc does this job in the INTERNAL_SYSCALL macro, which is located in the sysdeps/unix/sysv/linux/x86_64/sysdep.h header file.

# define INTERNAL_SYSCALL(name, err, nr, args...) \
    INTERNAL_SYSCALL_NCS (__NR_##name, err, nr, ##args)

which takes the name of the system call, storage for possible error during execution of system call, number of the system call (all x86_64 system calls you can find in the system calls table) and arguments of certain system call. The INTERNAL_SYSCALL macro just expands to the call of the INTERNAL_SYSCALL_NCS macro, which prepares arguments of system call (puts them into the processor registers in correct order), executes syscall instruction and returns the result:

# define INTERNAL_SYSCALL_NCS(name, err, nr, args...)      \
  ({                                                       \
    unsigned long int resultvar;                           \
    LOAD_ARGS_##nr (args)                                  \
    LOAD_REGS_##nr                                         \
    asm volatile (                                         \
    "syscall\n\t"                                          \
    : "=a" (resultvar)                                     \
    : "0" (name) ASM_ARGS_##nr : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);   \
    (long int) resultvar; })

The LOAD_ARGS_##nr macro calls the LOAD_ARGS_N macro where the N is number of arguments of the system call. In our case, it will be the LOAD_ARGS_2 macro. Ultimately all of these macros will be expanded to the following:

# define LOAD_REGS_TYPES_1(t1, a1)                      \
  register t1 _a1 asm ("rdi") = __arg1;                 \
  LOAD_REGS_0

# define LOAD_REGS_TYPES_2(t1, a1, t2, a2)              \
  register t2 _a2 asm ("rsi") = __arg2;                 \
  LOAD_REGS_TYPES_1(t1, a1)
...
...
...

After the syscall instruction is executed, a context switch occurs (more precisely, a switch from user mode to kernel mode) and the kernel transfers execution to the system call handler. The system call handler for the nanosleep system call is located in the kernel/time/hrtimer.c source code file and is defined with the SYSCALL_DEFINE2 macro helper:

SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp,
                struct timespec __user *, rmtp)
{
    struct timespec tu;

    if (copy_from_user(&tu, rqtp, sizeof(tu)))
        return -EFAULT;

    if (!timespec_valid(&tu))
        return -EINVAL;

    return hrtimer_nanosleep(&tu, rmtp, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
}

You can read more about the SYSCALL_DEFINE2 macro in the chapter about system calls. If we look at the implementation of the nanosleep system call, first of all we will see that it starts with a call of the copy_from_user function. This function copies the given data from userspace to kernelspace. In our case we copy the timeout value into the kernelspace timespec structure and check that the given timespec is valid by calling the timespec_valid function:

static inline bool timespec_valid(const struct timespec *ts)
{
    if (ts->tv_sec < 0)
        return false;
    if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
        return false;
    return true;
}

which just checks that the number of seconds in the given timespec is not negative (for an absolute time this would mean a date before 1970) and that the number of nanoseconds does not reach one full second. The nanosleep function ends with a call of the hrtimer_nanosleep function from the same source code file. The hrtimer_nanosleep function creates a timer and calls the do_nanosleep function. The do_nanosleep function does the main job for us. It runs the following loop:

do {
    set_current_state(TASK_INTERRUPTIBLE);
    hrtimer_start_expires(&t->timer, mode);

    if (likely(t->task))
        freezable_schedule();

} while (t->task && !signal_pending(current));

__set_current_state(TASK_RUNNING);
return t->task == NULL;

static inline void freezable_schedule(void)
{
    freezer_do_not_count();
    schedule();
    freezer_count();
}

This loop puts the current task to sleep. After we set the TASK_INTERRUPTIBLE state for the current task, the hrtimer_start_expires function starts the given high-resolution timer on the current processor, and freezable_schedule calls schedule to give up the processor while telling the freezer not to count this task. The loop ends when the timer has expired (t->task becomes NULL) or when a signal is pending. When the high-resolution timer expires, the task is running again.

That’s all.

5. Conclusion

This is the end of the seventh part of the chapter that describes timers and time management in the Linux kernel. In the previous part we saw x86_64 specific clock sources. As I wrote in the beginning, this part is the last part of this chapter. In this chapter we saw important time management concepts such as the clocksource and clockevents frameworks, the jiffies counter, and so on. Of course this does not cover all of time management in the Linux kernel. Many parts of it are mostly related to scheduling, which we will see in another chapter.

If you have questions or suggestions, feel free to ping me on Twitter (0xAX), drop me an email or just create an issue.

Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to linux-insides.

6. Links

  • system call
  • C programming language
  • standard library
  • glibc
  • real time clock
  • NTP
  • nanoseconds
  • register
  • System V Application Binary Interface
  • context switch
  • Introduction to timers in the Linux kernel
  • uptime
  • system calls table for x86_64
  • High Precision Event Timer
  • Time Stamp Counter
  • x86_64
  • previous part
