Came across a great blog post and couldn't resist copying it here, in case it someday becomes unreachable behind the wall. I'll tidy up the formatting later.
Source: https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html

Load averages are an industry-critical metric – my company spends millions auto-scaling cloud instances based on them and other metrics – but on Linux there’s some mystery around them. Linux load averages track not just runnable tasks, but also tasks in the uninterruptible sleep state. Why? I’ve never seen an explanation. In this post I’ll solve this mystery, and summarize load averages as a reference for everyone trying to interpret them.

Linux load averages are “system load averages” that show the running thread (task) demand on the system as an average number of running plus waiting threads. This measures demand, which can be greater than what the system is currently processing. Most tools show three averages, for 1, 5, and 15 minutes:

$ uptime
16:48:24 up 4:11, 1 user, load average: 25.25, 23.40, 23.46

top - 16:48:42 up 4:12, 1 user, load average: 25.25, 23.14, 23.37

$ cat /proc/loadavg
25.72 23.19 23.35 42/3411 43603

Some interpretations:

- If the averages are 0.0, then your system is idle.
- If the 1 minute average is higher than the 5 or 15 minute averages, then load is increasing.
- If the 1 minute average is lower than the 5 or 15 minute averages, then load is decreasing.
- If they are higher than your CPU count, then you might have a performance problem (it depends).
As a set of three, you can tell if load is increasing or decreasing, which is useful. They can also be useful when a single value of demand is desired, such as for a cloud auto-scaling rule. But to understand them in more detail is difficult without the aid of other metrics. A single value of 23-25, by itself, doesn't mean anything, but might mean something if the CPU count is known, and if it's known to be a CPU-bound workload (for example, a load of 25 on a 64-CPU system leaves headroom, while 25 on an 8-CPU system suggests roughly threefold CPU overload).

Instead of trying to debug load averages, I usually switch to other metrics. I’ll discuss these in the “Better Metrics” section near the end.

History
The original load averages show only CPU demand: the number of processes running plus those waiting to run. There’s a nice description of this in RFC 546 titled “TENEX Load Averages”, August 1973:

[1] The TENEX load average is a measure of CPU demand. The load average is an average of the number of runnable processes over a given time period. For example, an hourly load average of 10 would mean that (for a single CPU system) at any time during that hour one could expect to see 1 process running and 9 others ready to run (i.e., not blocked for I/O) waiting for the CPU.
The version of this on ietf.org links to a PDF scan of a hand-drawn load average graph from July 1973, showing that this has been monitored for decades:

[figure: hand-drawn load average graph; source: https://tools.ietf.org/html/rfc546]
Nowadays, the source code to old operating systems can also be found online. Here's an excerpt of DEC macro assembly from TENEX (early 1970s) SCHED.MAC:

NRJAVS==3               ;NUMBER OF LOAD AVERAGES WE MAINTAIN
GS RJAV,NRJAVS          ;EXPONENTIAL AVERAGES OF NUMBER OF ACTIVE PROCESSES
[…]
;UPDATE RUNNABLE JOB AVERAGES

DORJAV: MOVEI 2,^D5000
        MOVEM 2,RJATIM  ;SET TIME OF NEXT UPDATE
        MOVE 4,RJTSUM   ;CURRENT INTEGRAL OF NBPROC+NGPROC
        SUBM 4,RJAVS1   ;DIFFERENCE FROM LAST UPDATE
        EXCH 4,RJAVS1
        FSC 4,233       ;FLOAT IT
        FDVR 4,[5000.0] ;AVERAGE OVER LAST 5000 MS
[…]
;TABLE OF EXP(-T/C) FOR T = 5 SEC.

EXPFF:  EXP 0.920043902 ;C = 1 MIN
        EXP 0.983471344 ;C = 5 MIN
        EXP 0.994459811 ;C = 15 MIN
And here’s an excerpt from Linux today (include/linux/sched/loadavg.h):

#define EXP_1  1884     /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5  2014     /* 1/exp(5sec/5min) */
#define EXP_15 2037     /* 1/exp(5sec/15min) */
Linux also hard-codes the 1, 5, and 15 minute constants. They are the same exp(-5sec/period) factors as TENEX's table, stored as 11-bit fixed-point: for example, 0.920044 × 2048 ≈ 1884 = EXP_1.

There have been similar load average metrics in older systems, including Multics, which had an exponential scheduling queue average.

The Three Numbers
These three numbers are the 1, 5, and 15 minute load averages. Except they aren't really averages, and they aren't 1, 5, and 15 minutes. As can be seen in the source above, 1, 5, and 15 minutes are constants used in an equation that calculates exponentially-damped moving sums of a five-second average. The resulting 1, 5, and 15 minute load averages reflect load well beyond 1, 5, and 15 minutes.
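
To make that concrete, here is a minimal C sketch of the update step, modeled on the kernel's calc_load() in loadavg.c (simplified; the real kernel also handles NO_HZ idle and rounding). The constants are the ones from the header excerpt above:

#include <stdio.h>

#define FSHIFT   11                /* bits of fixed-point precision */
#define FIXED_1  (1 << FSHIFT)     /* 1.0 in fixed point (2048) */
#define EXP_1    1884              /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5    2014              /* 1/exp(5sec/5min) */
#define EXP_15   2037              /* 1/exp(5sec/15min) */

/* One update, run every ~5 seconds: decay the old average and fold in
 * the current count of active tasks. Both values are fixed-point. */
unsigned long calc_load(unsigned long load, unsigned long exp,
                        unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

int main(void)
{
    /* e.g. two tasks active at every sample: all three averages
     * converge toward 2.0, but at different speeds */
    unsigned long active = 2UL << FSHIFT;
    unsigned long avg1 = 0, avg5 = 0, avg15 = 0;

    for (int i = 0; i < 120; i++) {        /* 10 simulated minutes */
        avg1  = calc_load(avg1,  EXP_1,  active);
        avg5  = calc_load(avg5,  EXP_5,  active);
        avg15 = calc_load(avg15, EXP_15, active);
    }
    printf("load average: %.2f, %.2f, %.2f\n",
           (double)avg1 / FIXED_1, (double)avg5 / FIXED_1,
           (double)avg15 / FIXED_1);
    return 0;
}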

If you take an idle system, then begin a single-threaded CPU-bound workload (one thread in a loop), what would the one minute load average be after 60 seconds? If it was a plain average, it would be 1.0. Here is that experiment, graphed:

[figure: load average experiment to visualize exponential damping]
The so-called “one minute average” only reaches about 0.62 by the one minute mark. For more on the equation and similar experiments, Dr. Neil Gunther has written an article on load averages: How It Works, plus there are many Linux source block comments in loadavg.c.
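
That figure can be checked offline. Below is an illustrative simulation (my addition, not from the post) that applies the 1-minute update rule twelve times; in continuous form the result is 1 - 1/e ≈ 0.632, and a live measurement lands slightly lower because the 5-second samples don't align exactly with the workload's start:

#include <stdio.h>

#define FSHIFT   11
#define FIXED_1  (1 << FSHIFT)
#define EXP_1    1884               /* 1/exp(5sec/1min) */

int main(void)
{
    unsigned long load = 0;                 /* system starts idle */
    unsigned long active = 1UL << FSHIFT;   /* one CPU-bound thread */

    for (int t = 5; t <= 60; t += 5) {      /* twelve 5-second updates */
        load = (load * EXP_1 + active * (FIXED_1 - EXP_1)) >> FSHIFT;
        printf("t=%2ds  one-minute load: %.2f\n",
               t, (double)load / FIXED_1);
    }
    return 0;
}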

Linux Uninterruptible Tasks
When load averages first appeared in Linux, they reflected CPU demand, as with other operating systems. But later on Linux changed them to include not only runnable tasks, but also tasks in the uninterruptible state (TASK_UNINTERRUPTIBLE or nr_uninterruptible). This state is used by code paths that want to avoid interruptions by signals, which includes tasks blocked on disk I/O and some locks. You may have seen this state before: it shows up as the "D" state in the output of ps and top. The ps(1) man page calls it "uninterruptible sleep (usually IO)".
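
For example, a quick one-liner (illustrative, not from the original post) to list tasks currently in the D state:

$ ps -eo state,pid,comm | awk '$1 == "D"'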

Adding the uninterruptible state means that Linux load averages can increase due to a disk (or NFS) I/O workload, not just CPU demand. For everyone familiar with other operating systems and their CPU load averages, including this state is at first deeply confusing.

Why? Why, exactly, did Linux do this?

There are countless articles on load averages, many of which point out the Linux nr_uninterruptible gotcha. But I’ve seen none that explain or even hazard a guess as to why it’s included. My own guess would have been that it’s meant to reflect demand in a more general sense, rather than just CPU demand.

Searching for an ancient Linux patch
Understanding why something changed in Linux is easy: you read the git commit history on the file in question and read the change description. I checked the history on loadavg.c, but the change that added the uninterruptible state predates that file, which was created with code from an earlier file. I checked the other file, but that trail ran cold as well: the code itself has hopped around different files. Hoping to take a shortcut, I dumped “git log -p” for the entire Linux github repository, which was 4 Gbytes of text, and began reading it backwards to see when the code first appeared. This, too, was a dead end. The oldest change in the entire Linux repo dates back to 2005, when Linus imported Linux 2.6.12-rc2, and this change predates that.

There are historical Linux repos (here and here), but this change description is missing from those as well. Trying to discover, at least, when this change occurred, I searched tarballs on kernel.org and found that it had changed by 0.99.15, and not by 0.99.13 – however, the tarball for 0.99.14 was missing. I found it elsewhere, and confirmed that the change was in Linux 0.99 patchlevel 14, Nov 1993. I was hoping that the release description for 0.99.14 by Linus would explain the change, but that, too, was a dead end:

“Changes to the last official release (p13) are too numerous to mention (or even to remember)…” – Linus
He mentions major changes, but not the load average change.

Based on the date, I looked up the kernel mailing list archives to find the actual patch, but the oldest email available is from June 1995, when the sysadmin writes:

“While working on a system to make these mailing archives scale more effecitvely I accidently destroyed the current set of archives (ah whoops).”
My search was starting to feel cursed. Thankfully, I found some older linux-devel mailing list archives, rescued from server backups, often stored as tarballs of digests. I searched over 6,000 digests containing over 98,000 emails, 30,000 of which were from 1993. But it was somehow missing from all of them. It really looked as if the original patch description might be lost forever, and the “why” would remain a mystery.

The origin of uninterruptible
Fortunately, I did finally find the change, in a compressed mailbox file from 1993 on oldlinux.org. Here it is:

From: Matthias Urlichs <urlichs@smurf.sub.org>
Subject: Load average broken ?
Date: Fri, 29 Oct 1993 11:37:23 +0200

The kernel only counts “runnable” processes when computing the load average.
I don’t like that; the problem is that processes which are swapping or
waiting on “fast”, i.e. noninterruptible, I/O, also consume resources.

It seems somewhat nonintuitive that the load average goes down when you
replace your fast swap disk with a slow swap disk…

Anyway, the following patch seems to make the load average much more
consistent WRT the subjective speed of the system. And, most important, the
load is still zero when nobody is doing anything.
