Tags

PostgreSQL , auto_explain , pg_test_timing , clock , tsc , hpet , acpi , acpi_pm , Linux


Background

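When diagnosing a SQL statement's execution plan we usually use EXPLAIN ANALYZE. ANALYZE has several options, and one of them is TIMING, which records the execution time of every node in the plan.

This timing has a real performance cost, and the size of that cost depends on the operating system's interface for reading the clock.

Sometimes you will find that EXPLAIN ANALYZE runs far longer than the query itself actually takes. Why is that?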

Clock hardware and time precision

Precision of common clock sources: tsc > hpet (100 nanoseconds; 1 ns is one billionth of a second) > acpi_pm (300 nanoseconds)

Clock hardware and timing accuracy

Collecting accurate timing information is normally done on computers using hardware clocks with various levels of accuracy. With some hardware the operating systems can pass the system clock time almost directly to programs. A system clock can also be derived from a chip that simply provides timing interrupts, periodic ticks at some known time interval. In either case, operating system kernels provide a clock source that hides these details. But the accuracy of that clock source and how quickly it can return results varies based on the underlying hardware.

Inaccurate time keeping can result in system instability. Test any change to the clock source very carefully. Operating system defaults are sometimes made to favor reliability over best accuracy. And if you are using a virtual machine, look into the recommended time sources compatible with it. Virtual hardware faces additional difficulties when emulating timers, and there are often per operating system settings suggested by vendors.

The Time Stamp Counter (TSC) clock source is the most accurate one available on current generation CPUs. It's the preferred way to track the system time when it's supported by the operating system and the TSC clock is reliable. There are several ways that TSC can fail to provide an accurate timing source, making it unreliable. Older systems can have a TSC clock that varies based on the CPU temperature, making it unusable for timing. (I have run into this myself: a machine whose clock kept jumping back and forth, an IBM x3950 stacked server if I remember correctly.) Trying to use TSC on some older multicore CPUs can give a reported time that's inconsistent among multiple cores. This can result in the time going backwards, a problem this program (pg_test_timing) checks for. And even the newest systems can fail to provide accurate TSC timing with very aggressive power saving configurations.

Newer operating systems may check for the known TSC problems and switch to a slower, more stable clock source when they are seen. If your system supports TSC time but doesn't default to that, it may be disabled for a good reason. And some operating systems may not detect all the possible problems correctly, or will allow using TSC even in situations where it's known to be inaccurate.

The High Precision Event Timer (HPET) is the preferred timer on systems where it's available and TSC is not accurate. The timer chip itself is programmable to allow up to 100 nanosecond resolution, but you may not see that much accuracy in your system clock.

Advanced Configuration and Power Interface (ACPI) provides a Power Management (PM) Timer, which Linux refers to as the acpi_pm. The clock derived from acpi_pm will at best provide 300 nanosecond resolution.

Timers used on older PC hardware include the 8254 Programmable Interval Timer (PIT), the real-time clock (RTC), the Advanced Programmable Interrupt Controller (APIC) timer, and the Cyclone timer. These timers aim for millisecond resolution.

The EXPLAIN ANALYZE code

When the TIMING option of EXPLAIN ANALYZE is on, ExplainOnePlan sets instrument_option |= INSTRUMENT_TIMER;

src/backend/commands/explain.c

ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
               const char *queryString, ParamListInfo params,
               const instr_time *planduration)
{
    DestReceiver *dest;
    QueryDesc  *queryDesc;
    instr_time  starttime;
    double      totaltime = 0;
    int         eflags;
    int         instrument_option = 0;

    if (es->analyze && es->timing)
        instrument_option |= INSTRUMENT_TIMER;
    else if (es->analyze)
        instrument_option |= INSTRUMENT_ROWS;

    if (es->buffers)
        instrument_option |= INSTRUMENT_BUFFERS;
    ...

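This setting directly determines whether the executor reads the clock during execution. This part is not very friendly: there is really no need to take a timestamp for every tuple that passes through a node; timing the node as a whole, on entry and exit, would be enough.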

src/backend/executor/instrument.c

/* Entry to a plan node */
void
InstrStartNode(Instrumentation *instr)
{
    if (instr->need_timer)
    {
        if (INSTR_TIME_IS_ZERO(instr->starttime))
            INSTR_TIME_SET_CURRENT(instr->starttime);
        else
            elog(ERROR, "InstrStartNode called twice in a row");
    }

    /* save buffer usage totals at node entry, if needed */
    if (instr->need_bufusage)
        instr->bufusage_start = pgBufferUsage;
}

/* Exit from a plan node */
void
InstrStopNode(Instrumentation *instr, double nTuples)
{
    instr_time  endtime;

    /* count the returned tuples */
    instr->tuplecount += nTuples;

    /* let's update the time only if the timer was requested */
    if (instr->need_timer)
    {
        if (INSTR_TIME_IS_ZERO(instr->starttime))
            elog(ERROR, "InstrStopNode called without start");

        INSTR_TIME_SET_CURRENT(endtime);
        INSTR_TIME_ACCUM_DIFF(instr->counter, endtime, instr->starttime);

        INSTR_TIME_SET_ZERO(instr->starttime);
    }

    /* Add delta of buffer usage since entry to node's totals */
    if (instr->need_bufusage)
        BufferUsageAccumDiff(&instr->bufusage,
                             &pgBufferUsage, &instr->bufusage_start);

    /* Is this the first tuple of this cycle? */
    if (!instr->running)
    {
        instr->running = true;
        instr->firsttuple = INSTR_TIME_GET_DOUBLE(instr->counter);
    }
}

src/include/portability/instr_time.h

#define INSTR_TIME_SET_CURRENT(t)       gettimeofday(&(t), NULL)

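Within a single query, ExecutorRun may be called several times. So when the executor's overall runtime is instrumented, the clock is read on every call: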

src/backend/executor/execMain.c

void
ExecutorRun(QueryDesc *queryDesc,
            ScanDirection direction, uint64 count)
{
    if (ExecutorRun_hook)
        (*ExecutorRun_hook) (queryDesc, direction, count);
    else
        standard_ExecutorRun(queryDesc, direction, count);
}

void
standard_ExecutorRun(QueryDesc *queryDesc,
                     ScanDirection direction, uint64 count)
{
    ...
    /* Allow instrumentation of Executor overall runtime */
    if (queryDesc->totaltime)
        InstrStartNode(queryDesc->totaltime);    // reads the clock
    ...
    if (queryDesc->totaltime)
        InstrStopNode(queryDesc->totaltime, estate->es_processed);

When a query has to process a very large number of rows, these frequent gettimeofday calls are what make EXPLAIN ANALYZE with TIMING take so much longer.

Example

A count(*) over 10 million rows: first the actual execution time, then the performance impact of enabling ANALYZE TIMING under each clock source.

create table tbl_time(id int);
insert into tbl_time select generate_series(1,10000000);

\timing
postgres=# SELECT COUNT(*) FROM tbl_time;
  count
----------
 10000000
(1 row)

Time: 1171.956 ms

Testing the performance impact with each of the three clock sources: tsc, hpet and acpi_pm

#cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

1. tsc

#echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource

postgres=# explain analyze SELECT COUNT(*) FROM tbl_time;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=169247.71..169247.72 rows=1 width=8) (actual time=2113.432..2113.432 rows=1 loops=1)
   ->  Seq Scan on tbl_time  (cost=0.00..144247.77 rows=9999977 width=0) (actual time=0.013..1128.860 rows=10000000 loops=1)
 Planning time: 0.062 ms
 Execution time: 2113.514 ms
(4 rows)

2. hpet

#echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

postgres=# explain analyze SELECT COUNT(*) FROM tbl_time;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=169247.71..169247.72 rows=1 width=8) (actual time=13968.218..13968.218 rows=1 loops=1)
   ->  Seq Scan on tbl_time  (cost=0.00..144247.77 rows=9999977 width=0) (actual time=0.018..7067.711 rows=10000000 loops=1)
 Planning time: 0.059 ms
 Execution time: 13968.271 ms
(4 rows)

3. acpi_pm

#echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource

postgres=# explain analyze SELECT COUNT(*) FROM tbl_time;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=169247.71..169247.72 rows=1 width=8) (actual time=19641.242..19641.243 rows=1 loops=1)
   ->  Seq Scan on tbl_time  (cost=0.00..144247.77 rows=9999977 width=0) (actual time=0.018..9896.285 rows=10000000 loops=1)
 Planning time: 0.060 ms
 Execution time: 19641.296 ms
(4 rows)

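Measuring the gettimeofday overhead of each clock source with pg_test_timing

pg_test_timing measures how expensive it is to read the clock on the current clock hardware; it uses gettimeofday for the test.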

The results:

1. tsc

pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 47.70 nsec
Histogram of timing durations:
< usec   % of total      count
     1     95.23986   59893249
     2      4.75540    2990515
     4      0.00384       2414
     8      0.00077        485
    16      0.00013         79
    32      0.00000          3
    64      0.00000          1

2. hpet

pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 696.44 nsec
Histogram of timing durations:
< usec   % of total      count
     1     31.81944    1370669
     2     67.06767    2889038
     4      1.03890      44752
     8      0.05959       2567
    16      0.01418        611
    32      0.00016          7
    64      0.00005          2

3. acpi_pm

pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 919.07 nsec
Histogram of timing durations:
< usec   % of total      count
     1     12.25423     399999
     2     84.17305    2747553
     4      3.45019     112620
     8      0.08648       2823
    16      0.03468       1132
    32      0.00132         43
    64      0.00003          1
   128      0.00003          1

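Based on these results, we can roughly estimate the overhead EXPLAIN ANALYZE added in the runs above. The estimate is 10,000,000 rows times the average cost of a clock read. Each of the two dominant histogram buckets ("< 1" and "< 2") is weighted by its upper bound in microseconds, and the result is divided by 1,000,000 to get seconds.

acpi_pm
(how long the "< 1 usec" calls really take is unknown; the larger that share, the larger the error)
10000000*(1*0.1225+2*0.8417)/1000000

hpet
(how long the "< 1 usec" calls really take is unknown; the larger that share, the larger the error)
10000000*(1*0.3182+2*0.6706)/1000000

tsc
(hard to estimate this way, because how long the "< 1 usec" calls really take is unknown)
10000000*(1*0.9524+2*0.0475)/1000000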

auto_explain with log_timing has the same problem

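When auto_explain's timing collection is enabled, the clock overhead can hurt performance badly. As this article shows, even with the TSC clock source, a query that scans many rows took nearly twice as long (2113 ms with timing vs. 1172 ms without).

auto_explain does have a duration threshold (auto_explain.log_min_duration). However, once timing is enabled, it collects the execution time of every node. Before execution finishes it cannot know whether the total time will exceed the threshold, so ANALYZE TIMING is switched on for every query.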

Unless you have a specific need, we recommend not enabling auto_explain's timing option (auto_explain.log_timing).

References

https://www.ibm.com/developerworks/cn/linux/1308_liuming_linuxtime4/

https://www.postgresql.org/docs/9.6/static/pgtesttiming.html

https://www.postgresql.org/docs/9.6/static/auto-explain.html
