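Note: all analysis in this column is organized around functions, with the necessary data structures introduced as they come up. The database version is PostgreSQL 10.16.

Note: for discussion, please email jackgo73@outlook.com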

I. Question

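In PostgreSQL, shared_buffers is commonly set to 25% of memory, while MySQL's InnoDB buffer pool is commonly set to 75% of memory. The difference comes from how each one flushes data to disk.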

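Common modes (depending on configuration):

  • PostgreSQL: data files rely on the OS cache; WAL relies on the OS cache
  • MySQL: data files do not rely on the OS cache and are written directly with O_DIRECT; the log relies on the OS cache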
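To make the "relies on the OS cache" pattern concrete, here is a minimal POSIX sketch, assuming Linux/glibc: write() only copies data into the kernel page cache, and fdatasync() then forces it to stable storage. The function name demo_buffered_durable_write is hypothetical and this is not PostgreSQL's code. An O_DIRECT path is not shown because it requires suitably aligned buffers and offsets.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a buffer through the OS page cache, then
 * force it to stable storage with fdatasync().
 * Returns 0 on success, -1 on any error.
 */
int demo_buffered_durable_write(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() only copies the data into the kernel page cache. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t) n;
    }

    /* fdatasync() returns only once the data is on stable storage. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```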

This article focuses on the differences between PostgreSQL's sync-related parameters.

II. Parameters

1 fsync

Controls whether writes (WAL, and data files at checkpoint) are forced to disk; on by default.

#fsync = on                             # flush data to disk for crash safety
                                        # (turning this off can cause
                                        # unrecoverable data corruption)

fsync (boolean)

If this parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk, by issuing fsync() system calls or various equivalent methods (see wal_sync_method). This ensures that the database cluster can recover to a consistent state after an operating system or hardware crash.

On by default.

WAL is forced to disk using the flush method configured by wal_sync_method.

While turning off fsync is often a performance benefit, this can result in unrecoverable data corruption in the event of a power failure or system crash. Thus it is only advisable to turn off fsync if you can easily recreate your entire database from external data.

Examples of safe circumstances for turning off fsync include the initial loading of a new database cluster from a backup file, using a database cluster for processing a batch of data after which the database will be thrown away and recreated, or for a read-only database clone which gets recreated frequently and is not used for failover. High quality hardware alone is not a sufficient justification for turning off fsync.

For reliable recovery when changing fsync off to on, it is necessary to force all modified buffers in the kernel to durable storage. This can be done while the cluster is shutdown or while fsync is on by running initdb --sync-only, running sync, unmounting the file system, or rebooting the server.

In many situations, turning off synchronous_commit for noncritical transactions can provide much of the potential performance benefit of turning off fsync, without the attendant risks of data corruption.

fsync can only be set in the postgresql.conf file or on the server command line. If you turn this parameter off, also consider turning off full_page_writes.

2 wal_sync_method

When fsync is on, this selects the method used to flush WAL to disk. The default on Linux is fdatasync.

#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync (default on Linux)
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync

wal_sync_method (enum)

Method used for forcing WAL updates out to disk. If fsync is off then this setting is irrelevant, since WAL file updates will not be forced out at all. Possible values are:

  • open_datasync (write WAL files with open() option O_DSYNC)

  • fdatasync (call fdatasync() at each commit)

  • fsync (call fsync() at each commit)

  • fsync_writethrough (call fsync() at each commit, forcing write-through of any disk write cache)

  • open_sync (write WAL files with open() option O_SYNC)

The open_* options also use O_DIRECT if available. Not all of these choices are available on all platforms. The default is the first method in the above list that is supported by the platform, except that fdatasync is the default on Linux and FreeBSD. The default is not necessarily ideal; it might be necessary to change this setting or other aspects of your system configuration in order to create a crash-safe configuration or achieve optimal performance. These aspects are discussed in Section 29.1. This parameter can only be set in the postgresql.conf file or on the server command line.

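If the system supports it, the open_* methods also use O_DIRECT (in get_sync_bit() below, only when XLogIsNeeded() is false, i.e. wal_level = minimal, and never in the walreceiver process). On Linux the default is fdatasync.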

open_datasync:open() O_DSYNC

fdatasync:fdatasync()

fsync:fsync()

fsync_writethrough: fsync(), additionally forcing write-through of the disk write cache

open_sync:open() O_SYNC

Source location: src/include/access/xlog.h

/* Sync methods */
#define SYNC_METHOD_FSYNC       0
#define SYNC_METHOD_FDATASYNC   1
#define SYNC_METHOD_OPEN        2   /* for O_SYNC */
#define SYNC_METHOD_FSYNC_WRITETHROUGH  3
#define SYNC_METHOD_OPEN_DSYNC  4   /* for O_DSYNC */
extern int  sync_method;

On Linux the default is SYNC_METHOD_FDATASYNC.
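As a sketch of how the wal_sync_method names line up with these constants, the hypothetical helpers below map a setting name to its value and report whether that method gets durability from an open() flag or from a later fsync()/fdatasync() call. The constants are copied from xlog.h above; the demo_* names and the lookup table are assumptions for illustration, not PostgreSQL's GUC code.

```c
#include <assert.h>
#include <string.h>

/* Values copied from src/include/access/xlog.h (PostgreSQL 10). */
#define SYNC_METHOD_FSYNC               0
#define SYNC_METHOD_FDATASYNC           1
#define SYNC_METHOD_OPEN                2   /* for O_SYNC */
#define SYNC_METHOD_FSYNC_WRITETHROUGH  3
#define SYNC_METHOD_OPEN_DSYNC          4   /* for O_DSYNC */

/* Hypothetical lookup table: wal_sync_method name -> constant. */
struct demo_sync_option {
    const char *name;
    int         value;
};

static const struct demo_sync_option demo_sync_options[] = {
    {"fsync",              SYNC_METHOD_FSYNC},
    {"fdatasync",          SYNC_METHOD_FDATASYNC},
    {"open_sync",          SYNC_METHOD_OPEN},
    {"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH},
    {"open_datasync",      SYNC_METHOD_OPEN_DSYNC},
};

/* Returns the constant for a setting name, or -1 if the name is unknown. */
int demo_parse_sync_method(const char *name)
{
    assert(name != NULL);
    size_t n = sizeof(demo_sync_options) / sizeof(demo_sync_options[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(demo_sync_options[i].name, name) == 0)
            return demo_sync_options[i].value;
    return -1;
}

/* 1 if durability comes from an open() flag (O_SYNC / O_DSYNC),
 * 0 if it comes from an explicit fsync()/fdatasync() after write(). */
int demo_syncs_at_open(int method)
{
    return method == SYNC_METHOD_OPEN || method == SYNC_METHOD_OPEN_DSYNC;
}
```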

3 synchronous_commit

Controls synchronous commit behavior; on by default.

synchronous_commit (enum)

Specifies how much WAL processing must complete before the database server returns a “success” indication to the client. Valid values are remote_apply, on (the default), remote_write, local, and off.

If synchronous_standby_names is empty, the only meaningful settings are on and off; remote_apply, remote_write and local all provide the same local synchronization level as on. The local behavior of all non-off modes is to wait for local flush of WAL to disk.

With any setting other than off, a transaction returns only after the local WAL flush has succeeded.

In off mode, there is no waiting, so there can be a delay between when success is reported to the client and when the transaction is later guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. So, turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction. For more discussion see Section 29.3.

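With synchronous_commit = off, the moment a transaction becomes durable and the moment success is returned to the client are not the same. The gap can be up to three times wal_writer_delay.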
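Plugging numbers into the documented bound: with the default wal_writer_delay of 200 ms the window is at most 600 ms, and with the 10 ms setting suggested later in this article it shrinks to 30 ms.

```latex
t_{\max} = 3 \times \texttt{wal\_writer\_delay}
\qquad
3 \times 200\,\text{ms} = 600\,\text{ms}\ (\text{default}),
\qquad
3 \times 10\,\text{ms} = 30\,\text{ms}
```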

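Turning this parameter off usually improves performance. So why not turn off fsync instead?

Because with synchronous_commit off, a crash only loses the last few transactions that were reported as committed; it does not make the data inconsistent.

With fsync off, by contrast, nothing guarantees that WAL ever reaches disk: part of a committed transaction may be on disk and part not, leaving the data inconsistent. Even the write-ahead rule can be violated, since a data page may reach disk before the WAL record that describes it.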

If synchronous_standby_names is non-empty, synchronous_commit also controls whether transaction commits will wait for their WAL records to be processed on the standby server(s).

When set to remote_apply, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and applied it, so that it has become visible to queries on the standby(s), and also written to durable storage on the standbys. This will cause much larger commit delays than previous settings since it waits for WAL replay. When set to on, commits wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and flushed it to durable storage. This ensures the transaction will not be lost unless both the primary and all synchronous standbys suffer corruption of their database storage. When set to remote_write, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and written it to their file systems. This setting ensures data preservation if a standby instance of PostgreSQL crashes, but not if the standby suffers an operating-system-level crash because the data has not necessarily reached durable storage on the standby. The setting local causes commits to wait for local flush to disk, but not for replication. This is usually not desirable when synchronous replication is in use, but is provided for completeness.

This parameter can be changed at any time; the behavior for any one transaction is determined by the setting in effect when it commits. It is therefore possible, and useful, to have some transactions commit synchronously and others asynchronously. For example, to make a single multistatement transaction commit asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF within the transaction.

Table 19.1 summarizes the capabilities of the synchronous_commit settings.

Table 19.1. synchronous_commit Modes

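synchronous_commit setting | local durable commit | standby durable commit after PG crash | standby durable commit after OS crash | standby query consistency
-------------------------- | -------------------- | ------------------------------------- | ------------------------------------- | -------------------------
remote_apply               | •                    | •                                     | •                                     | •
on                         | •                    | •                                     | •                                     |
remote_write               | •                    | •                                     |                                       |
local                      | •                    |                                       |                                       |
off                        |                      |                                       |                                       |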
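The same table can be encoded as data, which makes the pattern easy to see: each setting further down the list gives up one more guarantee. This is a minimal C sketch, assuming synchronous_standby_names is non-empty (otherwise the standby columns do not apply); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One row of Table 19.1 per synchronous_commit setting. */
struct demo_commit_guarantees {
    const char *setting;
    bool local_durable;            /* local durable commit */
    bool standby_after_pg_crash;   /* standby durable commit after PG crash */
    bool standby_after_os_crash;   /* standby durable commit after OS crash */
    bool standby_query_consistent; /* standby query consistency */
};

static const struct demo_commit_guarantees demo_table[] = {
    {"remote_apply", true,  true,  true,  true },
    {"on",           true,  true,  true,  false},
    {"remote_write", true,  true,  false, false},
    {"local",        true,  false, false, false},
    {"off",          false, false, false, false},
};

/* Returns the row for a setting name, or NULL if the name is unknown. */
const struct demo_commit_guarantees *demo_lookup_commit_mode(const char *setting)
{
    assert(setting != NULL);
    size_t n = sizeof(demo_table) / sizeof(demo_table[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(demo_table[i].setting, setting) == 0)
            return &demo_table[i];
    return NULL;
}
```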

4 wal_writer_delay / wal_writer_flush_after

Common configuration

wal_writer_delay = 10ms

wal_writer_flush_after = 0  # machines with good I/O need no smoothing; otherwise 128~256kB is recommended

wal_writer_delay (integer)

Specifies how often the WAL writer flushes WAL, in time terms. After flushing WAL the writer sleeps for the length of time given by wal_writer_delay, unless woken up sooner by an asynchronously committing transaction. If the last flush happened less than wal_writer_delay ago and less than wal_writer_flush_after worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. If this value is specified without units, it is taken as milliseconds. The default value is 200 milliseconds (200ms). Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; setting wal_writer_delay to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10. This parameter can only be set in the postgresql.conf file or on the server command line.

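If less than wal_writer_delay has passed since the last flush, and less than wal_writer_flush_after worth of WAL has been produced since then, the WAL writer only write()s WAL to the OS and does not flush it to disk.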

wal_writer_flush_after (integer)

Specifies how often the WAL writer flushes WAL, in volume terms. If the last flush happened less than wal_writer_delay ago and less than wal_writer_flush_after worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. If wal_writer_flush_after is set to 0 then WAL data is always flushed immediately. If this value is specified without units, it is taken as WAL blocks, that is XLOG_BLCKSZ bytes, typically 8kB. The default is 1MB. This parameter can only be set in the postgresql.conf file or on the server command line.

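Same idea as above, but triggered by WAL volume rather than time. If WAL writes are not an I/O bottleneck on the disk, this parameter is not needed.

If, however, WAL flushing is found to saturate I/O periodically, setting this parameter can help considerably.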
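The two triggers can be summarized as a single decision function. This is a simplified, hypothetical sketch of the documented rule, not the actual WAL writer code; it assumes the typical 8 kB WAL block size (XLOG_BLCKSZ), since wal_writer_flush_after is measured in WAL blocks.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_XLOG_BLCKSZ 8192L   /* assumption: typical WAL block size */

/*
 * Hypothetical restatement of the documented rule. Returns true if the WAL
 * writer should flush (fsync/fdatasync), false if it should only write()
 * WAL to the OS.
 */
bool demo_walwriter_should_flush(long ms_since_last_flush,
                                 long bytes_since_last_flush,
                                 int wal_writer_delay_ms,
                                 int wal_writer_flush_after_blocks)
{
    assert(wal_writer_delay_ms > 0);

    if (wal_writer_flush_after_blocks == 0)
        return true;    /* 0 means WAL is always flushed immediately */
    if (ms_since_last_flush >= wal_writer_delay_ms)
        return true;    /* time-based trigger */
    if (bytes_since_last_flush >=
        (long) wal_writer_flush_after_blocks * DEMO_XLOG_BLCKSZ)
        return true;    /* volume-based trigger */
    return false;       /* write to the OS only, no flush this round */
}
```

With the default 1MB (128 blocks) and a 10 ms delay, for example, 4 kB of WAL produced 5 ms after the last flush is only written, not flushed.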

III. fsync-related source code

Creating an xlog file:

XLogFileInit

int
XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
{
    ...
    fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
                       S_IRUSR | S_IWUSR);
    ...
}

get_sync_bit(sync_method), where sync_method = 1 (SYNC_METHOD_FDATASYNC) under the Linux default:

...
/* Sync methods */
#define SYNC_METHOD_FSYNC       0
#define SYNC_METHOD_FDATASYNC   1
#define SYNC_METHOD_OPEN        2   /* for O_SYNC */
#define SYNC_METHOD_FSYNC_WRITETHROUGH  3
#define SYNC_METHOD_OPEN_DSYNC  4   /* for O_DSYNC */
extern int  sync_method;
...
static int
get_sync_bit(int method)
{
    ....
    if (!XLogIsNeeded() && !AmWalReceiverProcess())
        o_direct_flag = PG_O_DIRECT;
    ....
    switch (method)
    {
        /*
         * enum values for all sync options are defined even if they are
         * not supported on the current platform.  But if not, they are
         * not included in the enum option array, and therefore will never
         * be seen here.
         */
        case SYNC_METHOD_FSYNC:
        case SYNC_METHOD_FSYNC_WRITETHROUGH:
        case SYNC_METHOD_FDATASYNC:
            return 0;
#ifdef OPEN_SYNC_FLAG
        case SYNC_METHOD_OPEN:
            return OPEN_SYNC_FLAG | o_direct_flag;
#endif
#ifdef OPEN_DATASYNC_FLAG
        case SYNC_METHOD_OPEN_DSYNC:
            return OPEN_DATASYNC_FLAG | o_direct_flag;
#endif
        default:
            /* can't happen (unless we are out of sync with option array) */
            elog(ERROR, "unrecognized wal_sync_method: %d", method);
            return 0;           /* silence warning */
    }
}

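With the default fdatasync method the function returns 0, so no sync flag is added to open(); durability comes instead from the explicit fdatasync() call made after writing.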
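A compilable restatement of the flag selection above, assuming Linux/glibc where O_SYNC, O_DSYNC and O_DIRECT exist. The parts elided with "...." in the excerpt are left out; the wal_needed parameter is a hypothetical stand-in for XLogIsNeeded() (true when wal_level is above minimal), and the walreceiver check is dropped. The demo_* names are illustrative only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

#define SYNC_METHOD_FSYNC               0
#define SYNC_METHOD_FDATASYNC           1
#define SYNC_METHOD_OPEN                2
#define SYNC_METHOD_FSYNC_WRITETHROUGH  3
#define SYNC_METHOD_OPEN_DSYNC          4

/* Fall back to 0 where O_DIRECT is unavailable. */
#ifdef O_DIRECT
#define DEMO_O_DIRECT O_DIRECT
#else
#define DEMO_O_DIRECT 0
#endif

/* Returns the extra open() flags for a sync method, or -1 if unknown. */
int demo_get_sync_bit(int method, bool wal_needed)
{
    /* O_DIRECT only when WAL is not needed by archiving/streaming. */
    int o_direct_flag = wal_needed ? 0 : DEMO_O_DIRECT;

    switch (method)
    {
        case SYNC_METHOD_FSYNC:
        case SYNC_METHOD_FSYNC_WRITETHROUGH:
        case SYNC_METHOD_FDATASYNC:
            return 0;                       /* synced later, after write() */
        case SYNC_METHOD_OPEN:
            return O_SYNC | o_direct_flag;  /* each write() is synchronous */
        case SYNC_METHOD_OPEN_DSYNC:
            return O_DSYNC | o_direct_flag;
        default:
            return -1;                      /* unrecognized method */
    }
}
```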

XLogFileOpen

Opening an xlog file:

    fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),S_IRUSR | S_IWUSR);

issue_xlog_fsync

Issues one synchronous flush of the xlog file, using the method selected by wal_sync_method:

/*
 * Issue appropriate kind of fsync (if any) for an XLOG output file.
 *
 * 'fd' is a file descriptor for the XLOG file to be fsync'd.
 * 'log' and 'seg' are for error reporting purposes.
 */
void
issue_xlog_fsync(int fd, XLogSegNo segno)
{
    switch (sync_method)
    {
        case SYNC_METHOD_FSYNC:
            if (pg_fsync_no_writethrough(fd) != 0)
                ereport(PANIC,
                        (errcode_for_file_access(),
                         errmsg("could not fsync log file %s: %m",
                                XLogFileNameP(ThisTimeLineID, segno))));
            break;
#ifdef HAVE_FSYNC_WRITETHROUGH
        case SYNC_METHOD_FSYNC_WRITETHROUGH:
            if (pg_fsync_writethrough(fd) != 0)
                ereport(PANIC,
                        (errcode_for_file_access(),
                         errmsg("could not fsync write-through log file %s: %m",
                                XLogFileNameP(ThisTimeLineID, segno))));
            break;
#endif
#ifdef HAVE_FDATASYNC
        case SYNC_METHOD_FDATASYNC:
            if (pg_fdatasync(fd) != 0)
                ereport(PANIC,
                        (errcode_for_file_access(),
                         errmsg("could not fdatasync log file %s: %m",
                                XLogFileNameP(ThisTimeLineID, segno))));
            break;
#endif
        case SYNC_METHOD_OPEN:
        case SYNC_METHOD_OPEN_DSYNC:
            /* write synced it already */
            break;
        default:
            elog(PANIC, "unrecognized wal_sync_method: %d", sync_method);
            break;
    }
}
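Stripped of error reporting, the dispatch above reduces to the sketch below. It is hypothetical and simplified: it returns -1 where PostgreSQL would PANIC, takes enable_fsync as a parameter to mimic the enableFsync check inside helpers such as pg_fdatasync(), and omits fsync_writethrough because that needs platform-specific calls. It assumes a platform that provides fdatasync().

```c
#include <assert.h>
#include <unistd.h>

#define SYNC_METHOD_FSYNC               0
#define SYNC_METHOD_FDATASYNC           1
#define SYNC_METHOD_OPEN                2
#define SYNC_METHOD_FSYNC_WRITETHROUGH  3
#define SYNC_METHOD_OPEN_DSYNC          4

/* Returns 0 on success, -1 on failure or unsupported method. */
int demo_issue_sync(int fd, int method, int enable_fsync)
{
    if (!enable_fsync)
        return 0;                 /* fsync = off: do nothing, report success */

    switch (method)
    {
        case SYNC_METHOD_FSYNC:
            return fsync(fd) == 0 ? 0 : -1;
        case SYNC_METHOD_FDATASYNC:
            return fdatasync(fd) == 0 ? 0 : -1;
        case SYNC_METHOD_OPEN:
        case SYNC_METHOD_OPEN_DSYNC:
            return 0;             /* the write() itself was already synchronous */
        default:
            return -1;            /* unrecognized, or not handled in this sketch */
    }
}
```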

By default the fdatasync branch is taken, which calls pg_fdatasync():

/*
 * pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off
 *
 * Not all platforms have fdatasync; treat as fsync if not available.
 */
int
pg_fdatasync(int fd)
{
    if (enableFsync)
    {
#ifdef HAVE_FDATASYNC
        return fdatasync(fd);
#else
        return fsync(fd);
#endif
    }
    else
        return 0;
}

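The HAVE_FDATASYNC macro is set by configure at build time.

Meaning of the open() arguments

As the code above shows, with the default settings the open() flags are: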

O_RDWR | PG_BINARY | 0

  • O_RDWR: open for reading and writing
  • PG_BINARY: for Windows compatibility (O_BINARY there); defined as 0 on other platforms

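S_IRUSR | S_IWUSR (the file mode, which only takes effect when open() creates the file)

  • S_IRUSR: read permission for the owner
  • S_IWUSR: write permission for the owner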
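To see these arguments in action, the hypothetical helper below creates a file with the same O_RDWR | PG_BINARY flags and S_IRUSR | S_IWUSR mode, plus O_CREAT | O_EXCL (an addition here, so that the mode actually takes effect), and returns the resulting permission bits. PG_BINARY is defined as 0, as on non-Windows platforms. Expect 0600, possibly narrowed further by the process umask.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef PG_BINARY
#define PG_BINARY 0   /* as on non-Windows platforms */
#endif

/* Returns the permission bits of the newly created file, or -1 on error. */
int demo_create_and_get_mode(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
                  S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    if (rc != 0)
        return -1;
    return (int) (st.st_mode & 0777);   /* keep only the permission bits */
}
```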
