Postgresql中的sync相关参数源码分析
注:本专栏所有分析以函数为主线,必要数据结构会带入讲解;数据库版本为Postgresql10.16。
注:如有讨论的需要请email to jackgo73@outlook.com
一、问题
Postgresql中常见的的sharebuffer配置为内存的25%,而mysql的bp常见配置为内存的75%,原因和刷盘方式不同有关。
常见模式(和配置有关)
- pgsql:数据依赖OS CACHE,日志依赖OS CACHE
- mysql:数据不依赖OS CACHE使用O_DIRECT直接刷盘,日志依赖OS CACHE
这里着重分析PG的几种sync参数的不同
二、参数
1 fsync
决定是否同步刷xlog,默认打开。
#fsync = on # flush data to disk for crash safety# (turning this off can cause# unrecoverable data corruption)
fsync
(boolean
)If this parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk, by issuing
fsync()
system calls or various equivalent methods (see wal_sync_method). This ensures that the database cluster can recover to a consistent state after an operating system or hardware crash.
默认打开。
使用wal_sync_method配置的刷盘函数保证数据落盘。
While turning off
fsync
is often a performance benefit, this can result in unrecoverable data corruption in the event of a power failure or system crash. Thus it is only advisable to turn offfsync
if you can easily recreate your entire database from external data.Examples of safe circumstances for turning off
fsync
include the initial loading of a new database cluster from a backup file, using a database cluster for processing a batch of data after which the database will be thrown away and recreated, or for a read-only database clone which gets recreated frequently and is not used for failover. High quality hardware alone is not a sufficient justification for turning offfsync
.For reliable recovery when changing
fsync
off to on, it is necessary to force all modified buffers in the kernel to durable storage. This can be done while the cluster is shutdown or whilefsync
is on by runninginitdb --sync-only
, runningsync
, unmounting the file system, or rebooting the server.In many situations, turning off synchronous_commit for noncritical transactions can provide much of the potential performance benefit of turning off
fsync
, without the attendant risks of data corruption.
fsync
can only be set in thepostgresql.conf
file or on the server command line. If you turn this parameter off, also consider turning off full_page_writes.
2 wal_sync_method
上述参数打开后,使用什么方式同步刷xlog。默认fdatasync。
#wal_sync_method = fsync # the default is the first option# supported by the operating system:# open_datasync# fdatasync (default on Linux)# fsync# fsync_writethrough# open_sync
wal_sync_method
(enum
)Method used for forcing WAL updates out to disk. If
fsync
is off then this setting is irrelevant, since WAL file updates will not be forced out at all. Possible values are:
open_datasync
(write WAL files withopen()
optionO_DSYNC
)
fdatasync
(callfdatasync()
at each commit)
fsync
(callfsync()
at each commit)
fsync_writethrough
(callfsync()
at each commit, forcing write-through of any disk write cache)
open_sync
(write WAL files withopen()
optionO_SYNC
)The
open_
* options also useO_DIRECT
if available. Not all of these choices are available on all platforms. The default is the first method in the above list that is supported by the platform, except thatfdatasync
is the default on Linux and FreeBSD. The default is not necessarily ideal; it might be necessary to change this setting or other aspects of your system configuration in order to create a crash-safe configuration or achieve optimal performance. These aspects are discussed in Section 29.1. This parameter can only be set in thepostgresql.conf
file or on the server command line.
如果系统支持,open_*会使用O_DIRECT;Linux会默认使用fdatasync;
open_datasync:open() O_DSYNC
fdatasync:fdatasync()
fsync:fsync()
fsync_writethrough:fsync()
open_sync:open() O_SYNC
源码位置:src/include/access/xlog.h
/* Sync methods */
#define SYNC_METHOD_FSYNC 0
#define SYNC_METHOD_FDATASYNC 1
#define SYNC_METHOD_OPEN 2 /* for O_SYNC */
#define SYNC_METHOD_FSYNC_WRITETHROUGH 3
#define SYNC_METHOD_OPEN_DSYNC 4 /* for O_DSYNC */
extern int sync_method;
默认SYNC_METHOD_FDATASYNC
3 synchronous_commit
同步提交约束配置,默认on。
synchronous_commit
(enum
)Specifies how much WAL processing must complete before the database server returns a “success” indication to the client. Valid values are
remote_apply
,on
(the default),remote_write
,local
, andoff
.If
synchronous_standby_names
is empty, the only meaningful settings areon
andoff
;remote_apply
,remote_write
andlocal
all provide the same local synchronization level ason
. The local behavior of all non-off
modes is to wait for local flush of WAL to disk.
所有非off的配置,都需要等fsync成功,事务才能返回。
In
off
mode, there is no waiting, so there can be a delay between when success is reported to the client and when the transaction is later guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter tooff
does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. So, turningsynchronous_commit
off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction. For more discussion see Section 29.3.
事务真正提交 和 事务成功返回客户端 不是一致的! 中间可能最多差三倍的wal_writer_delay。
一般把这个参数关了可以提升性能,为什么不关fsync呢?
因为这个参数关了之后,系统crash后最近的几条成功提交的事务会直接丢失,不会造成数据不一致。
而fsync关了之后,日志落盘完全没有保障了,提交了的事物可能一部分刷盘,一部分没有刷盘造成数据不一致。
If synchronous_standby_names is non-empty,
synchronous_commit
also controls whether transaction commits will wait for their WAL records to be processed on the standby server(s).When set to
remote_apply
, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and applied it, so that it has become visible to queries on the standby(s), and also written to durable storage on the standbys. This will cause much larger commit delays than previous settings since it waits for WAL replay. When set toon
, commits wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and flushed it to durable storage. This ensures the transaction will not be lost unless both the primary and all synchronous standbys suffer corruption of their database storage. When set toremote_write
, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and written it to their file systems. This setting ensures data preservation if a standby instance of PostgreSQL crashes, but not if the standby suffers an operating-system-level crash because the data has not necessarily reached durable storage on the standby. The settinglocal
causes commits to wait for local flush to disk, but not for replication. This is usually not desirable when synchronous replication is in use, but is provided for completeness.This parameter can be changed at any time; the behavior for any one transaction is determined by the setting in effect when it commits. It is therefore possible, and useful, to have some transactions commit synchronously and others asynchronously. For example, to make a single multistatement transaction commit asynchronously when the default is the opposite, issue
SET LOCAL synchronous_commit TO OFF
within the transaction.Table 19.1 summarizes the capabilities of the
synchronous_commit
settings.Table 19.1. synchronous_commit Modes
synchronous_commit setting local durable commit standby durable commit after PG crash standby durable commit after OS crash standby query consistency remote_apply • • • • on • • • remote_write • • local • off
wal_writer_delay/wal_writer_flush_after
常用配置
wal_writer_delay = 10ms
wal_writer_flush_after = 0 # IO很好的机器,不需要考虑平滑调度, 否则建议128~256kB
wal_writer_delay
(integer
)Specifies how often the WAL writer flushes WAL, in time terms. After flushing WAL the writer sleeps for the length of time given by
wal_writer_delay
, unless woken up sooner by an asynchronously committing transaction. If the last flush happened less thanwal_writer_delay
ago and less thanwal_writer_flush_after
worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. If this value is specified without units, it is taken as milliseconds. The default value is 200 milliseconds (200ms
). Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; settingwal_writer_delay
to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10. This parameter can only be set in thepostgresql.conf
file or on the server command line.
不满足10ms的事务只写磁盘,不做flush。
wal_writer_flush_after
(integer
)Specifies how often the WAL writer flushes WAL, in volume terms. If the last flush happened less than
wal_writer_delay
ago and less thanwal_writer_flush_after
worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. Ifwal_writer_flush_after
is set to0
then WAL data is always flushed immediately. If this value is specified without units, it is taken as WAL blocks, that isXLOG_BLCKSZ
bytes, typically 8kB. The default is1MB
. This parameter can only be set in thepostgresql.conf
file or on the server command line.
同上,从空间上触发刷盘。一般盘的io写日志无瓶颈的话,不需要使用这个参数。
除非发现刷xlog周期性的打满IO,配这个参数有奇效。
三、fsync相关源码
xlog文件创建。
XLogFileInit
int
XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
{...fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),S_IRUSR | S_IWUSR);...
}
get_sync_bit(sync_method) sync_method = 1
.../* Sync methods */
#define SYNC_METHOD_FSYNC 0
#define SYNC_METHOD_FDATASYNC 1
#define SYNC_METHOD_OPEN 2 /* for O_SYNC */
#define SYNC_METHOD_FSYNC_WRITETHROUGH 3
#define SYNC_METHOD_OPEN_DSYNC 4 /* for O_DSYNC */
extern int sync_method;...static int
get_sync_bit(int method)
{....if (!XLogIsNeeded() && !AmWalReceiverProcess())o_direct_flag = PG_O_DIRECT;....switch (method){/** enum values for all sync options are defined even if they are* not supported on the current platform. But if not, they are* not included in the enum option array, and therefore will never* be seen here.*/case SYNC_METHOD_FSYNC:case SYNC_METHOD_FSYNC_WRITETHROUGH:case SYNC_METHOD_FDATASYNC:return 0;
#ifdef OPEN_SYNC_FLAGcase SYNC_METHOD_OPEN:return OPEN_SYNC_FLAG | o_direct_flag;
#endif
#ifdef OPEN_DATASYNC_FLAGcase SYNC_METHOD_OPEN_DSYNC:return OPEN_DATASYNC_FLAG | o_direct_flag;
#endifdefault:/* can't happen (unless we are out of sync with option array) */elog(ERROR, "unrecognized wal_sync_method: %d", method);return 0; /* silence warning */}
}
默认函数返回0。
XLogFileOpen
xlog文件打开
fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),S_IRUSR | S_IWUSR);
issue_xlog_fsync
触发一次同步刷盘
/** Issue appropriate kind of fsync (if any) for an XLOG output file.** 'fd' is a file descriptor for the XLOG file to be fsync'd.* 'log' and 'seg' are for error reporting purposes.*/
void
issue_xlog_fsync(int fd, XLogSegNo segno)
{switch (sync_method){case SYNC_METHOD_FSYNC:if (pg_fsync_no_writethrough(fd) != 0)ereport(PANIC,(errcode_for_file_access(),errmsg("could not fsync log file %s: %m",XLogFileNameP(ThisTimeLineID, segno))));break;
#ifdef HAVE_FSYNC_WRITETHROUGHcase SYNC_METHOD_FSYNC_WRITETHROUGH:if (pg_fsync_writethrough(fd) != 0)ereport(PANIC,(errcode_for_file_access(),errmsg("could not fsync write-through log file %s: %m",XLogFileNameP(ThisTimeLineID, segno))));break;
#endif
#ifdef HAVE_FDATASYNCcase SYNC_METHOD_FDATASYNC:if (pg_fdatasync(fd) != 0)ereport(PANIC,(errcode_for_file_access(),errmsg("could not fdatasync log file %s: %m",XLogFileNameP(ThisTimeLineID, segno))));break;
#endifcase SYNC_METHOD_OPEN:case SYNC_METHOD_OPEN_DSYNC:/* write synced it already */break;default:elog(PANIC, "unrecognized wal_sync_method: %d", sync_method);break;}
}
默认走fdatasync
/** pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off** Not all platforms have fdatasync; treat as fsync if not available.*/
int
pg_fdatasync(int fd)
{if (enableFsync){
#ifdef HAVE_FDATASYNCreturn fdatasync(fd);
#elsereturn fsync(fd);
#endif}elsereturn 0;
}
HAVE_FDATASYNC宏由configure时配置。
OPEN参数含义
上述过程可以看到,默认情况open参数如下:
O_RDWR | PG_BINARY | 0
- O_RDWR:打开方式 read/write
- PG_BINARY:适配windows,默认为0。
S_IRUSR | S_IWUSR
- S_IRUSR:读权限
- S_IWUSR:写权限
Postgresql中的sync相关参数源码分析相关推荐
- netty中的future和promise源码分析(二)
前面一篇netty中的future和promise源码分析(一)中对future进行了重点分析,接下来讲一讲promise. promise是可写的future,从future的分析中可以发现在其中没 ...
- WebRTC[1]-WebRTC中h264解码过程的源码分析
目录 前言 正文 <WebRTC工作原理精讲>系列-总览_liuzhen007的专栏-CSDN博客_webrtc 原理前言欢迎大家订阅Data-Mining 的<WebRTC工作原理 ...
- 【Linux 内核】进程管理 ( 进程相关系统调用源码分析 | fork() 源码 | vfork() 源码 | clone() 源码 | _do_fork() 源码 | do_fork() 源码 )
文章目录 一.fork 系统调用源码 二.vfork 系统调用源码 三.clone 系统调用源码 四._do_fork 函数源码 五.do_fork 函数源码 Linux 进程相关 " 系统 ...
- 【Java】NIO中Selector的select方法源码分析
该篇博客的有些内容和在之前介绍过了,在这里再次涉及到的就不详细说了,如果有不理解请看[Java]NIO中Channel的注册源码分析, [Java]NIO中Selector的创建源码分析 Select ...
- 【java】java中的线程池 ThreadPoolExecutor源码分析
文章目录 1.概述 4.源码 4.1 关键属性 4.2 构造函数 4.4 状态控制 4.5 ThreadLocalMap 4.6 execute方法源码分析 4.7 addWorker方法源码分析 4 ...
- Java中的锁大全(底层源码分析)
引用:https://tech.meituan.com/2018/11/15/java-lock.html 加锁过程:https://www.cnblogs.com/hkdpp/p/11917383. ...
- Apache Mahout中推荐算法Slope one源码分析
2019独角兽企业重金招聘Python工程师标准>>> 关于推荐引擎 如今的互联网中,无论是电子商务还是社交网络,对数据挖掘的需求都越来越大了,而推荐引擎正是数据挖掘完美体现:通过分 ...
- MapReduce中map并行度优化及源码分析
mapTask并行度的决定机制 一个job的map阶段并行度由客户端在提交job时决定,而客户端对map阶段并行度的规划的基本逻辑为:将待处理数据执行逻辑切片(即按照一个特定切片大小,将待处理数据划分 ...
- java中 resource_Java中如何获取resource的源码分析
java 中获取 resource一般通过getResource(),不管你是通过class还是classloader来调用, 首先我们来讲讲getResource(string name)这个nam ...
最新文章
- python【Matlibplot绘图库】绘制用于学术论文投稿的黑白图片
- Context结构图
- Linux下执行.sh文件
- Visual Studio Code——待补充
- sqoop建表_Sqoop基础教程
- 内存性能测试---stream benchmark 的使用
- 适用于openvino 2020.2的yolov5的docker制作
- 前端----表格的具体使用(jquery)
- 转:VC++获取屏幕大小第一篇 像素大小GetSystemMetrics
- 通过SAXReader解析XML
- FMS直播和点播测试
- 淘宝粉丝能买吗?怎么加最快
- CH340G的调试过程
- 正确设置 pip,避免There was a problem confirming the ssl certificate问题
- Frenet坐标系与Cartesian坐标系互转(一):公式推导
- 股票交易sdk接口是什么意思?
- 哪些软件是用C++写的
- Java面向对象高级部分——通过Class类实例化对象(五十二)
- 用python做成绩分析做表格_excel成绩表-懂Excel就能轻松入门Python数据分析包pandas(三):制作成绩条...
- 清明节快到了,车辆尾号限行规则提前看!一文带你了解全国各地的尾号限行规则