PostgreSQL WAL: Replaying Checkpoint WAL on a Standby

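As covered in the resource manager (RMGR) article of this PostgreSQL WAL series, XLog records are divided among several resource managers, and each resource manager handles only the records that belong to it (the operations are abstracted into callback functions, and each record type supplies its own implementation). Checkpoint WAL belongs to the RM_XLOG_ID resource manager, which covers XLog-related records such as checkpoints and WAL segment switches. Resource managers are declared in src/include/access/rmgrlist.h, and the entry for RM_XLOG_ID is: PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL, NULL). The three NULL arguments mean that it registers no startup, cleanup, or mask callback. The callbacks rm_startup, rm_redo, rm_mask, and rm_cleanup are used by the startup process while StartupXLOG replays XLOG records: rm_startup initializes a resource manager, rm_redo replays a record, rm_mask masks pages for the consistency check, and rm_cleanup performs cleanup. The flow is as follows: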

    void StartupXLOG(void)
    {
        ...
        /* Initialize resource managers */
        for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
        {
            if (RmgrTable[rmid].rm_startup != NULL)
                RmgrTable[rmid].rm_startup();
        }
        ...
        /* main redo apply loop */
        do
        {
            ...
            /* Now apply the WAL record itself: rm_redo replays the record */
            RmgrTable[record->xl_rmid].rm_redo(xlogreader);

            /*
             * After redo, check whether the backup pages associated with
             * the WAL record are consistent with the existing pages. This
             * check is done only if consistency check is enabled for this
             * record.
             */
            if ((record->xl_info & XLR_CHECK_CONSISTENCY) != 0)
                checkXLogConsistency(xlogreader);
            ...
            /* Else, try to fetch the next WAL record */
            record = ReadRecord(xlogreader, LOG, false);
        } while (record != NULL);

        /* Allow resource managers to do any required cleanup. */
        for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
        {
            if (RmgrTable[rmid].rm_cleanup != NULL)
                RmgrTable[rmid].rm_cleanup();
        }
    }
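The startup, redo, and cleanup phases above amount to dispatch through a table of function pointers. The following is a minimal sketch of that pattern, not PostgreSQL code: DemoRecord, DemoRmgr, demo_rmgr_table, demo_replay, and the two toy resource managers are all hypothetical names invented for illustration, and the record carries nothing but the id of its owner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a WAL record: only the owning resource manager id. */
typedef struct DemoRecord
{
    int rmid;
} DemoRecord;

/* Hypothetical stand-in for one RmgrTable entry; callbacks may be NULL. */
typedef struct DemoRmgr
{
    const char *name;
    void (*startup)(void);
    void (*redo)(const DemoRecord *rec);
    void (*cleanup)(void);
} DemoRmgr;

enum { DEMO_RM_XLOG = 0, DEMO_RM_OTHER = 1, DEMO_RM_MAX = DEMO_RM_OTHER };

/* Counters so the effect of each callback can be observed. */
static int demo_xlog_applied = 0;
static int demo_other_applied = 0;
static int demo_startup_calls = 0;
static int demo_cleanup_calls = 0;

static void demo_xlog_redo(const DemoRecord *rec)  { (void) rec; demo_xlog_applied++; }
static void demo_other_redo(const DemoRecord *rec) { (void) rec; demo_other_applied++; }
static void demo_other_startup(void) { demo_startup_calls++; }
static void demo_other_cleanup(void) { demo_cleanup_calls++; }

/* Like RM_XLOG_ID in the article, the XLOG entry has no startup or cleanup. */
static const DemoRmgr demo_rmgr_table[DEMO_RM_MAX + 1] = {
    [DEMO_RM_XLOG]  = { "XLOG",  NULL,               demo_xlog_redo,  NULL },
    [DEMO_RM_OTHER] = { "OTHER", demo_other_startup, demo_other_redo, demo_other_cleanup },
};

/* Run startup callbacks, dispatch every record to its owner, then clean up. */
static size_t demo_replay(const DemoRecord *recs, size_t nrecs)
{
    size_t applied = 0;

    for (int rmid = 0; rmid <= DEMO_RM_MAX; rmid++)
        if (demo_rmgr_table[rmid].startup != NULL)
            demo_rmgr_table[rmid].startup();

    for (size_t i = 0; i < nrecs; i++)
    {
        assert(recs[i].rmid >= 0 && recs[i].rmid <= DEMO_RM_MAX);
        demo_rmgr_table[recs[i].rmid].redo(&recs[i]);
        applied++;
    }

    for (int rmid = 0; rmid <= DEMO_RM_MAX; rmid++)
        if (demo_rmgr_table[rmid].cleanup != NULL)
            demo_rmgr_table[rmid].cleanup();

    return applied;
}
```

Calling demo_replay on an array of records sends each one to the redo callback of its owner, which is why the replay loop itself never needs to know what a checkpoint record contains.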

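When a PostgreSQL standby replays WAL from the primary slowly, the checkpoint timestamp in its pg_control file can fall far behind the standby's own clock, because that timestamp is the one carried in the checkpoint record. Likewise, the checkpoint location recorded in the standby's pg_control is the position of a checkpoint record in the WAL received from the primary. In other words, the checkpoint-related fields in the standby's pg_control come from the primary's checkpoint WAL.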

    /*
     * XLOG resource manager's routines
     *
     * Definitions of info values are in include/catalog/pg_control.h, though
     * not all record types are related to control file updates.
     */
    void xlog_redo(XLogReaderState *record)
    {
        uint8       info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
        XLogRecPtr  lsn = record->EndRecPtr;
        ...
        else if (info == XLOG_CHECKPOINT_ONLINE)
        {
            CheckPoint  checkPoint;     /* defined in src/include/catalog/pg_control.h */

            /* Copy the CheckPoint struct out of the checkpoint WAL record */
            memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));

            /* In an ONLINE checkpoint, treat the XID counter as a minimum */
            LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
            /* Advance the standby's cached nextXid only if the checkpoint's nextXid is ahead of it */
            if (FullTransactionIdPrecedes(ShmemVariableCache->nextXid, checkPoint.nextXid))
                ShmemVariableCache->nextXid = checkPoint.nextXid;
            LWLockRelease(XidGenLock);

            /*
             * We ignore the nextOid counter in an ONLINE checkpoint,
             * preferring to track OID assignment through XLOG_NEXTOID
             * records.  The nextOid counter is from the start of the
             * checkpoint and might well be stale compared to later
             * XLOG_NEXTOID records.  We could try to take the maximum of the
             * nextOid counter and our latest value, but since there's no
             * particular guarantee about the speed with which the OID counter
             * wraps around, that's a risky thing to do.  In any case, users
             * of the nextOid counter are required to avoid assignment of
             * duplicates, so that a somewhat out-of-date value should be
             * safe.
             */

            /*
             * Handle multixact: MultiXactState->nextMXact is advanced to
             * checkPoint.nextMulti, and MultiXactState->nextOffset to
             * checkPoint.nextMultiOffset, when the checkpoint values are
             * ahead of the current ones.
             */
            MultiXactAdvanceNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);

            /*
             * NB: This may perform multixact truncation when replaying WAL
             * generated by an older primary.
             *
             * If checkPoint.oldestMulti is ahead of
             * MultiXactState->oldestMultiXactId, oldestMultiXactId and
             * oldestMultiXactDB are updated from the checkpoint.
             */
            MultiXactAdvanceOldest(checkPoint.oldestMulti, checkPoint.oldestMultiDB);

            /*
             * If checkPoint.oldestXid is ahead of
             * ShmemVariableCache->oldestXid, oldestXid and oldestXidDB are
             * updated from the checkpoint.
             */
            if (TransactionIdPrecedes(ShmemVariableCache->oldestXid, checkPoint.oldestXid))
                SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);

            /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
            LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
            ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
            LWLockRelease(ControlFileLock);

            /* Update shared-memory copy of checkpoint XID/epoch */
            SpinLockAcquire(&XLogCtl->info_lck);
            XLogCtl->ckptFullXid = checkPoint.nextXid;
            SpinLockRelease(&XLogCtl->info_lck);

            /* TLI should not change in an on-line checkpoint */
            if (checkPoint.ThisTimeLineID != ThisTimeLineID)
                ereport(PANIC,
                        (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
                                checkPoint.ThisTimeLineID, ThisTimeLineID)));

            /*
             * Save a checkpoint for recovery restart if appropriate.  This
             * function is called each time a checkpoint record is read from
             * XLOG.  It must determine whether the checkpoint represents a
             * safe restartpoint.  If so, the checkpoint record is stashed in
             * shared memory so that CreateRestartPoint can consult it.  (Note
             * that the latter function is executed by the checkpointer, while
             * this one is executed by the startup process.)  In essence it
             * sets XLogCtl->lastCheckPointRecPtr to ReadRecPtr,
             * XLogCtl->lastCheckPointEndPtr to EndRecPtr, and
             * XLogCtl->lastCheckPoint to *checkPoint.
             */
            RecoveryRestartPoint(&checkPoint);
        }
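The "treat the counter as a minimum" rule used for nextXid, and the wraparound-aware comparison behind TransactionIdPrecedes, can be sketched in isolation. This is a simplified illustration under stated assumptions rather than the PostgreSQL implementation: demo_advance_min and demo_xid_precedes are hypothetical helpers, the 64-bit counter stands in for a FullTransactionId value, and the 32-bit comparison ignores the special (non-normal) XIDs that the real function handles separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Move a 64-bit counter forward only: the value from the checkpoint is a
 * lower bound, so a counter that is already ahead is left untouched.
 */
static uint64_t demo_advance_min(uint64_t current, uint64_t from_checkpoint)
{
    return (current < from_checkpoint) ? from_checkpoint : current;
}

/*
 * Compare two 32-bit XIDs modulo 2^32: a precedes b when the signed
 * difference is negative.  The conversion to int32_t relies on two's
 * complement wrapping, as on all common platforms.
 * Assumption: both inputs are normal XIDs.
 */
static bool demo_xid_precedes(uint32_t a, uint32_t b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}
```

The modular comparison is what lets an XID just below 2^32 count as older than a small XID issued after the counter wrapped around.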

ReadRecPtr is assigned in the ReadRecord function:

    for (;;)
    {
        char       *errormsg;

        record = XLogReadRecord(xlogreader, RecPtr, &errormsg);
        ReadRecPtr = xlogreader->ReadRecPtr;
        EndRecPtr = xlogreader->EndRecPtr;
        ...
    }

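To shorten recovery time, PostgreSQL also supports checkpoints on a standby; these are restartpoints, created by CreateRestartPoint. A standby checkpoint cannot write a checkpoint WAL record: doing so would interleave locally generated records with the WAL received from the primary, scrambling the log and breaking replay. In CheckpointerMain (src/backend/postmaster/checkpointer.c), the variable to watch is the boolean do_restartpoint: when RecoveryInProgress() returns true, CheckpointerMain calls CreateRestartPoint. CreateRestartPoint is similar to CreateCheckPoint, but it is used during WAL recovery to establish a point from which recovery can roll forward without replaying the entire recovery log. It returns true if a new restartpoint was established. A restartpoint can be established only if a safe checkpoint record has been replayed since the last restartpoint.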
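The two decisions described above, which routine the checkpointer runs and whether a new restartpoint is possible at all, can be sketched as pure functions. This is a hedged illustration: DemoLsn, DemoAction, demo_choose_action, and demo_restartpoint_possible are hypothetical names, an LSN is modelled as a plain 64-bit integer with 0 as the invalid value, and the test on the redo pointers mirrors the check at the top of CreateRestartPoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoLsn;               /* hypothetical stand-in for XLogRecPtr */
#define DEMO_INVALID_LSN ((DemoLsn) 0)  /* 0 marks "no record seen yet" */

typedef enum { DEMO_DO_CHECKPOINT, DEMO_DO_RESTARTPOINT } DemoAction;

/* During recovery the checkpointer creates restartpoints, otherwise checkpoints. */
static DemoAction demo_choose_action(bool recovery_in_progress)
{
    return recovery_in_progress ? DEMO_DO_RESTARTPOINT : DEMO_DO_CHECKPOINT;
}

/*
 * A new restartpoint needs a replayed checkpoint record whose redo pointer
 * is strictly ahead of the redo pointer already stored in the control file.
 */
static bool demo_restartpoint_possible(DemoLsn last_ckpt_rec_ptr,
                                       DemoLsn last_ckpt_redo,
                                       DemoLsn control_file_redo)
{
    if (last_ckpt_rec_ptr == DEMO_INVALID_LSN)
        return false;           /* no safe checkpoint record replayed yet */
    if (last_ckpt_redo <= control_file_redo)
        return false;           /* that checkpoint is already our restartpoint */
    return true;
}
```

Keeping the eligibility test free of side effects makes the skip case easy to reason about: when it returns false, the real function still updates minRecoveryPoint but leaves the checkpoint fields of pg_control alone.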

    bool CreateRestartPoint(int flags)
    {
        XLogRecPtr  lastCheckPointRecPtr, lastCheckPointEndPtr;
        CheckPoint  lastCheckPoint;
        XLogRecPtr  PriorRedoPtr, receivePtr, replayPtr, endptr;
        TimeLineID  replayTLI;
        XLogSegNo   _logSegNo;
        TimestampTz xtime;

        /* Get a local copy of the last safe checkpoint record. */
        SpinLockAcquire(&XLogCtl->info_lck);
        /* The checkpoint location comes from XLogCtl->lastCheckPointRecPtr */
        lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
        lastCheckPointEndPtr = XLogCtl->lastCheckPointEndPtr;
        lastCheckPoint = XLogCtl->lastCheckPoint;
        SpinLockRelease(&XLogCtl->info_lck);

        /*
         * Check that we're still in recovery mode. It's ok if we exit
         * recovery mode after this check, the restart point is valid anyway.
         */
        if (!RecoveryInProgress())
        {
            ereport(DEBUG2,
                    (errmsg_internal("skipping restartpoint, recovery has already ended")));
            return false;
        }

        /*
         * If the last checkpoint record we've replayed is already our last
         * restartpoint, we can't perform a new restart point. We still update
         * minRecoveryPoint in that case, so that if this is a shutdown restart
         * point, we won't start up earlier than before. That's not strictly
         * necessary, but when hot standby is enabled, it would be rather weird
         * if the database opened up for read-only connections at a
         * point-in-time before the last shutdown. Such time travel is still
         * possible in case of immediate shutdown, though.
         *
         * We don't explicitly advance minRecoveryPoint when we do create a
         * restartpoint. It's assumed that flushing the buffers will do that as
         * a side-effect.
         */
        if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
            lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
        {
            ereport(DEBUG2,
                    (errmsg_internal("skipping restartpoint, already performed at %X/%X",
                                     LSN_FORMAT_ARGS(lastCheckPoint.redo))));

            UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
            if (flags & CHECKPOINT_IS_SHUTDOWN)
            {
                LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                ControlFile->time = (pg_time_t) time(NULL);
                UpdateControlFile();
                LWLockRelease(ControlFileLock);
            }
            return false;
        }

        /*
         * Update the shared RedoRecPtr so that the startup process can
         * calculate the number of segments replayed since last restartpoint,
         * and request a restartpoint if it exceeds CheckPointSegments.
         *
         * Like in CreateCheckPoint(), hold off insertions to update it,
         * although during recovery this is just pro forma, because no WAL
         * insertions are happening.
         */
        WALInsertLockAcquireExclusive();
        RedoRecPtr = XLogCtl->Insert.RedoRecPtr = lastCheckPoint.redo;
        WALInsertLockRelease();

        /* Also update the info_lck-protected copy */
        SpinLockAcquire(&XLogCtl->info_lck);
        XLogCtl->RedoRecPtr = lastCheckPoint.redo;
        SpinLockRelease(&XLogCtl->info_lck);

        /*
         * Prepare to accumulate statistics.
         *
         * Note: because it is possible for log_checkpoints to change while a
         * checkpoint proceeds, we always accumulate stats, even if
         * log_checkpoints is currently off.
         */
        MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
        CheckpointStats.ckpt_start_t = GetCurrentTimestamp();

        if (log_checkpoints)
            LogCheckpointStart(flags, true);

        /* Update the process title */
        update_checkpoint_display(flags, true, false);

        CheckPointGuts(lastCheckPoint.redo, flags);

        /* Remember the prior checkpoint's redo ptr for UpdateCheckPointDistanceEstimate() */
        PriorRedoPtr = ControlFile->checkPointCopy.redo;

        /*
         * Update pg_control, using current time.  Check that it still shows
         * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do
         * nothing; this is a quick hack to make sure nothing really bad
         * happens if somehow we get here after the end-of-recovery checkpoint.
         */
        LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
        if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
            ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
        {
            ControlFile->checkPoint = lastCheckPointRecPtr;
            ControlFile->checkPointCopy = lastCheckPoint;
            ControlFile->time = (pg_time_t) time(NULL);

            /*
             * Ensure minRecoveryPoint is past the checkpoint record.
             * Normally, this will have happened already while writing out
             * dirty buffers, but not necessarily - e.g. because no buffers
             * were dirtied.  We do this because a non-exclusive base backup
             * uses minRecoveryPoint to determine which WAL files must be
             * included in the backup, and the file (or files) containing the
             * checkpoint record must be included, at a minimum. Note that for
             * an ordinary restart of recovery there's no value in having the
             * minimum recovery point any earlier than this anyway, because
             * redo will begin just after the checkpoint record.
             */
            if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
            {
                ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
                ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;

                /* update local copy */
                minRecoveryPoint = ControlFile->minRecoveryPoint;
                minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
            }
            if (flags & CHECKPOINT_IS_SHUTDOWN)
                ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
            UpdateControlFile();
        }
        LWLockRelease(ControlFileLock);

        /*
         * Update the average distance between checkpoints/restartpoints if
         * the prior checkpoint exists.
         */
        if (PriorRedoPtr != InvalidXLogRecPtr)
            UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);

        /*
         * Delete old log files, those no longer needed for last restartpoint
         * to prevent the disk holding the xlog from growing full.
         */
        XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);

        /*
         * Retreat _logSegNo using the current end of xlog replayed or
         * received, whichever is later.
         */
        receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
        replayPtr = GetXLogReplayRecPtr(&replayTLI);
        endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
        KeepLogSeg(endptr, &_logSegNo);
        if (InvalidateObsoleteReplicationSlots(_logSegNo))
        {
            /*
             * Some slots have been invalidated; recalculate the old-segment
             * horizon, starting again from RedoRecPtr.
             */
            XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
            KeepLogSeg(endptr, &_logSegNo);
        }
        _logSegNo--;

        /*
         * Try to recycle segments on a useful timeline. If we've been
         * promoted since the beginning of this restartpoint, use the new
         * timeline chosen at end of recovery (RecoveryInProgress() sets
         * ThisTimeLineID in that case). If we're still in recovery, use the
         * timeline we're currently replaying.
         *
         * There is no guarantee that the WAL segments will be useful on the
         * current timeline; if recovery proceeds to a new timeline right
         * after this, the pre-allocated WAL segments on this timeline will
         * not be used, and will go wasted until recycled on the next
         * restartpoint. We'll live with that.
         */
        if (RecoveryInProgress())
            ThisTimeLineID = replayTLI;

        RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);

        /*
         * Make more log segments if needed.  (Do this after recycling old log
         * segments, since that may supply some of the needed files.)
         */
        PreallocXlogFiles(endptr);

        /*
         * ThisTimeLineID is normally not set when we're still in recovery.
         * However, recycling/preallocating segments above needed
         * ThisTimeLineID to determine which timeline to install the segments
         * on. Reset it now, to restore the normal state of affairs for
         * debugging purposes.
         */
        if (RecoveryInProgress())
            ThisTimeLineID = 0;

        /*
         * Truncate pg_subtrans if possible.  We can throw away all data
         * before the oldest XMIN of any running transaction.  No future
         * transaction will attempt to reference any pg_subtrans entry older
         * than that (see Asserts in subtrans.c).  When hot standby is
         * disabled, though, we mustn't do this because StartupSUBTRANS hasn't
         * been called yet.
         */
        if (EnableHotStandby)
            TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());

        /* Real work is done; log and update stats. */
        LogCheckpointEnd(true);

        /* Reset the process title */
        update_checkpoint_display(flags, true, true);

        xtime = GetLatestXTime();
        ereport((log_checkpoints ? LOG : DEBUG2),
                (errmsg("recovery restart point at %X/%X",
                        LSN_FORMAT_ARGS(lastCheckPoint.redo)),
                 xtime ? errdetail("Last completed transaction was at log time %s.",
                                   timestamptz_to_str(xtime)) : 0));

        /* Finally, execute archive_cleanup_command, if any. */
        if (archiveCleanupCommand && strcmp(archiveCleanupCommand, "") != 0)
            ExecuteRecoveryCommand(archiveCleanupCommand,
                                   "archive_cleanup_command",
                                   false);

        return true;
    }
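The segment arithmetic near the end of CreateRestartPoint can be isolated in a short sketch. It assumes only what the listing shows: a segment number is the LSN divided by the segment size (which is what XLByteToSeg computes), the end pointer is the later of the received and replayed positions, and the last removable segment is the one just before the segment holding the redo pointer. The demo_ functions are hypothetical; the adjustments made by KeepLogSeg and replication slot invalidation are deliberately left out, and the guard against segment 0 is an addition for this sketch, not something taken from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Segment number that contains the given LSN. */
static uint64_t demo_lsn_to_segno(uint64_t lsn, uint64_t seg_size)
{
    assert(seg_size > 0);
    return lsn / seg_size;
}

/* End of WAL for recycling purposes: replayed or received, whichever is later. */
static uint64_t demo_recycle_end(uint64_t receive_ptr, uint64_t replay_ptr)
{
    return (receive_ptr < replay_ptr) ? replay_ptr : receive_ptr;
}

/*
 * Last segment that is no longer needed to restart from redo_ptr.
 * Returns false when the redo pointer is still in segment 0, so that the
 * unsigned decrement cannot wrap around (a guard added for this sketch).
 */
static bool demo_last_removable_segno(uint64_t redo_ptr, uint64_t seg_size,
                                      uint64_t *segno_out)
{
    uint64_t segno = demo_lsn_to_segno(redo_ptr, seg_size);

    if (segno == 0)
        return false;
    *segno_out = segno - 1;
    return true;
}
```

With 16 MB segments, a redo pointer of 0x5000028 falls in segment 5, so segment 4 is the newest one this simplified rule would allow to be removed or recycled.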
