TableAM Parallel table scan

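The TableAM callbacks related to parallel table scans are listed below. They only cover the initialization of a parallel scan; the scan itself is still carried out by the scan functions described in "PostgreSQL TableAM: HeapAM Scans".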

    /* ------------------------------------------------------------------------
     * Parallel table scan related functions.
     * ------------------------------------------------------------------------
     */

    /*
     * Estimate the size of shared memory needed for a parallel scan of this
     * relation. The snapshot does not need to be accounted for.
     */
    Size        (*parallelscan_estimate) (Relation rel);

    /*
     * Initialize ParallelTableScanDesc for a parallel scan of this relation.
     * `pscan` will be sized according to parallelscan_estimate() for the same
     * relation.
     */
    Size        (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan);

    /*
     * Reinitialize `pscan` for a new scan. `rel` will be the same relation as
     * when `pscan` was initialized by parallelscan_initialize.
     */
    void        (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan);

table_parallelscan_estimate

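table_parallelscan_estimate is defined in src/backend/access/table/tableam.c. When the scan uses an MVCC snapshot, it adds the space needed to serialize that snapshot, and then adds whatever the table AM's parallelscan_estimate callback reports as the shared memory it needs to support a parallel scan.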

    Size
    table_parallelscan_estimate(Relation rel, Snapshot snapshot)
    {
        Size        sz = 0;

        if (IsMVCCSnapshot(snapshot))
            sz = add_size(sz, EstimateSnapshotSpace(snapshot));
        else
            Assert(snapshot == SnapshotAny);

        sz = add_size(sz, rel->rd_tableam->parallelscan_estimate(rel));

        return sz;
    }

ExecSeqScanEstimate in src/backend/executor/nodeSeqscan.c calls table_parallelscan_estimate to work out how much shared memory the sequential scan needs for its parallel scan descriptor.
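The estimate logic can be sketched outside PostgreSQL as follows. This is a minimal stand-alone sketch, not the server code: checked_add_size, am_estimate_fn, toy_block_am_estimate and toy_parallelscan_estimate are hypothetical names, and the 64-byte AM state size is an arbitrary assumption. It only illustrates the two steps described above: add the snapshot size when the snapshot is an MVCC snapshot, then add what the AM callback reports, refusing to wrap around on overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for add_size(): add two sizes, but report
 * overflow to the caller instead of silently wrapping around. */
static bool checked_add_size(size_t a, size_t b, size_t *out)
{
    if (b > SIZE_MAX - a)
        return false;
    *out = a + b;
    return true;
}

/* Hypothetical AM callback type: bytes of shared state the AM needs. */
typedef size_t (*am_estimate_fn)(void);

/* Toy block-oriented AM; 64 bytes is an arbitrary assumption. */
static size_t toy_block_am_estimate(void)
{
    return 64;
}

/* Same shape as the estimate step: optional snapshot space first,
 * then the AM-specific shared state. */
static bool toy_parallelscan_estimate(am_estimate_fn am_estimate,
                                      bool mvcc_snapshot,
                                      size_t snapshot_bytes,
                                      size_t *total)
{
    size_t sz = 0;

    if (mvcc_snapshot && !checked_add_size(sz, snapshot_bytes, &sz))
        return false;
    return checked_add_size(sz, am_estimate(), total);
}
```

Calling toy_parallelscan_estimate(toy_block_am_estimate, true, 100, &total) yields 164 in this toy setup; with mvcc_snapshot set to false the snapshot bytes are ignored and only the AM's 64 bytes are counted.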

table_parallelscan_initialize

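table_parallelscan_initialize is defined in src/backend/access/table/tableam.c. It first calls the table AM's parallelscan_initialize callback, which initializes the AM-specific "subclass" of ParallelTableScanDesc and returns the size of that struct. The returned size is stored in phs_snapshot_off. If the snapshot is an MVCC snapshot, it is then serialized into the memory starting at (char *) pscan + pscan->phs_snapshot_off.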

    void
    table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
                                  Snapshot snapshot)
    {
        Size        snapshot_off = rel->rd_tableam->parallelscan_initialize(rel, pscan);

        pscan->phs_snapshot_off = snapshot_off;

        if (IsMVCCSnapshot(snapshot))
        {
            SerializeSnapshot(snapshot, (char *) pscan + pscan->phs_snapshot_off);
            pscan->phs_snapshot_any = false;
        }
        else
        {
            pscan->phs_snapshot_any = true;
        }
    }

ExecSeqScanInitializeDSM in src/backend/executor/nodeSeqscan.c calls table_parallelscan_initialize to initialize the ParallelTableScanDesc (or its "subclass").
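The memory layout this produces (AM-specific descriptor first, serialized snapshot right behind it) can be illustrated with a small stand-alone sketch. All names here (ToyScanDesc, ToyBlockScanDesc, toy_am_initialize, toy_scan_initialize, toy_snapshot_ptr) are hypothetical, and a plain byte copy stands in for SerializeSnapshot; it is an illustration of the offset arithmetic only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical base descriptor, loosely modelled on ParallelTableScanDescData. */
typedef struct ToyScanDesc
{
    bool   snapshot_any;   /* true: no snapshot was stored */
    size_t snapshot_off;   /* offset of the snapshot bytes from the struct start */
} ToyScanDesc;

/* Hypothetical AM-specific "subclass": the base must be the first member. */
typedef struct ToyBlockScanDesc
{
    ToyScanDesc base;
    unsigned    nblocks;
} ToyBlockScanDesc;

/* Toy AM callback: fill in the AM part and return its size. */
static size_t toy_am_initialize(ToyScanDesc *pscan, unsigned nblocks)
{
    ToyBlockScanDesc *bpscan = (ToyBlockScanDesc *) pscan;

    bpscan->nblocks = nblocks;
    return sizeof(ToyBlockScanDesc);
}

/* Generic part: the snapshot bytes go directly after the AM's struct.
 * A NULL snapshot plays the role of SnapshotAny in this sketch. */
static void toy_scan_initialize(ToyScanDesc *pscan, unsigned nblocks,
                                const void *snapshot, size_t snapshot_len)
{
    pscan->snapshot_off = toy_am_initialize(pscan, nblocks);

    if (snapshot != NULL)
    {
        memcpy((char *) pscan + pscan->snapshot_off, snapshot, snapshot_len);
        pscan->snapshot_any = false;
    }
    else
        pscan->snapshot_any = true;
}

/* Where a worker would look for the stored snapshot, or NULL if none. */
static const void *toy_snapshot_ptr(const ToyScanDesc *pscan)
{
    return pscan->snapshot_any ? NULL
                               : (const char *) pscan + pscan->snapshot_off;
}
```

The caller must provide a buffer at least as large as the AM struct plus the snapshot bytes, which is the quantity the estimate step computes beforehand.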

table_parallelscan_reinitialize

table_parallelscan_reinitialize is defined in src/include/access/tableam.h and simply calls the table AM's parallelscan_reinitialize callback.

    /*
     * Restart a parallel scan.  Call this in the leader process.  Caller is
     * responsible for making sure that all workers have finished the scan
     * beforehand.
     */
    static inline void
    table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
    {
        rel->rd_tableam->parallelscan_reinitialize(rel, pscan);
    }

ExecSeqScanReInitializeDSM in src/backend/executor/nodeSeqscan.c calls table_parallelscan_reinitialize.

Parallel table scan

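table_beginscan_parallel is called by the executor. If a snapshot was serialized into shared memory, the function restores and registers it and sets SO_TEMP_SNAPSHOT, so that the snapshot is unregistered when the scan ends. Otherwise it uses SnapshotAny (any tuple is visible), which is what the caller passed when the parallel scan was set up. Finally it calls the table AM's scan_begin callback.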

    TableScanDesc
    table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan)
    {
        Snapshot    snapshot;
        uint32      flags = SO_TYPE_SEQSCAN |
            SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;

        if (!parallel_scan->phs_snapshot_any)
        {
            /* Snapshot was serialized -- restore it */
            snapshot = RestoreSnapshot((char *) parallel_scan +
                                       parallel_scan->phs_snapshot_off);
            RegisterSnapshot(snapshot);
            flags |= SO_TEMP_SNAPSHOT;
        }
        else
        {
            /* SnapshotAny passed by caller (not serialized) */
            snapshot = SnapshotAny;
        }

        return relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL,
                                                parallel_scan, flags);
    }

Both ExecSeqScanInitializeDSM and ExecSeqScanInitializeWorker in src/backend/executor/nodeSeqscan.c call table_beginscan_parallel.

ParallelTableScanDesc

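ParallelTableScanDesc is the struct that runs through all of the APIs above. ParallelBlockTableScanDesc, provided for parallel scans in block oriented AMs, can be seen as a subclass of ParallelTableScanDesc. Each backend taking part in a parallel scan has its own private TableScanDesc, and all of them point to the same ParallelTableScanDescData, whose members are shared by every participating backend. Most of the functions in src/backend/executor/nodeSeqscan.c rely on TableAM's parallel table scan and plain table scan APIs.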

    /*
     * Shared state for parallel table scan.
     *
     * Each backend participating in a parallel table scan has its own
     * TableScanDesc in backend-private memory, and those objects all contain a
     * pointer to this structure.  The information here must be sufficient to
     * properly initialize each new TableScanDesc as workers join the scan, and it
     * must act as a information what to scan for those workers.
     */
    typedef struct ParallelTableScanDescData
    {
        Oid         phs_relid;          /* OID of relation to scan */
        bool        phs_syncscan;       /* report location to syncscan logic? */
        bool        phs_snapshot_any;   /* SnapshotAny, not phs_snapshot_data? */
        Size        phs_snapshot_off;   /* data for snapshot */
    } ParallelTableScanDescData;
    typedef struct ParallelTableScanDescData *ParallelTableScanDesc;

    /* Shared state for parallel table scans, for block oriented storage. */
    typedef struct ParallelBlockTableScanDescData
    {
        ParallelTableScanDescData base;

        BlockNumber phs_nblocks;        /* # blocks in relation at start of scan */
        slock_t     phs_mutex;          /* mutual exclusion for setting startblock */
        BlockNumber phs_startblock;     /* starting block number */
        pg_atomic_uint64 phs_nallocated;    /* number of blocks allocated to
                                             * workers so far. */
    } ParallelBlockTableScanDescData;
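The "subclass" relationship and the private/shared split can be shown with a short stand-alone sketch. The struct and function names below (SharedScan, SharedBlockScan, PrivateScan, private_scan_nblocks) are hypothetical simplifications, not PostgreSQL types. The sketch relies on one C guarantee: a pointer to a struct can be converted to a pointer to its first member and back, which is why the base descriptor sits first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared base state (think ParallelTableScanDescData). */
typedef struct SharedScan
{
    unsigned relid;
} SharedScan;

/* Hypothetical block-oriented extension; the base is the first member. */
typedef struct SharedBlockScan
{
    SharedScan base;
    unsigned   nblocks;
} SharedBlockScan;

/* Hypothetical per-backend scan state: private counters plus a pointer
 * to the state shared by all participants. */
typedef struct PrivateScan
{
    SharedScan *parallel;
    unsigned    tuples_seen;
} PrivateScan;

/* AM code that knows the concrete type casts the base pointer back
 * to the extended struct. */
static unsigned private_scan_nblocks(const PrivateScan *scan)
{
    const SharedBlockScan *bscan = (const SharedBlockScan *) scan->parallel;

    return bscan->nblocks;
}
```

With two PrivateScan values pointing at the same SharedBlockScan, a change to the shared nblocks is visible through both, while tuples_seen stays separate per participant.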

block oriented AMs parallel scans Helper functions

HeapAM implements the three function pointers parallelscan_estimate, parallelscan_initialize and parallelscan_reinitialize by pointing them directly at table_block_parallelscan_estimate, table_block_parallelscan_initialize and table_block_parallelscan_reinitialize.

    /* ----------------------------------------------------------------------------
     * Helper functions to implement parallel scans for block oriented AMs.
     * ----------------------------------------------------------------------------
     */

    Size
    table_block_parallelscan_estimate(Relation rel)
    {
        return sizeof(ParallelBlockTableScanDescData);
    }

    Size
    table_block_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan)
    {
        ParallelBlockTableScanDesc bpscan = (ParallelBlockTableScanDesc) pscan;

        bpscan->base.phs_relid = RelationGetRelid(rel);         // OID of the table to scan
        bpscan->phs_nblocks = RelationGetNumberOfBlocks(rel);   // blocks in relation at start of scan
        /* compare phs_syncscan initialization to similar logic in initscan */
        bpscan->base.phs_syncscan = synchronize_seqscans &&
            !RelationUsesLocalBuffers(rel) &&
            bpscan->phs_nblocks > NBuffers / 4;
        SpinLockInit(&bpscan->phs_mutex);
        bpscan->phs_startblock = InvalidBlockNumber;            // starting block number
        pg_atomic_init_u64(&bpscan->phs_nallocated, 0);         // number of blocks allocated to workers so far

        return sizeof(ParallelBlockTableScanDescData);
    }

    void
    table_block_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
    {
        ParallelBlockTableScanDesc bpscan = (ParallelBlockTableScanDesc) pscan;

        pg_atomic_write_u64(&bpscan->phs_nallocated, 0);        // number of blocks allocated to workers so far
    }

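table_block_parallelscan_startblock_init is called from heapgettup and heapgettup_pagemode in src/backend/access/heap/heapam.c, that is, during ExecSeqScan. Its job is to settle phs_startblock, the block at which the parallel sequential scan starts, exactly once even though every worker may call it. If the scan does not use the synchronized scan machinery, the start block is set to 0. If it does and the local sync_startpage is still InvalidBlockNumber, the function releases the spinlock, calls ss_get_location to obtain a start position from the synchronized scan machinery, and retries. If sync_startpage is already valid on the retry and nobody else has set the start block in the meantime, phs_startblock is set to sync_startpage.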

    /*
     * find and set the scan's startblock: Determine where the parallel seq scan
     * should start.  This function may be called many times, once by each
     * parallel worker.  We must be careful only to set the startblock once.
     */
    void
    table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan)
    {
        BlockNumber sync_startpage = InvalidBlockNumber;

    retry:
        /* Grab the spinlock. */
        SpinLockAcquire(&pbscan->phs_mutex);

        /*
         * If the scan's startblock has not yet been initialized, we must do so
         * now.  If this is not a synchronized scan, we just start at block 0, but
         * if it is a synchronized scan, we must get the starting position from
         * the synchronized scan machinery.  We can't hold the spinlock while
         * doing that, though, so release the spinlock, get the information we
         * need, and retry.  If nobody else has initialized the scan in the
         * meantime, we'll fill in the value we fetched on the second time
         * through.
         */
        if (pbscan->phs_startblock == InvalidBlockNumber)
        {
            if (!pbscan->base.phs_syncscan)
                pbscan->phs_startblock = 0;
            else if (sync_startpage != InvalidBlockNumber)
                pbscan->phs_startblock = sync_startpage;
            else
            {
                SpinLockRelease(&pbscan->phs_mutex);
                sync_startpage = ss_get_location(rel, pbscan->phs_nblocks);
                goto retry;
            }
        }
        SpinLockRelease(&pbscan->phs_mutex);
    }
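The release-and-retry pattern is easier to follow in isolation. The sketch below is a stand-alone approximation under several assumptions: a C11 atomic_flag replaces PostgreSQL's spinlock, toy_get_sync_location is a hypothetical stand-in for ss_get_location that always answers 7, and the call counter is a plain global that is only meaningful in a single-threaded demonstration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_INVALID_BLOCK UINT32_MAX

/* Hypothetical shared state: a lock, the syncscan flag, the start block. */
typedef struct ToyStartState
{
    atomic_flag lock;
    bool        syncscan;
    uint32_t    startblock;
} ToyStartState;

/* Counts lookups; not thread safe, used only to observe the sketch. */
static int toy_lookup_calls = 0;

/* Stand-in for the (possibly slow) synchronized-scan lookup. */
static uint32_t toy_get_sync_location(void)
{
    toy_lookup_calls++;
    return 7;
}

static void toy_start_state_init(ToyStartState *s, bool syncscan)
{
    atomic_flag_clear(&s->lock);
    s->syncscan = syncscan;
    s->startblock = TOY_INVALID_BLOCK;
}

static void toy_lock(ToyStartState *s)
{
    while (atomic_flag_test_and_set(&s->lock))
        ;                       /* spin until the flag was clear */
}

static void toy_unlock(ToyStartState *s)
{
    atomic_flag_clear(&s->lock);
}

/* Set the start block at most once; never do the lookup under the lock. */
static void toy_startblock_init(ToyStartState *s)
{
    uint32_t sync_start = TOY_INVALID_BLOCK;

    for (;;)
    {
        toy_lock(s);
        if (s->startblock != TOY_INVALID_BLOCK)
            break;              /* someone already decided */
        if (!s->syncscan)
        {
            s->startblock = 0;
            break;
        }
        if (sync_start != TOY_INVALID_BLOCK)
        {
            s->startblock = sync_start;
            break;
        }
        toy_unlock(s);          /* drop the lock before the lookup */
        sync_start = toy_get_sync_location();
    }
    toy_unlock(s);              /* every break leaves the loop holding the lock */
}
```

The design point is that a fetched value is only used if the start block is still unset when the lock is reacquired, so a participant that lost the race simply discards its lookup result.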

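table_block_parallelscan_nextpage is called from heapgettup and heapgettup_pagemode in src/backend/access/heap/heapam.c, that is, during ExecSeqScan.

  1. phs_nallocated tracks how many pages have already been handed out to workers. Once phs_nallocated >= phs_nblocks (the source comment calls it rs_nblocks), all blocks have been allocated. The page actually returned is the counter value plus the starting block number, modulo nblocks.
  2. Report the scan location. Normally the current page number is reported. When the scan reaches its end, however, the starting page is reported instead of the ending page, so that the starting positions of later scans do not slew backwards. The position at the end of the scan is reported only once: subsequent callers report nothing.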
    /*
     * get the next page to scan
     *
     * Get the next page to scan.  Even if there are no pages left to scan,
     * another backend could have grabbed a page to scan and not yet finished
     * looking at it, so it doesn't follow that the scan is done when the first
     * backend gets an InvalidBlockNumber return.
     */
    BlockNumber
    table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbscan)
    {
        BlockNumber page;
        uint64      nallocated;

        /*
         * phs_nallocated tracks how many pages have been allocated to workers
         * already.  When phs_nallocated >= rs_nblocks, all blocks have been
         * allocated.
         *
         * Because we use an atomic fetch-and-add to fetch the current value, the
         * phs_nallocated counter will exceed rs_nblocks, because workers will
         * still increment the value, when they try to allocate the next block but
         * all blocks have been allocated already. The counter must be 64 bits
         * wide because of that, to avoid wrapping around when rs_nblocks is close
         * to 2^32.
         *
         * The actual page to return is calculated by adding the counter to the
         * starting block number, modulo nblocks.
         */
        nallocated = pg_atomic_fetch_add_u64(&pbscan->phs_nallocated, 1);
        if (nallocated >= pbscan->phs_nblocks)
            page = InvalidBlockNumber;  /* all blocks have been allocated */
        else
            page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks;

        /*
         * Report scan location.  Normally, we report the current page number.
         * When we reach the end of the scan, though, we report the starting page,
         * not the ending page, just so the starting positions for later scans
         * doesn't slew backwards.  We only report the position at the end of the
         * scan once, though: subsequent callers will report nothing.
         */
        if (pbscan->base.phs_syncscan)
        {
            if (page != InvalidBlockNumber)
                ss_report_location(rel, page);
            else if (nallocated == pbscan->phs_nblocks)
                ss_report_location(rel, pbscan->phs_startblock);
        }

        return page;
    }
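The block allocation arithmetic can be tried out with a stand-alone sketch. ToyBlockAllocator, toy_allocator_init and toy_next_page are hypothetical names, C11 atomics stand in for pg_atomic_fetch_add_u64, and the location reporting step is left out. The sketch shows only the counter-plus-start-block-modulo-nblocks calculation and why the counter keeps growing past nblocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_INVALID_BLOCK UINT32_MAX

/* Hypothetical shared allocator state for one parallel scan. */
typedef struct ToyBlockAllocator
{
    uint32_t             nblocks;     /* blocks in the relation */
    uint32_t             startblock;  /* where the scan begins */
    atomic_uint_fast64_t nallocated;  /* blocks handed out so far */
} ToyBlockAllocator;

static void toy_allocator_init(ToyBlockAllocator *a,
                               uint32_t nblocks, uint32_t startblock)
{
    a->nblocks = nblocks;
    a->startblock = startblock;
    atomic_init(&a->nallocated, 0);
}

/* Hand out the next block, wrapping around the end of the relation.
 * The counter is 64 bits wide and is incremented even after all blocks
 * are gone, so it can exceed nblocks without wrapping. */
static uint32_t toy_next_page(ToyBlockAllocator *a)
{
    uint64_t n = atomic_fetch_add(&a->nallocated, 1);

    if (n >= a->nblocks)
        return TOY_INVALID_BLOCK;   /* everything has been handed out */
    return (uint32_t) ((n + a->startblock) % a->nblocks);
}
```

For a relation of 5 blocks with start block 3, successive calls return 3, 4, 0, 1, 2 and then TOY_INVALID_BLOCK on every later call, regardless of which participant makes the call.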

synchronized scan machinery
ss_get_location
ss_report_location
