An introduction to parallelism in PostgreSQL 11

OS: CentOS 7.4
DB: PostgreSQL 11.1

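PostgreSQL 11 further strengthens parallel query:

Parallel creation of B-tree indexes
Hash joins can run in parallel using a shared hash table
UNION can run each SELECT in parallel when the individual SELECTs cannot be parallelized
Parallel scans of partitioned tables
LIMIT can be passed down to parallel workers
Parallel workers can use this to return fewer rows and to use targeted index scans
Single-evaluation queries (e.g. aggregate queries in a WHERE clause) and functions in the target list can be parallelized
New parameter parallel_leader_participation controls whether the leader process also executes the plan below Gather nodes; enabled by default
Parallel execution of CREATE TABLE … AS, CREATE MATERIALIZED VIEW, and certain queries using UNION
Better performance of parallel hash join and parallel sequential scan with many parallel workers
EXPLAIN now reports sort activity in parallel workers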

# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
# su - postgres
Last login: Tue Nov  6 16:06:23 CST 2018 on pts/2
$ psql -c "select version();"
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

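Instance-level parameters

Several settings cause the query planner not to generate a parallel query plan under any circumstances. For any parallel query plan to be generated at all, the following settings must be configured as described.

max_parallel_workers_per_gather must be set to a value greater than zero. This is a special case of the more general principle that no more workers should be used than the number configured via max_parallel_workers_per_gather.
dynamic_shared_memory_type must be set to a value other than none. Parallel query requires dynamic shared memory in order to pass data between cooperating processes.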
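The two preconditions can be written as a small check. This is a minimal Python sketch that models the settings as a plain dict (a hypothetical representation, not a PostgreSQL API); the real planner also looks at things this sketch ignores, such as parallel-unsafe functions and table size.

```python
def parallel_plan_possible(settings: dict) -> bool:
    """Check the two instance-level preconditions for any parallel plan.

    `settings` is a hypothetical dict mapping GUC names to values; only the
    two settings discussed above are examined.
    """
    # A Gather node must be allowed to use at least one worker.
    if int(settings.get("max_parallel_workers_per_gather", 0)) <= 0:
        return False
    # Workers pass tuples to the leader through dynamic shared memory.
    if settings.get("dynamic_shared_memory_type", "none") == "none":
        return False
    return True


if __name__ == "__main__":
    # posix is a common dynamic_shared_memory_type on Linux.
    cfg = {"max_parallel_workers_per_gather": 2,
           "dynamic_shared_memory_type": "posix"}
    print(parallel_plan_possible(cfg))
    print(parallel_plan_possible({**cfg, "max_parallel_workers_per_gather": 0}))
```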

select ps.name, ps.setting, ps.unit, ps.context, ps.source,
       ps.min_val, ps.max_val, ps.boot_val, ps.short_desc
  from pg_settings ps
 where ps.name in ('force_parallel_mode',
                   'max_worker_processes',
                   'max_parallel_workers',
                   'max_parallel_maintenance_workers',
                   'max_parallel_workers_per_gather',
                   --'min_parallel_relation_size',  -- added in 9.6, removed in 10
                   'min_parallel_index_scan_size',
                   'min_parallel_table_scan_size',
                   'parallel_tuple_cost',
                   'parallel_setup_cost',
                   'parallel_leader_participation')
 order by ps.name;
               name               | setting | unit |  context   | source  | min_val |   max_val    | boot_val | short_desc
----------------------------------+---------+------+------------+---------+---------+--------------+----------+----------------------------------------------------------------------------------------------------
 force_parallel_mode              | off     |      | user       | default |         |              | off      | Forces use of parallel query facilities.
 max_parallel_maintenance_workers | 4       |      | user       | session | 0       | 1024         | 2        | Sets the maximum number of parallel processes per maintenance operation.
 max_parallel_workers             | 8       |      | user       | default | 0       | 1024         | 8        | Sets the maximum number of parallel workers that can be active at one time.
 max_parallel_workers_per_gather  | 2       |      | user       | default | 0       | 1024         | 2        | Sets the maximum number of parallel processes per executor node.
 max_worker_processes             | 8       |      | postmaster | default | 0       | 262143       | 8        | Maximum number of concurrent worker processes.
 min_parallel_index_scan_size     | 64      | 8kB  | user       | default | 0       | 715827882    | 64       | Sets the minimum amount of index data for a parallel scan.
 min_parallel_table_scan_size     | 1024    | 8kB  | user       | default | 0       | 715827882    | 1024     | Sets the minimum amount of table data for a parallel scan.
 parallel_leader_participation    | on      |      | user       | default |         |              | on       | Controls whether Gather and Gather Merge also run subplans.
 parallel_setup_cost              | 1000    |      | user       | default | 0       | 1.79769e+308 | 1000     | Sets the planner's estimate of the cost of starting up worker processes for parallel query.
 parallel_tuple_cost              | 0.1     |      | user       | default | 0       | 1.79769e+308 | 0.1      | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend.
(10 rows)

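A few new parameters were added:
max_parallel_maintenance_workers
Sets the maximum number of parallel workers a single maintenance command (such as CREATE INDEX) may use; the default is 2.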

parallel_leader_participation
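I did not fully understand this parameter at first and kept the default. My rough reading of the English documentation is that it controls whether the leader process also executes the part of the plan under Gather and Gather Merge, rather than only collecting tuples from the workers: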

Allows the leader process to execute the query plan under Gather and Gather Merge nodes instead of waiting for worker processes.
The default is on.
Setting this value to off reduces the likelihood that workers will become blocked because the leader is not reading tuples fast enough,
but requires the leader process to wait for worker processes to start up before the first tuples can be produced.
The degree to which the leader can help or hinder performance depends on the plan type, number of workers and query duration.
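One concrete effect of this setting shows up in the planner's row estimates. As far as I can tell from get_parallel_divisor() in PostgreSQL 11's costsize.c, each participant is assumed to process rows / divisor, where the divisor is the number of workers plus a leader contribution of 1 - 0.3 × workers (dropped when it is not positive, or when parallel_leader_participation is off). The Python sketch below reproduces that arithmetic as a model, not as the server code.

```python
def parallel_divisor(workers: int, leader_participation: bool = True) -> float:
    """Number of "participants" the planner divides a scan's rows among.

    Sketch of get_parallel_divisor() in PostgreSQL 11's costsize.c: the
    leader counts as (1 - 0.3 * workers) of a worker because it also has to
    gather tuples, and not at all once that fraction is <= 0 or when
    parallel_leader_participation is off.
    """
    divisor = float(workers)
    if leader_participation:
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor


if __name__ == "__main__":
    for on in (True, False):
        d = parallel_divisor(2, on)
        # Estimated rows per participant for a 5,000,000-row table.
        print(on, d, int(5_000_000 / d))
```

With 2 workers and the leader participating, the divisor is 2.4, and 5,000,000 / 2.4 ≈ 2083333. That is consistent with the rows=2083333 estimate on the Parallel Seq Scan of tmp_t0 in the parallel join plan later in this post, assuming the planner estimates tmp_t0 at exactly 5,000,000 rows.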

I had not understood max_worker_processes very well before, so I went back to the documentation and reread it.

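max_worker_processes
The maximum number of background worker processes the database can run; parallel workers are one kind of background worker.
PostgreSQL background workers come in two kinds:
The first kind can only be registered in the postmaster, by calling RegisterBackgroundWorker(BackgroundWorker *worker) (in practice from a library loaded through shared_preload_libraries);
the second kind is started after the system is up, by calling RegisterDynamicBackgroundWorker(BackgroundWorker *worker, BackgroundWorkerHandle **handle). Both kinds take slots from the same pool of max_worker_processes, and parallel workers are started in this second, dynamic way.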
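max_parallel_workers
Sets the maximum number of parallel workers that can be active at one time.
In PostgreSQL 11, parallel workers serve two kinds of operations:
the first is parallel query, whose degree of parallelism is controlled by max_parallel_workers_per_gather;
the second is maintenance commands (such as CREATE INDEX), whose degree of parallelism is controlled by max_parallel_maintenance_workers.

max_parallel_workers should be less than or equal to max_worker_processes; a larger value has no effect, because parallel workers are taken from the pool established by max_worker_processes.
max_parallel_workers_per_gather + max_parallel_maintenance_workers should be less than or equal to max_parallel_workers, so that a parallel query and a parallel maintenance command can both get their full number of workers at the same time.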

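These parameters form a hierarchy: max_worker_processes bounds max_parallel_workers, which in turn bounds the per-operation limits.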
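The layering can be written down as a small sanity check. The Python sketch below is hypothetical (settings as a plain dict, busy-slot counts supplied by the caller): it flags the two conditions above and estimates how many of a plan's workers could actually be started. The second condition is this post's rule of thumb, not a limit the server enforces.

```python
def check_layering(s: dict) -> list:
    """Return warnings for settings that break the layering described above.

    `s` is a hypothetical dict of GUC name -> int. The second rule is a
    rule of thumb, not a limit enforced by the server.
    """
    warnings = []
    if s["max_parallel_workers"] > s["max_worker_processes"]:
        warnings.append("max_parallel_workers > max_worker_processes: "
                        "the excess can never be started")
    per_operation = (s["max_parallel_workers_per_gather"]
                     + s["max_parallel_maintenance_workers"])
    if per_operation > s["max_parallel_workers"]:
        warnings.append("per_gather + maintenance limits > max_parallel_workers")
    return warnings


def workers_launched(planned: int, s: dict,
                     parallel_busy: int = 0, bgworkers_busy: int = 0) -> int:
    """Estimate how many of `planned` workers can actually be started.

    A parallel worker needs a free slot under max_parallel_workers and a free
    background-worker slot under max_worker_processes. The busy counts are
    hypothetical inputs; bgworkers_busy includes busy parallel workers.
    """
    free_parallel = max(s["max_parallel_workers"] - parallel_busy, 0)
    free_bgworkers = max(s["max_worker_processes"] - bgworkers_busy, 0)
    return min(planned, free_parallel, free_bgworkers)


if __name__ == "__main__":
    # Values from the pg_settings output above.
    settings = {"max_worker_processes": 8, "max_parallel_workers": 8,
                "max_parallel_workers_per_gather": 2,
                "max_parallel_maintenance_workers": 4}
    print(check_layering(settings))
    # 7 background-worker slots already taken by other workers:
    print(workers_launched(2, settings, bgworkers_busy=7))
```

When fewer workers are available than planned, the query still runs with the workers it gets (possibly none, in which case the leader does all the work); EXPLAIN ANALYZE shows this as Workers Launched being lower than Workers Planned.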

Verification

postgres=# create table tmp_t0(c0 varchar(100),c1 varchar(100),c2 varchar(100),c3 varchar(100));
postgres=# insert into tmp_t0(c0,c1,c2,c3)
select id::varchar,(id*2)::varchar,md5((id)::varchar),md5(md5((id)::varchar)) from generate_series(1,5000000) as id;
INSERT 0 5000000
postgres=# \d+ tmp_t0
                                            Table "public.tmp_t0"
 Column |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
 c0     | character varying(100) |           |          |         | extended |              |
 c1     | character varying(100) |           |          |         | extended |              |
 c2     | character varying(100) |           |          |         | extended |              |
 c3     | character varying(100) |           |          |         | extended |              |

Parallel sequential (full table) scan

postgres=# explain select count(1) from tmp_t0;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=75279.14..75279.15 rows=1 width=8)
   ->  Gather  (cost=75278.92..75279.13 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=74278.92..74278.93 rows=1 width=8)
               ->  Parallel Seq Scan on tmp_t0  (cost=0.00..73613.74 rows=266074 width=0)
(5 rows)
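Why does this plan get 2 workers? A minimal sketch, modeled on my reading of compute_parallel_worker() in PostgreSQL 11 (src/backend/optimizer/path/allpaths.c): a table smaller than min_parallel_table_scan_size (1024 pages of 8 kB, i.e. 8 MB, by default) gets no workers; above that, one more worker is added each time the table is three times larger, and the result is capped by max_parallel_workers_per_gather. The size used below (71,000 pages, roughly 550 MB) is a hypothetical figure for tmp_t0, not something measured in this post, and the parallel_workers storage parameter, which overrides the heuristic, is ignored.

```python
def planned_workers(heap_pages: int,
                    min_parallel_table_scan_size: int = 1024,
                    max_workers: int = 2) -> int:
    """Approximate the default worker count for a parallel seq scan.

    Sketch of the size heuristic in PostgreSQL 11's compute_parallel_worker():
    sizes are in 8 kB pages, and max_workers plays the role of
    max_parallel_workers_per_gather. The parallel_workers storage parameter
    and index-size checks are not modeled.
    """
    if heap_pages < min_parallel_table_scan_size:
        return 0  # too small to be worth a parallel scan
    workers = 1
    threshold = max(min_parallel_table_scan_size, 1)
    # One extra worker each time the table is 3x larger than the threshold.
    while heap_pages >= threshold * 3:
        workers += 1
        threshold *= 3
    return min(workers, max_workers)


if __name__ == "__main__":
    # 71,000 pages is a hypothetical size for tmp_t0 (about 550 MB).
    print(planned_workers(71_000))                 # 2 (capped by max_parallel_workers_per_gather)
    print(planned_workers(71_000, max_workers=8))  # 4
```

Under this model the heuristic alone would ask for 4 workers, so raising max_parallel_workers_per_gather is what would let this table get more than the 2 shown in the plan.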

Parallel B-tree index creation

postgres=# create index idx_tmp_t0_x1 on public.tmp_t0 using btree (c0);
CREATE INDEX

While the index is being built, top shows the parallel worker next to the leader backend:
postgres: parallel worker for PID 2196
postgres: postgres postgres [local] CREATE INDEX
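Only one parallel worker shows up in top, even though max_parallel_maintenance_workers allows 2 or more. A plausible explanation, based on my reading of plan_create_index_workers() in PostgreSQL 11: the worker count from the size heuristic is capped by max_parallel_maintenance_workers and then reduced until every participant (workers plus the leader) gets at least 32 MB of maintenance_work_mem. The Python sketch below assumes maintenance_work_mem is at its 64 MB default and that tmp_t0 is about 71,000 pages; neither value is shown in this post.

```python
def index_build_workers(heap_pages: int,
                        max_parallel_maintenance_workers: int = 2,
                        maintenance_work_mem_kb: int = 64 * 1024,
                        min_parallel_table_scan_size: int = 1024) -> int:
    """Approximate the worker count for a parallel B-tree build.

    Sketch of plan_create_index_workers() in PostgreSQL 11: the same log-3
    size heuristic as a table scan, capped by max_parallel_maintenance_workers,
    then reduced until each participant (workers + leader) gets >= 32 MB of
    maintenance_work_mem. Parallel-safety checks are not modeled.
    """
    if heap_pages < min_parallel_table_scan_size:
        return 0
    workers, threshold = 1, max(min_parallel_table_scan_size, 1)
    while heap_pages >= threshold * 3:
        workers += 1
        threshold *= 3
    workers = min(workers, max_parallel_maintenance_workers)
    # Leave every sort participant at least 32 MB (32768 kB).
    while workers > 0 and maintenance_work_mem_kb // (workers + 1) < 32768:
        workers -= 1
    return workers


if __name__ == "__main__":
    pages = 71_000  # hypothetical size of tmp_t0
    print(index_build_workers(pages))  # 64 MB default: one worker
    print(index_build_workers(pages, maintenance_work_mem_kb=1024 * 1024))
```

With 64 MB, two workers would leave each of the three participants only about 21 MB, so the sketch drops to one worker (two participants at 32 MB each); with 1 GB it keeps both.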

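Parallel CREATE TABLE AS

postgres=# show max_parallel_maintenance_workers ;
 max_parallel_maintenance_workers
----------------------------------
 2
(1 row)

postgres=# create table tmp_t1 as select * from public.tmp_t0;
SELECT 5000000

Note that the parallelism of CREATE TABLE AS comes from its SELECT part, so it is governed by the query settings such as max_parallel_workers_per_gather; max_parallel_maintenance_workers applies to CREATE INDEX.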

Parallel join

postgres=# explain select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=267463.69..267463.70 rows=1 width=8)
   ->  Gather  (cost=267463.47..267463.68 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=266463.47..266463.48 rows=1 width=8)
               ->  Parallel Hash Join  (cost=125958.51..261255.86 rows=2083045 width=0)
                     Hash Cond: ((t0.c0)::text = (t1.c0)::text)
                     ->  Parallel Seq Scan on tmp_t0 t0  (cost=0.00..91786.33 rows=2083333 width=7)
                     ->  Parallel Hash  (cost=91783.45..91783.45 rows=2083045 width=7)
                           ->  Parallel Seq Scan on tmp_t1 t1  (cost=0.00..91783.45 rows=2083045 width=7)
(9 rows)
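The Gather costs in both plans can be reproduced by hand. Based on my reading of cost_gather() in PostgreSQL 11, a Gather node costs its child's cost plus parallel_setup_cost (1000) once, plus parallel_tuple_cost (0.1) for every row sent from the workers to the leader. A short Python sketch of that arithmetic:

```python
def gather_cost(sub_startup: float, sub_total: float, rows: float,
                parallel_setup_cost: float = 1000.0,
                parallel_tuple_cost: float = 0.1) -> tuple:
    """Return (startup, total) cost of a Gather node over a given child.

    Sketch of cost_gather() in PostgreSQL 11: parallel_setup_cost is paid
    once, parallel_tuple_cost once per row moved from workers to the leader.
    """
    startup = sub_startup + parallel_setup_cost
    total = sub_total + parallel_setup_cost + parallel_tuple_cost * rows
    return startup, total


if __name__ == "__main__":
    # Child = Partial Aggregate; 2 rows reach the Gather in both plans.
    print(gather_cost(266463.47, 266463.48, 2))  # parallel join plan
    print(gather_cost(74278.92, 74278.93, 2))    # parallel count(1) plan
```

For the join plan this gives 267463.47..267463.68, and for the count(1) plan 75278.92..75279.13, matching the EXPLAIN output. Because each participant sends only a partial count, the per-row transfer cost is negligible here; a Gather over millions of unaggregated rows would pay 0.1 per row, which is one reason aggregating below the Gather pays off.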

Reference:
https://www.postgresql.org/docs/11/release-11-1.html
