How to Choose the Column Order of a Multicolumn (Composite) Index in PostgreSQL
Tags
PostgreSQL, multicolumn index, composite index, leading column, order, equality query, range scan, discrete values, continuous values, single-column index, bitmap index scan
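Background
When you create a multicolumn composite index, what column order should you use?
A multicolumn index is organized much like a single-column index. The only difference is that its entries are ordered by the expressions in the index definition, in the order they are listed.
See "PostgreSQL B-Tree Index Structure Explained" (《深入浅出PostgreSQL B-Tree索引结构》).
For example: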
create index idx on tbl using btree (udf(c1) desc, c2 , c3 desc nulls last);
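The entries are then ordered exactly as defined: by udf(c1) descending, then by c2 ascending, then by c3 descending with NULLs last.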
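The ordering can be imitated outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation: `udf` is a stand-in function chosen purely for illustration, the rows are made up, and the sort key mimics `(udf(c1) desc, c2, c3 desc nulls last)` for integer columns.

```python
# Sketch: emulate the ordering of btree (udf(c1) desc, c2, c3 desc nulls last).
# `udf` and the sample rows are illustrative assumptions, not from the article.

def udf(x):
    return x % 10  # hypothetical stand-in for the indexed expression

def index_sort_key(row):
    c1, c2, c3 = row
    return (
        -udf(c1),                                 # first key: udf(c1) descending
        c2,                                       # second key: c2 ascending
        (c3 is None, 0 if c3 is None else -c3),   # third key: c3 descending, NULLs last
    )

rows = [(11, 5, 7), (12, 1, None), (12, 1, 3), (12, 1, 9), (23, 0, 4)]
ordered = sorted(rows, key=index_sort_key)
for row in ordered:
    print(row)
```

Rows that tie on the first two keys are separated only by the third, which is why the position of a column in the definition matters so much for what a scan can skip.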
Here is a concrete example:
postgres=# create unlogged table tab1 (id int, c1 int, c2 int);
CREATE TABLE
postgres=# insert into tab1 select id, random()*9, 1 from generate_series(1,1000000) t(id);
INSERT 0 1000000
postgres=# insert into tab1 select id, random()*9, 3 from generate_series(1,1000000) t(id);
INSERT 0 1000000
postgres=# insert into tab1 values (1,1,2);
INSERT 0 1
postgres=# insert into tab1 select id, 1, 3 from generate_series(1,1000000) t(id);
INSERT 0 1000000
postgres=# insert into tab1 select id, 1, 1 from generate_series(1,1000000) t(id);
INSERT 0 1000000
Only one row has c1=1 and c2=2. The plans below use idx_tab1, a B-tree index whose keys are (c1, c2), as the page dumps later in this article confirm.
1. Searching for c1=1 and c2=2 touches only 4 blocks.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1=1 and c2=2;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tab1 on public.tab1  (cost=0.43..2.38 rows=1 width=12) (actual time=0.017..0.018 rows=1 loops=1)
   Output: id, c1, c2
   Index Cond: ((tab1.c1 = 1) AND (tab1.c2 = 2))
   Buffers: shared hit=4    -- 4 blocks: root page, branch page, leaf page, heap page
 Planning time: 0.214 ms
 Execution time: 0.042 ms
(6 rows)
2. Searching for the other combinations touches many blocks.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1=1 and c2=3;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tab1 on public.tab1  (cost=0.43..46108.77 rows=1109400 width=12) (actual time=0.026..237.712 rows=1111519 loops=1)
   Output: id, c1, c2
   Index Cond: ((tab1.c1 = 1) AND (tab1.c2 = 3))
   Buffers: shared hit=22593 read=303    -- includes heap pages
 Planning time: 0.089 ms
 Execution time: 328.249 ms
(6 rows)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1=1 and c2=1;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tab1 on public.tab1  (cost=0.43..46108.77 rows=1109400 width=12) (actual time=0.022..238.399 rows=1110527 loops=1)
   Output: id, c1, c2
   Index Cond: ((tab1.c1 = 1) AND (tab1.c2 = 1))
   Buffers: shared hit=22582 read=299    -- includes heap pages
 Planning time: 0.094 ms
 Execution time: 329.331 ms
(6 rows)
So how does the database locate the c1=1, c2=2 row so quickly?
The pageinspect extension lets us look at the internal structure of the index.
postgres=# create extension pageinspect ;
CREATE EXTENSION
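Inspect the index internals to see how a composite index locates a single row quickly.
Start with the first page of the index, the metapage. It tells you how many levels the index has and where the root page is.
postgres=# SELECT * FROM bt_metap('idx_tab1');
 magic  | version | root | level | fastroot | fastlevel
--------+---------+------+-------+----------+-----------
 340322 |       2 |  290 |     2 |      290 |         2
(1 row)
This shows that the index has 2 levels below the root, and that the root is block 290.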
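As a rough cross-check against the earlier plan for c1=1 and c2=2, the level reported by bt_metap can be turned into an expected page count for a single-row lookup. This sketch assumes one page is read per tree level plus one heap page, and that the metapage itself is not counted; both are assumptions, not something the output states. `pages_for_point_lookup` is a name made up for this illustration.

```python
def pages_for_point_lookup(level):
    # Leaf pages are level 0, so a root at `level` puts level + 1 index pages on the path.
    index_pages = level + 1
    heap_pages = 1  # assumption: the single matching row lives on one heap page
    return index_pages + heap_pages

expected = pages_for_point_lookup(2)
print(expected)
```

With level = 2 this gives 4, which is consistent with the `Buffers: shared hit=4` line in that plan.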
Look at the root page:
postgres=# SELECT * FROM bt_page_items('idx_tab1', 290);
 itemoffset |   ctid    | itemlen | nulls | vars |          data
------------+-----------+---------+-------+------+-------------------------
          1 | (3,1)     |       8 | f     | f    |
          2 | (289,1)   |      16 | f     | f    | 00 00 00 00 03 00 00 00
          3 | (12341,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
          4 | (12124,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
          5 | (11907,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
          6 | (11690,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
          7 | (11473,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
          8 | (11256,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
          9 | (11039,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
         10 | (10822,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
         11 | (10605,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
         12 | (10388,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
         13 | (10171,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
         14 | (9954,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         15 | (9737,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         16 | (9520,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         17 | (9303,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         18 | (9086,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         19 | (575,1)   |      16 | f     | f    | 01 00 00 00 01 00 00 00
         20 | (8866,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         21 | (8649,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         22 | (8432,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         23 | (8215,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         24 | (7998,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         25 | (7781,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         26 | (7564,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         27 | (7347,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         28 | (7130,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         29 | (6913,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         30 | (6696,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         31 | (6479,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         32 | (6262,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         33 | (6045,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         34 | (5828,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         35 | (5611,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
         36 | (860,1)   |      16 | f     | f    | 01 00 00 00 03 00 00 00
         37 | (1145,1)  |      16 | f     | f    | 02 00 00 00 01 00 00 00
         38 | (1430,1)  |      16 | f     | f    | 02 00 00 00 03 00 00 00
         39 | (1715,1)  |      16 | f     | f    | 03 00 00 00 01 00 00 00
         40 | (2000,1)  |      16 | f     | f    | 03 00 00 00 03 00 00 00
         41 | (2285,1)  |      16 | f     | f    | 04 00 00 00 01 00 00 00
         42 | (2570,1)  |      16 | f     | f    | 04 00 00 00 03 00 00 00
         43 | (2855,1)  |      16 | f     | f    | 05 00 00 00 01 00 00 00
         44 | (3140,1)  |      16 | f     | f    | 05 00 00 00 03 00 00 00
         45 | (3425,1)  |      16 | f     | f    | 06 00 00 00 01 00 00 00
         46 | (3710,1)  |      16 | f     | f    | 06 00 00 00 03 00 00 00
         47 | (3995,1)  |      16 | f     | f    | 07 00 00 00 01 00 00 00
         48 | (4280,1)  |      16 | f     | f    | 07 00 00 00 03 00 00 00
         49 | (4565,1)  |      16 | f     | f    | 07 00 00 00 03 00 00 00
         50 | (4850,1)  |      16 | f     | f    | 08 00 00 00 01 00 00 00
         51 | (5135,1)  |      16 | f     | f    | 08 00 00 00 03 00 00 00
         52 | (5420,1)  |      16 | f     | f    | 09 00 00 00 03 00 00 00
(52 rows)
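In a non-leaf page, ctid points to a child page and data holds the lower bound of the keys in that child. The item for the leftmost child has no lower bound, so its data is empty.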
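The data column is easier to read once decoded. A minimal sketch, assuming both indexed columns are 4-byte integers stored little-endian (typical on x86, but an assumption about the machine that produced this output) and that neither is NULL; `decode_key` is a name made up for this illustration.

```python
import struct

def decode_key(data_hex):
    """Decode the pageinspect `data` column for an index on two int4 columns."""
    raw = bytes.fromhex(data_hex.replace(" ", ""))
    if not raw:
        return None  # empty data: the item carries no lower bound
    return struct.unpack("<ii", raw)  # two little-endian 32-bit signed integers

print(decode_key("01 00 00 00 02 00 00 00"))
print(decode_key("00 00 00 00 03 00 00 00"))
print(decode_key(""))
```

Read this way, `01 00 00 00 02 00 00 00` is the key (c1=1, c2=2), and the second item of the root page carries the lower bound (0, 3).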
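How is c1=1 and c2=2 found quickly? In the entries above, the key (1,2) falls between the lower bound (1,1) of block 575 and the lower bound (1,3) of block 8866, so it must be under block 575.
         19 | (575,1)   |      16 | f     | f    | 01 00 00 00 01 00 00 00
         20 | (8866,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00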
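The choice of child page can be written as a small search over the lower bounds. This is a simplified model of the descent, not PostgreSQL's code: it picks the last child whose lower bound is strictly below the search key. `root_items` is an excerpt of the root page shown above, with the bounds already decoded to (c1, c2), and `choose_child` is a hypothetical helper.

```python
from bisect import bisect_left

# Excerpt of the root page: (child block, lower bound as (c1, c2)).
# None marks the leftmost child, which has no lower bound.
root_items = [
    (3, None),
    (289, (0, 3)),
    (12341, (1, 1)),
    (9086, (1, 1)),
    (575, (1, 1)),
    (8866, (1, 3)),
    (860, (1, 3)),
    (1145, (2, 1)),
]

def choose_child(items, key):
    """Return the child block to descend into for `key` (simplified model)."""
    bounds = [bound for _, bound in items[1:]]  # skip the leftmost child
    pos = bisect_left(bounds, key)              # count of bounds strictly below key
    return items[pos][0]                        # last child whose bound is below key

print(choose_child(root_items, (1, 2)))
```

For the key (1, 2) this selects block 575, the last of the children bounded by (1, 1). The same comparison is repeated on each page along the way down to the leaf.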
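Next, look at index page 575. It sits on the first level below the root (not the last level), so it is a branch page.
Its first item is the page's high key: an upper bound on the keys in this page, which matches the lower bound of its right sibling. Its second item points to this page's leftmost child, and its data is empty because that child has no lower bound.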
postgres=# SELECT * FROM bt_page_items('idx_tab1', 575);
 itemoffset |   ctid   | itemlen | nulls | vars |          data
------------+----------+---------+-------+------+-------------------------
          1 | (8712,1) |      16 | f     | f    | 01 00 00 00 03 00 00 00
          2 | (572,1)  |       8 | f     | f    |
          3 | (573,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          4 | (574,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          5 | (576,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          6 | (577,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          7 | (578,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          8 | (579,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          9 | (580,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         10 | (581,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         11 | (582,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         12 | (583,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         13 | (584,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         14 | (585,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         15 | (586,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         16 | (587,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         17 | (588,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         18 | (589,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         19 | (590,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         20 | (591,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         21 | (592,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         22 | (593,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         23 | (594,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         24 | (595,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         25 | (596,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         26 | (597,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         27 | (598,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         28 | (599,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         29 | (600,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         30 | (601,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         31 | (602,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         32 | (603,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         33 | (604,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         34 | (605,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         35 | (606,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         36 | (607,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         37 | (608,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         38 | (609,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         39 | (610,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
         40 | (5488,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
         41 | (8961,1) |      16 | f     | f    | 01 00 00 00 03 00 00 00
         42 | (8960,1) |      16 | f     | f    | 01 00 00 00 03 00 00 00
...
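These two rows show that c1=1, c2=2 must be in index page 5488.
         40 | (5488,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
         41 | (8961,1) |      16 | f     | f    | 01 00 00 00 03 00 00 00
Continue to that index page. It is on the second level below the root (the last level), so it is a leaf page.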
postgres=# SELECT * FROM bt_page_items('idx_tab1', 5488);
 itemoffset |    ctid     | itemlen | nulls | vars |          data
------------+-------------+---------+-------+------+-------------------------
          1 | (16215,25)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
          2 | (5398,127)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          3 | (5398,137)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          4 | (5398,156)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
          5 | (5398,172)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
...
        130 | (5405,10)   |      16 | f     | f    | 01 00 00 00 01 00 00 00
        131 | (5405,15)   |      16 | f     | f    | 01 00 00 00 01 00 00 00
        132 | (5405,17)   |      16 | f     | f    | 01 00 00 00 01 00 00 00
        133 | (5405,35)   |      16 | f     | f    | 01 00 00 00 01 00 00 00
        134 | (5405,59)   |      16 | f     | f    | 01 00 00 00 01 00 00 00
        135 | (10810,151) |      16 | f     | f    | 01 00 00 00 02 00 00 00
        136 | (16216,41)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
        137 | (16216,40)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
        138 | (16216,39)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
...
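The entry has been found. It points into a heap page:
        135 | (10810,151) |      16 | f     | f    | 01 00 00 00 02 00 00 00
Because this is a leaf page, ctid is the address of the row in the heap (block number, item offset), so the row can be read directly from the heap page.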
postgres=# select * from tab1 where ctid='(10810,151)';
 id | c1 | c2
----+----+----
  1 |  1 |  2
(1 row)
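With the internal structure of a multicolumn index in mind, we can look at how to optimize a few query scenarios.
Example - range + equality query
The leading column has a range condition and the second column has an equality condition.
The index is used, but every index page covered by the range on the first column is scanned.
Performance is poor.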
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1 between 1 and 9 and c2=2;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tab1 on public.tab1  (cost=0.43..60757.38 rows=1 width=12) (actual time=0.027..106.362 rows=1 loops=1)
   Output: id, c1, c2
   Index Cond: ((tab1.c1 >= 1) AND (tab1.c1 <= 9) AND (tab1.c2 = 2))
   Buffers: shared hit=8321
 Planning time: 0.099 ms
 Execution time: 106.422 ms
(6 rows)
Optimization
Create a new composite index with the equality column first.
postgres=# create index idx_tab1_2 on tab1 using btree (c2,c1);
CREATE INDEX
The equality condition now narrows the scan directly, so only one index item has to be scanned.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1 between 1 and 9 and c2=2;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tab1_2 on public.tab1  (cost=0.43..2.35 rows=1 width=12) (actual time=0.017..0.018 rows=1 loops=1)
   Output: id, c1, c2
   Index Cond: ((tab1.c2 = 2) AND (tab1.c1 >= 1) AND (tab1.c1 <= 9))
   Buffers: shared hit=4
 Planning time: 0.095 ms
 Execution time: 0.040 ms
(6 rows)
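The difference between the two column orders can be reproduced with sorted lists standing in for the two indexes. This is a toy model under stated assumptions, not a benchmark: the data is a small made-up set shaped like tab1, an index scan is modelled as one contiguous slice of the sorted entries, and for the (c1, c2) order the slice is bounded by the leading column only.

```python
from bisect import bisect_left, bisect_right

# Toy data shaped like tab1: many rows with c2 in {1, 3}, one row with c2 = 2.
rows = [(c1, c2) for c1 in range(10) for c2 in (1, 3) for _ in range(100)]
rows.append((1, 2))

idx_c1_c2 = sorted(rows)                          # models btree (c1, c2)
idx_c2_c1 = sorted((c2, c1) for c1, c2 in rows)   # models btree (c2, c1)

def scan(index, low, high, keep):
    """Walk one contiguous slice [low, high] of a sorted index.
    Returns (entries visited, entries matching)."""
    start = bisect_left(index, low)
    stop = bisect_right(index, high)
    visited = index[start:stop]
    return len(visited), sum(1 for entry in visited if keep(entry))

NEG_INF, POS_INF = float("-inf"), float("inf")

# where c1 between 1 and 9 and c2 = 2, on (c1, c2): the range on c1 sets the slice
visited_a, matched_a = scan(idx_c1_c2, (1, NEG_INF), (9, POS_INF),
                            lambda e: e[1] == 2)
# same query on (c2, c1): c2 = 2 pins the slice, c1 narrows it further
visited_b, matched_b = scan(idx_c2_c1, (2, 1), (2, 9),
                            lambda e: e[0] == 2 and 1 <= e[1] <= 9)

print(visited_a, matched_a)
print(visited_b, matched_b)
```

Both scans return the same single row, but the first one walks every entry with c1 between 1 and 9 to find it, while the second touches only the one entry with c2 = 2.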
Example - multi-value + equality query
PostgreSQL has an optimization for discrete multi-value conditions (IN lists): it scans only the index items for each of the listed values.
drop index idx_tab1_2;
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1 in (1,2,3,4,5,6,7,8,9) and c2=2;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tab1 on public.tab1  (cost=0.43..13.90 rows=1 width=12) (actual time=0.024..0.186 rows=1 loops=1)
   Output: id, c1, c2
   Index Cond: ((tab1.c1 = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[])) AND (tab1.c2 = 2))
   Buffers: shared hit=21 read=7
 Planning time: 0.114 ms
 Execution time: 0.208 ms
(6 rows)
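One way to picture this plan is as a separate exact-key lookup for every value in the IN list. The sketch below models that on a sorted list; the toy data and the `probe` helper are assumptions made for illustration, and the page arithmetic at the end assumes one root-to-leaf descent of 3 index pages per value plus one heap page per matching row.

```python
from bisect import bisect_left, bisect_right

# Toy data shaped like tab1: many rows with c2 in {1, 3}, one row with c2 = 2.
rows = [(c1, c2) for c1 in range(10) for c2 in (1, 3) for _ in range(100)]
rows.append((1, 2))
index = sorted(rows)  # models btree (c1, c2)

def probe(index, key):
    """One descent for an exact (c1, c2) key; returns the matching entries."""
    return index[bisect_left(index, key):bisect_right(index, key)]

in_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
probes = 0
found = []
for value in in_list:
    probes += 1
    found.extend(probe(index, (value, 2)))

print(probes, found)

# Rough page arithmetic under the assumptions stated above.
estimated_pages = probes * 3 + len(found) * 1
print(estimated_pages)
```

Nine lookups at 3 index pages each plus one heap page gives 28, which is consistent with `Buffers: shared hit=21 read=7` in the plan above, and far below the 8321 buffers of the range version of the same query.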
If the single-value column comes first and the multi-value column second, even fewer blocks are scanned, but the IN-list condition is then applied as a Filter rather than as an Index Cond.
postgres=# create index idx_tab1_2 on tab1 using btree (c2,c1);
CREATE INDEX
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1 in (1,2,3,4,5,6,7,8,9) and c2=2;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tab1_2 on public.tab1  (cost=0.43..2.35 rows=1 width=12) (actual time=0.027..0.027 rows=1 loops=1)
   Output: id, c1, c2
   Index Cond: (tab1.c2 = 2)
   Filter: (tab1.c1 = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
   Buffers: shared hit=4
 Planning time: 0.107 ms
 Execution time: 0.047 ms
(7 rows)
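Because c2 is the leading column of the second index, the condition c2=2 lands directly on the single matching item and nothing else has to be scanned, so the query is much faster.
When both indexes exist and the data is skewed, the planner automatically picks the more suitable index based on the estimated selectivity of the conditions.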
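Summary
At the time of writing, PostgreSQL does not support non-contiguous index scans. When the leading (first) column is constrained by a range, every index entry covered by that range has to be scanned, even if the composite index has a second column and that column has an equality condition.
This amplifies index page I/O, because some of the index pages scanned contain no entries that satisfy the full condition.
Recommendations for creating multicolumn composite indexes:
1. Put columns with single-value discrete conditions (for example, equality) first. If a query has several equality columns, put the more selective ones (those with a higher count(distinct)) in front.
2. Put columns with multi-value discrete conditions (for example, IN lists) after them. If a query has several multi-value columns, put the more selective ones (those with a higher count(distinct)) in front.
3. Put columns with continuous conditions (for example, range queries) last. If a query has several range columns, put the one whose range returns the fewest rows first, which improves filtering and also narrows the part of the index that is scanned.
4. If the result set is very large (the conditions match a high proportion of rows), results are streamed (or the first few rows must be returned as quickly as possible), and the output must be sorted, build the index on the sort key.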
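Recommendations 1 to 3 can be written down as a sort over the query's predicates. The helper below is only a sketch of that heuristic: `suggest_column_order`, the dictionary fields, and the example columns are all hypothetical, the statistics are supplied by hand, and recommendation 4 (sort keys) is not modelled.

```python
# Predicate kinds ranked as in the list above: equality, then IN-list, then range.
KIND_RANK = {"eq": 0, "in": 1, "range": 2}

def suggest_column_order(predicates):
    """predicates: list of dicts with
         name        column name
         kind        "eq", "in" or "range"
         n_distinct  distinct values in the column (used for eq / in)
         est_rows    rows expected to satisfy the condition (used for range)
    Returns the column names in the suggested index order."""
    def sort_key(p):
        if p["kind"] == "range":
            tie_break = p["est_rows"]       # fewer matching rows first
        else:
            tie_break = -p["n_distinct"]    # more distinct values first
        return (KIND_RANK[p["kind"]], tie_break)
    return [p["name"] for p in sorted(predicates, key=sort_key)]

# Made-up example query: two equality columns, one IN list, one range.
example = [
    {"name": "created_at", "kind": "range", "n_distinct": 0, "est_rows": 5000},
    {"name": "status", "kind": "in", "n_distinct": 5, "est_rows": 0},
    {"name": "tenant_id", "kind": "eq", "n_distinct": 1000, "est_rows": 0},
    {"name": "region", "kind": "eq", "n_distinct": 20, "est_rows": 0},
]
order = suggest_column_order(example)
print(order)
```

A result like this is a starting point only. The plans earlier in the article show why it should be checked with explain (analyze, buffers) against the real data distribution.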
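References
"PostgreSQL bitmapAnd, bitmapOr, bitmap index scan, bitmap heap scan" (《PostgreSQL bitmapAnd, bitmapOr, bitmap index scan, bitmap heap scan》)
"PostgreSQL B-Tree Index Structure Explained" (《深入浅出PostgreSQL B-Tree索引结构》)
https://www.postgresql.org/docs/devel/static/pageinspect.html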