PostgreSQL 数据库多列复合索引的字段顺序选择原理

背景

当需要创建多列复合索引时，应该使用什么样的顺序呢？

多列复合索引的组织结构与单列字段索引结构类似，只是需要按索引内表达式指定的顺序编排。

《深入浅出PostgreSQL B-Tree索引结构》

例如

create index idx on tbl using btree (udf(c1) desc, c2 , c3 desc nulls last);

那么会按定义的顺序编排。

举个例子

postgres=# create unlogged table tab1 (id int, c1 int, c2 int);
CREATE TABLE
postgres=# insert into tab1 select id, random()*9, 1 from generate_series(1,1000000) t(id);
INSERT 0 1000000
postgres=# insert into tab1 select id, random()*9, 3 from generate_series(1,1000000) t(id);
INSERT 0 1000000
postgres=# insert into tab1 values (1,1,2);
INSERT 0 1
postgres=# insert into tab1 select id, 1, 3 from generate_series(1,1000000) t(id);
INSERT 0 1000000
postgres=# insert into tab1 select id, 1, 1 from generate_series(1,1000000) t(id);
INSERT 0 1000000

c1=1, c2=2的记录只有一条

1、搜索c1=1, c2=2，只需要扫描4个BLOCK

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1=1 and c2=2;  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------  Index Scan using idx_tab1 on public.tab1  (cost=0.43..2.38 rows=1 width=12) (actual time=0.017..0.018 rows=1 loops=1)  Output: id, c1, c2  Index Cond: ((tab1.c1 = 1) AND (tab1.c2 = 2))  Buffers: shared hit=4  （4个BLOCK，包括 root page, branch page, leaf page, HEAP PAGE）  Planning time: 0.214 ms  Execution time: 0.042 ms
(6 rows)

2、搜索其他的，需要扫描很多BLOCK。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1=1 and c2=3;  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------  Index Scan using idx_tab1 on public.tab1  (cost=0.43..46108.77 rows=1109400 width=12) (actual time=0.026..237.712 rows=1111519 loops=1)  Output: id, c1, c2  Index Cond: ((tab1.c1 = 1) AND (tab1.c2 = 3))  Buffers: shared hit=22593 read=303   (包括heap page)  Planning time: 0.089 ms  Execution time: 328.249 ms
(6 rows)  postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1=1 and c2=1;  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------  Index Scan using idx_tab1 on public.tab1  (cost=0.43..46108.77 rows=1109400 width=12) (actual time=0.022..238.399 rows=1110527 loops=1)  Output: id, c1, c2  Index Cond: ((tab1.c1 = 1) AND (tab1.c2 = 1))   Buffers: shared hit=22582 read=299   (包括heap page)  Planning time: 0.094 ms  Execution time: 329.331 ms
(6 rows)

那么如何知道数据库是快速定位到c1=1, c2=2的记录的呢？

可以使用pageinspect来看一看索引内部的结构

postgres=# create extension pageinspect ;
CREATE EXTENSION

查看索引内部结构，看看如何通过复合索引快速定位一条记录

首先要查看索引的第一个PAGE，即metapage，它会告诉你这个索引有几层，ROOT PAGE在哪里

postgres=# SELECT * FROM bt_metap('idx_tab1');   magic  | version | root | level | fastroot | fastlevel
--------+---------+------+-------+----------+-----------  340322 |       2 |  290 |     2 |      290 |         2
(1 row)

表示这个索引除去ROOT节点有2层，ROOT节点是290号数据块。

查看根页

postgres=# SELECT * FROM bt_page_items('idx_tab1', 290);   itemoffset |   ctid    | itemlen | nulls | vars |          data
------------+-----------+---------+-------+------+-------------------------  1 | (3,1)     |       8 | f     | f    |   2 | (289,1)   |      16 | f     | f    | 00 00 00 00 03 00 00 00  3 | (12341,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  4 | (12124,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  5 | (11907,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  6 | (11690,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  7 | (11473,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  8 | (11256,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  9 | (11039,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  10 | (10822,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  11 | (10605,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  12 | (10388,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  13 | (10171,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  14 | (9954,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  15 | (9737,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  16 | (9520,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  17 | (9303,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  18 | (9086,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  19 | (575,1)   |      16 | f     | f    | 01 00 00 00 01 00 00 00  20 | (8866,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  21 | (8649,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  22 | (8432,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  23 | (8215,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  24 | (7998,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  25 | (7781,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  26 | (7564,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  27 | (7347,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  28 | (7130,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  29 | (6913,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  30 | (6696,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  31 | (6479,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  32 | (6262,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  33 | (6045,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  34 | (5828,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  35 | (5611,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  36 | (860,1)   |      16 | f     | f    | 01 00 00 00 03 00 00 00  37 | (1145,1)  |      16 | f     | f    | 02 00 00 00 01 00 00 00  38 | (1430,1)  |      16 | f     | f    | 02 00 00 00 03 00 00 00  39 | (1715,1)  |      16 | f     | f    | 03 00 00 00 01 00 00 00  40 | (2000,1)  |      16 | f     | f    | 03 00 00 00 03 00 00 00  41 | (2285,1)  |      16 | f     | f    | 04 00 00 00 01 00 00 00  42 | (2570,1)  |      16 | f     | f    | 04 00 00 00 03 00 00 00  43 | (2855,1)  |      16 | f     | f    | 05 00 00 00 01 00 00 00  44 | (3140,1)  |      16 | f     | f    | 05 00 00 00 03 00 00 00  45 | (3425,1)  |      16 | f     | f    | 06 00 00 00 01 00 00 00  46 | (3710,1)  |      16 | f     | f    | 06 00 00 00 03 00 00 00  47 | (3995,1)  |      16 | f     | f    | 07 00 00 00 01 00 00 00  48 | (4280,1)  |      16 | f     | f    | 07 00 00 00 03 00 00 00  49 | (4565,1)  |      16 | f     | f    | 07 00 00 00 03 00 00 00  50 | (4850,1)  |      16 | f     | f    | 08 00 00 00 01 00 00 00  51 | (5135,1)  |      16 | f     | f    | 08 00 00 00 03 00 00 00  52 | (5420,1)  |      16 | f     | f    | 09 00 00 00 03 00 00 00
(52 rows)

索引的非leaf节点，data表示这个PAGE的最小边界值，最左边的页没有最小值

如何快速找到c1=1 and c2=2，通过以上信息，可以知道1,2在575号数据块中。

         19 | (575,1)   |      16 | f     | f    | 01 00 00 00 01 00 00 00  20 | (8866,1)  |      16 | f     | f    | 01 00 00 00 03 00 00 00

继续查看575号索引页的内容。这个页是第一层（不是最后一层），分支节点

第一条表示与当前页右边的相邻页，data是它的最小值。第二条表示当前页左边的相邻页，data为空。

postgres=# SELECT * FROM bt_page_items('idx_tab1', 575);   itemoffset |   ctid   | itemlen | nulls | vars |          data
------------+----------+---------+-------+------+-------------------------  1 | (8712,1) |      16 | f     | f    | 01 00 00 00 03 00 00 00  2 | (572,1)  |       8 | f     | f    |   3 | (573,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  4 | (574,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  5 | (576,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  6 | (577,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  7 | (578,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  8 | (579,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  9 | (580,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  10 | (581,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  11 | (582,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  12 | (583,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  13 | (584,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  14 | (585,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  15 | (586,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  16 | (587,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  17 | (588,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  18 | (589,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  19 | (590,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  20 | (591,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  21 | (592,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  22 | (593,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  23 | (594,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  24 | (595,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  25 | (596,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  26 | (597,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  27 | (598,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  28 | (599,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  29 | (600,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  30 | (601,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  31 | (602,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  32 | (603,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  33 | (604,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  34 | (605,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  35 | (606,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  36 | (607,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  37 | (608,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  38 | (609,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  39 | (610,1)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  40 | (5488,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  41 | (8961,1) |      16 | f     | f    | 01 00 00 00 03 00 00 00  42 | (8960,1) |      16 | f     | f    | 01 00 00 00 03 00 00 00
。。。。。。。。。。。。。。

通过这两行，找到了c1=1.c2=2应该在5488号索引页中。

         40 | (5488,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00  41 | (8961,1) |      16 | f     | f    | 01 00 00 00 03 00 00 00

继续搜索索引也，第二层（最后一层），叶子节点

postgres=# SELECT * FROM bt_page_items('idx_tab1', 5488);   itemoffset |    ctid     | itemlen | nulls | vars |          data
------------+-------------+---------+-------+------+-------------------------  1 | (16215,25)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  2 | (5398,127)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  3 | (5398,137)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  4 | (5398,156)  |      16 | f     | f    | 01 00 00 00 01 00 00 00  5 | (5398,172)  |      16 | f     | f    | 01 00 00 00 01 00 00 00
.....  130 | (5405,10)   |      16 | f     | f    | 01 00 00 00 01 00 00 00  131 | (5405,15)   |      16 | f     | f    | 01 00 00 00 01 00 00 00  132 | (5405,17)   |      16 | f     | f    | 01 00 00 00 01 00 00 00  133 | (5405,35)   |      16 | f     | f    | 01 00 00 00 01 00 00 00  134 | (5405,59)   |      16 | f     | f    | 01 00 00 00 01 00 00 00  135 | (10810,151) |      16 | f     | f    | 01 00 00 00 02 00 00 00  136 | (16216,41)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  137 | (16216,40)  |      16 | f     | f    | 01 00 00 00 03 00 00 00  138 | (16216,39)  |      16 | f     | f    | 01 00 00 00 03 00 00 00
...

找到记录

HEAP PAGE

        135 | (10810,151) |      16 | f     | f    | 01 00 00 00 02 00 00 00

因为是叶子节点，所以ctid表示的是HEAP的偏移值，直接在HEAP PAGE中查看

postgres=# select * from tab1 where ctid='(10810,151)';  id | c1 | c2
----+----+----  1 |  1 |  2
(1 row)

在了解了多列索引的内部结构后，可以来看一下几种查询场景的优化

例子 - 范围+等值查询

驱动列使用范围条件，第二列使用等值条件

虽然走了索引，但是扫描了第一列的所有索引页。

性能不佳

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1 between 1 and 9 and c2=2;  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------  Index Scan using idx_tab1 on public.tab1  (cost=0.43..60757.38 rows=1 width=12) (actual time=0.027..106.362 rows=1 loops=1)  Output: id, c1, c2  Index Cond: ((tab1.c1 >= 1) AND (tab1.c1 <= 9) AND (tab1.c2 = 2))  Buffers: shared hit=8321  Planning time: 0.099 ms  Execution time: 106.422 ms
(6 rows)

优化

新建复合索引，将等值列放在前面

postgres=# create index idx_tab1_2 on tab1 using btree (c2,c1);
CREATE INDEX

等值条件直接被过滤，只需要扫描一条索引ITEM

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1 between 1 and 9 and c2=2;  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------  Index Scan using idx_tab1_2 on public.tab1  (cost=0.43..2.35 rows=1 width=12) (actual time=0.017..0.018 rows=1 loops=1)  Output: id, c1, c2  Index Cond: ((tab1.c2 = 2) AND (tab1.c1 >= 1) AND (tab1.c1 <= 9))  Buffers: shared hit=4  Planning time: 0.095 ms  Execution time: 0.040 ms
(6 rows)

例子 - 多值+等值查询

PostgreSQL针对离散多值查询，有一定的优化，仅仅扫描了多个离散值的索引ITEM

drop index idx_tab1_2;  postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1 in (1,2,3,4,5,6,7,8,9) and c2=2;  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------  Index Scan using idx_tab1 on public.tab1  (cost=0.43..13.90 rows=1 width=12) (actual time=0.024..0.186 rows=1 loops=1)  Output: id, c1, c2  Index Cond: ((tab1.c1 = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[])) AND (tab1.c2 = 2))  Buffers: shared hit=21 read=7  Planning time: 0.114 ms  Execution time: 0.208 ms
(6 rows)

而如果将单值列放在前面，多值列放在后面，扫描的BLOCK会更少，但是会将离散过滤条件作为FILTER条件。

postgres=# create index idx_tab1_2 on tab1 using btree (c2,c1);
CREATE INDEX  postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tab1 where c1 in (1,2,3,4,5,6,7,8,9) and c2=2;  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------  Index Scan using idx_tab1_2 on public.tab1  (cost=0.43..2.35 rows=1 width=12) (actual time=0.027..0.027 rows=1 loops=1)  Output: id, c1, c2  Index Cond: (tab1.c2 = 2)  Filter: (tab1.c1 = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))  Buffers: shared hit=4  Planning time: 0.107 ms  Execution time: 0.047 ms
(7 rows)

因为c2=2是驱动列，使用第二个索引，可以直接命中到1条item，其他的不需要扫到，所以快了很多。

假设有两个索引存在，对于数据存在倾斜的情况，数据库会根据过滤性自动选择合适的索引。

小结

PostgreSQL目前还不支持非连续性的索引扫描，所以当驱动列（第一列）使用了范围扫描后，即使复合索引有第二列，并且第二列是个等值查询，那么也要扫描第一列范围覆盖的所有索引。

这样就出现了索引页扫描的IO放大（因为可能扫了一些实际条件不符的INDEX PAGE）。

多列复合索引的创建建议：

1、离散查询条件（例如等值）的列放在最前面，如果一个复合查询中有多个等值查询的列，尽量将选择性好（count(distinct) 值多的）的放在前面。

2、离散查询条件（例如多值）的列放在后面，如果一个复合查询中有多个多值查询的列，尽量将选择性好（count(distinct) 值多的）的放在前面。

3、连续查询条件（例如范围查询）的列放在最后面，如果一个复合查询中有多个多值查询的列，尽量将输入范围条件返回结果集少的列放前面，提高筛选效率（同时也减少索引扫描的范围）。

4、如果返回的结果集非常大（或者说条件命中率很高），并且属于流式返回（或需要高效率优先返回前面几条记录），同时有排序输出的需求。建议按排序键建立索引。

参考

《PostgreSQL bitmapAnd, bitmapOr, bitmap index scan, bitmap heap scan》

《深入浅出PostgreSQL B-Tree索引结构》

https://www.postgresql.org/docs/devel/static/pageinspect.html