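In this article we take a closer look at the GiST index. GiST stands for Generalized Search Tree. Underneath it is a balanced tree, but it is really an index template: a framework on which users can implement their own index types. A B-tree index can be built on any data type that has a sortable order, but it only supports the comparison operators <, <=, =, >= and >. A GiST index, by contrast, can support more complex operators such as @> and &&.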

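1. Storage structure of the GiST index

An earlier article in this series, 《Postgresql杂谈 04—Postgresql中的四种常规索引》, briefly introduced how to use a GiST index and created one covering two columns: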

stock_analysis_data=# create index mygistinx on test using gist(fund_code,record_time);
CREATE INDEX
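
One caveat about the statement above: core PostgreSQL ships no GiST operator class for character varying or timestamp, so the index was presumably created on a server where the btree_gist extension was already installed. The original session does not show that step; the following is a minimal sketch of it, assuming the contrib extension is available:

```sql
-- Assumption: btree_gist (a contrib module) is available on the server.
-- It supplies GiST operator classes for scalar types such as varchar
-- and timestamp, which core PostgreSQL does not provide.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Same index definition as in the session above.
CREATE INDEX mygistinx ON test USING gist (fund_code, record_time);
```

Without the extension, the CREATE INDEX statement fails because the column types have no default operator class for the gist access method.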

The structure of the test table is as follows:

stock_analysis_data=# \d+ test
                                                Table "public.test"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description
-------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 fund_code   | character varying(256)      |           |          |         | extended |              |
 fund_name   | character varying(256)      |           |          |         | extended |              |
 record_time | timestamp without time zone |           |          |         | plain    |              |
Indexes:
    "myspgistinx" spgist (fund_code)

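Once the GiST index has been created, the overall index structure looks like this:

From the figure we can see:

(1) Only the leaf nodes store the TIDs (ctid) of the table rows. In the figure, the pairs in parentheses on the right are these TIDs: the number before the comma is the page (block) number on disk, and the number after it is the item's position within that page.

(2) The upper-level nodes hold pointers to the leaf nodes and define the range of the data stored on each leaf page (the red text in the figure), but they contain no pointers to the table rows themselves.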

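(3) Compared with a B-tree index on the same data, a GiST index typically takes more space. This is not a matter of where row pointers are stored, since PostgreSQL's B-tree also keeps heap TIDs only in its leaf pages. More plausible causes are that GiST keys describe ranges or bounding regions rather than plain values, and that GiST pages tend to be packed less tightly.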
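
The size difference is easy to measure rather than assume. The sketch below builds a B-tree and a GiST index on the same inet column and reads their on-disk sizes from the catalog. The table and index names (size_demo and so on) are hypothetical and used only for illustration; the actual numbers depend on the data and the PostgreSQL version.

```sql
-- Hypothetical demo table; every name below is illustrative only.
CREATE TABLE size_demo (ip inet);

-- 200,000 distinct addresses in 10.0.0.0/8.
INSERT INTO size_demo
SELECT format('10.%s.%s.%s', (i / 65536) % 256, (i / 256) % 256, i % 256)::inet
FROM generate_series(0, 199999) AS g(i);

-- One index of each kind on the same column.
CREATE INDEX size_demo_btree ON size_demo USING btree (ip);
CREATE INDEX size_demo_gist  ON size_demo USING gist (ip inet_ops);

-- Compare the on-disk size of the two indexes.
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('size_demo_btree', 'size_demo_gist');
```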

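2. Operator classes in PostgreSQL that support GiST indexes

PostgreSQL already implements GiST operator classes for a number of built-in types, so a GiST index can be used on those types directly. The built-in types and their indexable operators are: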

Data type    Indexable operators
box          && &> &< &<| >> << <<| <@ @> @ |&> |>> ~ ~=
circle       && &> &< &<| >> << <<| <@ @> @ |&> |>> ~ ~=
inet, cidr   && >> >>= > >= <> << <<= < <= =
point        >> >^ << <@ <^ ~=
polygon      && &> &< &<| >> << <<| <@ @> @ |&> |>> ~ ~=
range        && &> &< >> << <@ -|- = @>
tsquery      <@ @>
tsvector     @@
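
The list above varies between PostgreSQL versions and grows when extensions are installed, so it can be useful to read it from the system catalogs of the running server. The following query is a sketch based on the standard catalogs pg_opclass, pg_am and pg_amop; the output is not shown because it depends on the installation.

```sql
-- List every GiST operator class, the type it indexes,
-- and the operators that can use the index.
SELECT opc.opcname                 AS opclass_name,
       opc.opcintype::regtype      AS indexed_type,
       amop.amopopr::regoperator   AS indexable_operator
FROM pg_opclass opc
JOIN pg_am   am   ON am.oid = opc.opcmethod
JOIN pg_amop amop ON amop.amopfamily = opc.opcfamily
WHERE am.amname = 'gist'
ORDER BY opc.opcname, amop.amopstrategy;
```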

The examples below show how GiST indexes are used in practice.

Creating a GiST index on the point type:

First, create a test table:

stock_analysis_data=# create table pts(id int ,p point);
CREATE TABLE
stock_analysis_data=# \d+ pts;
                                    Table "public.pts"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              |
 p      | point   |           |          |         | plain   |              |

Insert the test data into the table:

stock_analysis_data=# insert into pts select t.d,point(ceil(random()*1000),ceil(random()*1000)) from generate_series(1,1000000) as t(d);
INSERT 0 1000000

Run the query without any index. It finds all points that lie inside the circle with center (100,100) and radius 100:

stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing)  select * from pts where circle '((100,100) 100)'  @> p;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12678.33 rows=1000 width=20) (actual time=0.323..175.264 rows=31426 loops=1)
   Output: id, p
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=6370
   ->  Parallel Seq Scan on public.pts  (cost=0.00..11578.33 rows=417 width=20) (actual time=0.030..63.779 rows=10475 loops=3)
         Output: id, p
         Filter: ('<(100,100),100>'::circle @> pts.p)
         Rows Removed by Filter: 322858
         Buffers: shared hit=6370
         Worker 0: actual time=0.041..33.690 rows=8188 loops=1
           Buffers: shared hit=1675
         Worker 1: actual time=0.032..52.241 rows=8901 loops=1
           Buffers: shared hit=1791
 Planning Time: 0.060 ms
 Execution Time: 190.921 ms
(16 rows)

Without an index, the query takes about 190 ms in total. Next, create the GiST index:

stock_analysis_data=# create index on pts using gist(p);
CREATE INDEX

Run the same query again, this time using the GiST index:

stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing)  select * from pts where circle '((100,100) 100)'  @> p;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.pts  (cost=44.03..2705.93 rows=1000 width=20) (actual time=9.080..36.787 rows=31426 loops=1)
   Output: id, p
   Recheck Cond: ('<(100,100),100>'::circle @> pts.p)
   Heap Blocks: exact=6331
   Buffers: shared hit=6689
   ->  Bitmap Index Scan on pts_p_idx  (cost=0.00..43.78 rows=1000 width=0) (actual time=8.222..8.223 rows=31426 loops=1)
         Index Cond: ('<(100,100),100>'::circle @> pts.p)
         Buffers: shared hit=358
 Planning Time: 0.620 ms
 Execution Time: 52.507 ms
(10 rows)

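As the EXPLAIN output shows, with the index in place the execution time drops from about 191 ms to about 53 ms, roughly a 3.6x speedup.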
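
The same GiST index serves the other point operators listed in the table earlier, not only containment in a circle. As a sketch, the query below looks for points inside an axis-aligned box using <@. The box coordinates are arbitrary values chosen for illustration, and the plan and timings will differ from system to system, so no output is shown.

```sql
-- Points inside the square with corners (0,0) and (100,100).
-- point <@ box belongs to the GiST operator class for point,
-- so the planner can consider the index created above (pts_p_idx).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM pts
WHERE p <@ box '((0,0),(100,100))';
```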

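A GiST index on the inet type

Here is a second example, this time querying through a GiST index on an inet column. First create a table named vector that contains a column of type inet: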

stock_analysis_data=# create table vector(id int,ip inet);
CREATE TABLE
stock_analysis_data=# \d+ vector;
                                  Table "public.vector"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              |
 ip     | inet    |           |          |         | main    |              |

Insert the test data:

stock_analysis_data=# insert into vector select t.d,inet(ceil(random()*255)||'.'||ceil(random()*255)||'.'||ceil(random()*255)||'.'||ceil(random()*255)) from generate_series(1,1000000) as t(d);
INSERT 0 1000000

Before creating any index, first query for all rows whose IP address equals 77.80.250.123:

stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing) select * from vector where ip = '77.80.250.123'::inet;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..11614.43 rows=1 width=11) (actual time=0.222..108.731 rows=1 loops=1)
   Output: id, ip
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=5406
   ->  Parallel Seq Scan on public.vector  (cost=0.00..10614.33 rows=1 width=11) (actual time=50.047..85.650 rows=0 loops=3)
         Output: id, ip
         Filter: (vector.ip = '77.80.250.123'::inet)
         Rows Removed by Filter: 333333
         Buffers: shared hit=5406
         Worker 0: actual time=81.686..81.686 rows=0 loops=1
           Buffers: shared hit=1633
         Worker 1: actual time=68.437..68.438 rows=0 loops=1
           Buffers: shared hit=1312
 Planning Time: 0.057 ms
 Execution Time: 108.759 ms
(16 rows)

Before creating the GiST index, first create a B-tree index to see how the query performs with it:

stock_analysis_data=# create index vector_btree_inx on vector using btree(ip);
CREATE INDEX

Running the query again, it now uses the B-tree index, and the execution time drops from about 109 ms to about 0.05 ms:

stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing) select * from vector where ip = '77.80.250.123'::inet;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using vector_btree_inx on public.vector  (cost=0.42..8.44 rows=1 width=11) (actual time=0.028..0.029 rows=1 loops=1)
   Output: id, ip
   Index Cond: (vector.ip = '77.80.250.123'::inet)
   Buffers: shared hit=1 read=3
 Planning Time: 0.180 ms
 Execution Time: 0.054 ms
(6 rows)

Next, create the GiST index and run the query again:

stock_analysis_data=# create index vector_gist_inx on vector using gist(ip inet_ops);
CREATE INDEX
stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing) select * from vector where ip = '77.80.250.123'::inet;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using vector_gist_inx on public.vector  (cost=0.29..8.30 rows=1 width=11) (actual time=0.033..0.034 rows=1 loops=1)
   Output: id, ip
   Index Cond: (vector.ip = '77.80.250.123'::inet)
   Buffers: shared hit=4
 Planning Time: 0.133 ms
 Execution Time: 0.056 ms

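As we can see, for this equality lookup the GiST index performs about the same as the B-tree index. Note, however, that the B-tree operator class does not include network containment operators such as << and >>, whereas the GiST operator class does.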
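
To see the GiST index used with one of those containment operators, a query along the following lines can be tried against the vector table. The subnet 77.80.0.0/16 is an arbitrary value chosen for illustration. The plan PostgreSQL picks depends on the version and on which indexes exist, so no output is shown.

```sql
-- All addresses strictly contained in the subnet 77.80.0.0/16.
-- << is part of the inet_ops GiST operator class, so vector_gist_inx
-- is a candidate index for this condition.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM vector
WHERE ip << inet '77.80.0.0/16';
```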

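3. Summary

From the material above we can draw the following conclusions:

(1) Compared with a B-tree index, a GiST index takes longer to build and occupies more space.

(2) For basic data types such as int and strings, both B-tree and GiST indexes can be used (GiST through the btree_gist extension), but GiST offers poor value for these common types and is not recommended for them.

(3) For network types such as inet, a GiST index is recommended, because it supports special operators such as << and &&.

(4) For spatial types such as point, a GiST index should be used.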
