PostgreSQL Fuzzy (LIKE) Queries

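Overview:
Fuzzy (LIKE) queries in a database fall into three basic types: trailing wildcard (abc%), leading wildcard (%abc), and leading and trailing wildcards (%abc%). Most databases can use an index for the trailing-wildcard case but handle the other two poorly. PostgreSQL's support is much stronger and offers a different optimization for each case, roughly:
Trailing wildcard (abc%): optimize with a B-tree index
Leading wildcard (%abc): optimize with a B-tree expression index on reverse()
Leading and trailing wildcards (%abc%): usually optimized with the pg_trgm extension and a GIN index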

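Trailing and leading wildcards:
Trailing wildcard (abc%):
1. A B-tree index is enough. Note, however, that with the type's default operator class the index only serves queries that use collation "C", for example: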

bill=# create table t1(id int,info text);
CREATE TABLE
bill=# insert into t1 select generate_series(1,100000),hashtext(random()::text);
INSERT 0 100000
bill=# create index idx_t1 on t1 using btree(info collate "C");
CREATE INDEX
bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where info like 'abc%' collate "C";
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1 on public.t1  (cost=0.42..3.04 rows=10 width=14) (actual time=0.005..0.005 rows=0 loops=1)
   Output: id, info
   Index Cond: ((t1.info >= 'abc'::text) AND (t1.info < 'abd'::text))
   Filter: (t1.info ~~ 'abc%'::text COLLATE "C")
   Buffers: shared hit=3
 Planning Time: 0.414 ms
 Execution Time: 0.024 ms
(7 rows)
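The Index Cond above shows how a B-tree serves a LIKE prefix: 'abc%' becomes the half-open range info >= 'abc' AND info < 'abd', and the LIKE itself stays as a Filter. The following is a minimal Python sketch of that rewrite, assuming plain code-point order as a stand-in for collation "C" (UTF-8 byte order and code-point order agree). The helpers prefix_upper_bound and prefix_range_scan are hypothetical, and PostgreSQL's own bound computation handles more cases than this.

```python
import bisect
from typing import List, Optional

MAX_CODE_POINT = 0x10FFFF

def prefix_upper_bound(prefix: str) -> Optional[str]:
    """Return a string that sorts after every string starting with `prefix`
    (code-point order), or None if no such bound exists."""
    chars = list(prefix)
    while chars:
        last = ord(chars[-1])
        if last < MAX_CODE_POINT:
            chars[-1] = chr(last + 1)  # 'abc' -> 'abd'
            return "".join(chars)
        chars.pop()  # U+10FFFF cannot be incremented: carry into the previous char
    return None  # e.g. empty prefix: no upper bound, scan to the end

def prefix_range_scan(sorted_values: List[str], prefix: str) -> List[str]:
    """Emulate an index range scan for `col LIKE 'prefix%'` on a sorted list."""
    lo = bisect.bisect_left(sorted_values, prefix)
    upper = prefix_upper_bound(prefix)
    hi = len(sorted_values) if upper is None else bisect.bisect_left(sorted_values, upper)
    return sorted_values[lo:hi]

values = sorted(["ab", "abc", "abcd", "abcz", "abd", "xyz"])
print(prefix_upper_bound("abc"))         # abd, as in the Index Cond above
print(prefix_range_scan(values, "abc"))  # ['abc', 'abcd', 'abcz']
```

The range returns exactly the prefix matches only because the list is in plain character order; that is the property the collation "C" index provides.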

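2. When the collation is not "C", build the B-tree index with the type's pattern operator class (text_pattern_ops for text) so that it can serve LIKE queries. The name idx_t1 is reused, so drop the index from step 1 first: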

bill=# create index idx_t1 on t1 using btree(info text_pattern_ops);
CREATE INDEX
bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where info like 'abc%';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1 on public.t1  (cost=0.42..3.04 rows=10 width=14) (actual time=0.027..0.027 rows=0 loops=1)
   Output: id, info
   Index Cond: ((t1.info ~>=~ 'abc'::text) AND (t1.info ~<~ 'abd'::text))
   Filter: (t1.info ~~ 'abc%'::text)
   Buffers: shared read=3
 Planning Time: 0.335 ms
 Execution Time: 0.054 ms
(7 rows)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where info like 'abc%' collate "C";
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1 on public.t1  (cost=0.42..3.04 rows=10 width=14) (actual time=0.027..0.027 rows=0 loops=1)
   Output: id, info
   Index Cond: ((t1.info ~>=~ 'abc'::text) AND (t1.info ~<~ 'abd'::text))
   Filter: (t1.info ~~ 'abc%'::text COLLATE "C")
   Buffers: shared hit=3
 Planning Time: 0.133 ms
 Execution Time: 0.056 ms
(7 rows)
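Why the default operator class is not used for LIKE under a non-"C" collation: a range condition such as info >= 'abc' AND info < 'abd' only returns the right rows if the index sorts strings character by character, which is what text_pattern_ops guarantees with its ~>=~ and ~<~ operators. The Python sketch below contrasts code-point order with a hypothetical order that ignores hyphens, a crude stand-in for a linguistic collation that ignores punctuation at the first comparison level (real glibc or ICU collations are far more complex). The helper names are illustrative only.

```python
import bisect
from typing import Callable, List

def range_scan(values: List[str], lo: str, hi: str,
               key: Callable[[str], str]) -> List[str]:
    """Emulate an index ordered by `key`: rows with key(lo) <= key(v) < key(hi)."""
    ordered = sorted(values, key=key)
    keys = [key(v) for v in ordered]
    start = bisect.bisect_left(keys, key(lo))
    end = bisect.bisect_left(keys, key(hi))
    return ordered[start:end]

def code_point_order(s: str) -> str:
    # Collation "C" / text_pattern_ops: compare character by character.
    return s

def ignore_hyphens(s: str) -> str:
    # Hypothetical stand-in for a linguistic collation that ignores
    # punctuation at the first comparison level.
    return s.replace("-", "")

values = ["a-z", "ab", "a-b", "b"]
# LIKE 'a-%' as the range ['a-', 'a.'), because '.' is the character after '-'.
print(range_scan(values, "a-", "a.", code_point_order))  # ['a-b', 'a-z']
print(range_scan(values, "a-", "a.", ignore_hyphens))    # []: both matches missed
```

Under the hyphen-ignoring order the same range condition silently misses every match, which is why PostgreSQL only derives index conditions from LIKE when the index order is plain character order.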

The same index also serves anchored regular expressions such as ~ '^abc':

bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where info ~ '^abc';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1 on public.t1  (cost=0.42..3.04 rows=10 width=14) (actual time=0.009..0.009 rows=0 loops=1)
   Output: id, info
   Index Cond: ((t1.info ~>=~ 'abc'::text) AND (t1.info ~<~ 'abd'::text))
   Filter: (t1.info ~ '^abc'::text)
   Buffers: shared hit=3
 Planning Time: 0.210 ms
 Execution Time: 0.026 ms
(7 rows)
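The plan shows the anchored regex '^abc' turned into the same range as LIKE 'abc%': the planner extracts the literal prefix that every match must start with. Below is a simplified, conservative Python sketch of that idea, not PostgreSQL's implementation; anchored_literal_prefix is a hypothetical helper that returns None (no index range) for unanchored patterns or patterns containing alternation, and stops at the first metacharacter.

```python
from typing import List, Optional

METACHARS = set(".[](){}*+?|\\^$")
QUANTIFIERS = set("*+?{")

def anchored_literal_prefix(pattern: str) -> Optional[str]:
    """Literal prefix that every match of `pattern` must start with, or None."""
    if not pattern.startswith("^") or "|" in pattern:
        return None  # unanchored, or an alternative branch could match anywhere
    prefix: List[str] = []
    i = 1
    while i < len(pattern) and pattern[i] not in METACHARS:
        prefix.append(pattern[i])
        i += 1
    # A quantifier applies to the preceding character (it may be missing or
    # repeated), so this sketch conservatively leaves that character out.
    if prefix and i < len(pattern) and pattern[i] in QUANTIFIERS:
        prefix.pop()
    return "".join(prefix) or None

for p in ["^abc", "^abc*", "^ab.c", "abc", "^abc|xyz"]:
    print(p, anchored_literal_prefix(p))
```

A shorter prefix is always safe (it yields a wider range that the Filter then rechecks), which is why the sketch gives up characters rather than guess.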

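Leading wildcard (%abc):
An expression index on reverse() supports leading-wildcard queries: info LIKE '%abc' is rewritten as reverse(info) LIKE 'cba%'. The search string must be reversed as well, so the example queries below, reverse(info) like '117%', match values ending in '711'.
1. As with trailing wildcards, the type's default operator class only serves queries that use collation "C" (drop the existing idx_t1 before reusing the name), for example: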

bill=# create index idx_t1 on t1 using btree(reverse(info) collate "C");
CREATE INDEX
bill=# select * from t1 limit 1;
 id |    info
----+-------------
  1 | -2100249117
(1 row)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where reverse(info) like '117%' collate "C";
                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.t1  (cost=8.14..378.67 rows=500 width=14) (actual time=0.142..0.287 rows=101 loops=1)
   Output: id, info
   Filter: (reverse(t1.info) ~~ '117%'::text COLLATE "C")
   Heap Blocks: exact=94
   Buffers: shared hit=94 read=3
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..8.02 rows=500 width=0) (actual time=0.118..0.118 rows=101 loops=1)
         Index Cond: ((reverse(t1.info) >= '117'::text) AND (reverse(t1.info) < '118'::text))
         Buffers: shared read=3
 Planning Time: 0.117 ms
 Execution Time: 0.324 ms
(10 rows)
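The rule behind the reverse() index is that s ends with "abc" exactly when reverse(s) starts with "cba". A small Python sketch with a hypothetical helper, suffix_to_reversed_prefix, that turns a plain suffix pattern into the prefix pattern to match against reverse(info); it assumes the suffix contains no further wildcards or escape characters. Python slicing reverses by code point, as PostgreSQL's reverse() reverses by character.

```python
def suffix_to_reversed_prefix(pattern: str) -> str:
    """'%abc' -> 'cba%': the pattern to match against reverse(column)."""
    suffix = pattern[1:]
    if not pattern.startswith("%") or any(c in "%_\\" for c in suffix):
        raise ValueError("expected a plain suffix pattern such as '%abc'")
    return suffix[::-1] + "%"

# Check the equivalence on a few values shaped like t1.info.
values = ["-2100249117", "1234711", "987117", "42"]
reversed_prefix = suffix_to_reversed_prefix("%117")  # '711%'
by_suffix = [v for v in values if v.endswith("117")]
by_reverse = [v for v in values if v[::-1].startswith(reversed_prefix.rstrip("%"))]
print(reversed_prefix, by_suffix == by_reverse)  # 711% True
```

So to find values such as '-2100249117' that end in '117', the query would be reverse(info) like '711%'.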

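2. When the collation is not "C", build the reverse() index with the pattern operator class instead (again dropping the existing idx_t1 first). It then also serves anchored regular expressions: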

bill=# create index idx_t1 on t1 using btree(reverse(info) text_pattern_ops);
CREATE INDEX
bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where reverse(info) like '117%';
                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.t1  (cost=8.14..378.67 rows=500 width=14) (actual time=0.084..0.212 rows=101 loops=1)
   Output: id, info
   Filter: (reverse(t1.info) ~~ '117%'::text)
   Heap Blocks: exact=94
   Buffers: shared hit=94 read=3
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..8.02 rows=500 width=0) (actual time=0.062..0.063 rows=101 loops=1)
         Index Cond: ((reverse(t1.info) ~>=~ '117'::text) AND (reverse(t1.info) ~<~ '118'::text))
         Buffers: shared read=3
 Planning Time: 0.096 ms
 Execution Time: 0.242 ms
(10 rows)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where reverse(info) ~ '^117';
                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.t1  (cost=8.14..378.67 rows=500 width=14) (actual time=0.060..0.283 rows=101 loops=1)
   Output: id, info
   Filter: (reverse(t1.info) ~ '^117'::text)
   Heap Blocks: exact=94
   Buffers: shared hit=97
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..8.02 rows=500 width=0) (actual time=0.037..0.037 rows=101 loops=1)
         Index Cond: ((reverse(t1.info) ~>=~ '117'::text) AND (reverse(t1.info) ~<~ '118'::text))
         Buffers: shared hit=3
 Planning Time: 0.129 ms
 Execution Time: 0.320 ms
(10 rows)

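Trailing and leading wildcards with one index:
A pg_trgm index supports both trailing-wildcard and leading-wildcard queries with a single index (install the extension first with create extension pg_trgm;).
To support fuzzy queries on Chinese text, the database's lc_ctype must not be "C", and the index and the query condition must use the same collation, otherwise the index cannot be used. Here the bill database has Collate C but Ctype zh_CN.UTF-8: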

bill=# \l+ bill
                                               List of databases
 Name |  Owner   | Encoding | Collate |    Ctype    | Access privileges | Size  | Tablespace | Description
------+----------+----------+---------+-------------+-------------------+-------+------------+-------------
 bill | postgres | UTF8     | C       | zh_CN.UTF-8 |                   | 95 MB | pg_default |
(1 row)
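Why lc_ctype matters: pg_trgm lower-cases the text, splits it into words of alphanumeric characters (as classified by the database's lc_ctype), pads each word with two spaces in front and one behind, and indexes every run of three characters. Under lc_ctype "C", Chinese characters are not classified as alphanumeric, so they yield no trigrams. The Python sketch below only approximates this behavior: trigrams is a hypothetical helper, Python's Unicode-aware str.isalnum() stands in for a non-"C" lc_ctype, and the c_ctype flag mimics "C" by accepting only ASCII letters and digits.

```python
from typing import List, Set

def trigrams(text: str, c_ctype: bool = False) -> Set[str]:
    """Rough approximation of pg_trgm's trigram extraction, for illustration."""
    def is_word_char(ch: str) -> bool:
        # Under lc_ctype "C" only ASCII letters and digits count as word characters.
        return (ch.isascii() and ch.isalnum()) if c_ctype else ch.isalnum()

    words: List[str] = []
    current: List[str] = []
    for ch in text.lower():
        if is_word_char(ch):
            current.append(ch)
        elif current:  # any other character ends the current word
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))

    result: Set[str] = set()
    for word in words:
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result

print(sorted(trigrams("cat")))               # ['  c', ' ca', 'at ', 'cat']
print(len(trigrams("中国人")))                # 4
print(len(trigrams("中国人", c_ctype=True)))  # 0: nothing to index
```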

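Set up the test environment (gen_hanzi(n) returns n random characters from chr(19968) to chr(40869), i.e. the CJK Unified Ideographs range U+4E00 to U+9FA5):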

bill=# create table test001(c1 text);
CREATE TABLE
bill=# create or replace function gen_hanzi(int) returns text as $$
bill$# declare
bill$#   res text;
bill$# begin
bill$#   if $1 >=1 then
bill$#     select string_agg(chr(19968+(random()*20901)::int), '') into res from generate_series(1,$1);
bill$#     return res;
bill$#   end if;
bill$#   return null;
bill$# end;
bill$# $$ language plpgsql strict;
CREATE FUNCTION
bill=# insert into test001 select gen_hanzi(20) from generate_series(1,100000);
INSERT 0 100000
bill=# create index idx_test001_1 on test001 using gin (c1 gin_trgm_ops);
CREATE INDEX
bill=# select * from test001 limit 5;
                    c1
------------------------------------------
 頺縈鈍鮆嘻頔傉澩裂驁沧蜏鉩靣鞪僗怅鏒翌豇
 傸鵾鰔骝髑熯偣孊弰瞅油禩轷贶墯佁貫阖翜蹽
 戦綷筿降峝蹮險田螻攠當剞嘌蛓獿鮯諨嬰虡纙
 錮虒弧蜖比蒞窐砫宥軡泜箏矘犍秥冴僕礫四逇
 跐藉圹縲稡霃黇兯蒟穭曣轋虫徼嗣嶹矧鳨冥肺
(5 rows)
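If the same kind of test data is needed outside the database (for example to load it with COPY), a Python analogue of gen_hanzi() might look like the sketch below. It is an illustration, not part of the original setup. It draws a uniform integer offset instead of rounding random()*20901, so the two end points of the range are not drawn half as often as the others, which is what the ::int cast (round to nearest) does in the SQL version.

```python
import random
from typing import Optional

FIRST_CJK = 0x4E00  # 19968, the base used by gen_hanzi()
SPAN = 20901        # so the last code point is 0x4E00 + 20901 == 0x9FA5

def gen_hanzi(n: int, rng: random.Random) -> Optional[str]:
    """Return n random characters from U+4E00..U+9FA5, or None when n < 1."""
    if n < 1:
        return None
    return "".join(chr(FIRST_CJK + rng.randint(0, SPAN)) for _ in range(n))

rng = random.Random(2024)  # seeded so the data set is reproducible
rows = [gen_hanzi(20, rng) for _ in range(5)]
print("\n".join(rows))
```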

Fuzzy queries:

bill=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 like '你%';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test001  (cost=10.38..475.76 rows=500 width=32) (actual time=0.041..0.052 rows=5 loops=1)
   Output: c1
   Recheck Cond: (test001.c1 ~~ '你%'::text)
   Heap Blocks: exact=5
   Buffers: shared hit=9
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..10.25 rows=500 width=0) (actual time=0.031..0.032 rows=5 loops=1)
         Index Cond: (test001.c1 ~~ '你%'::text)
         Buffers: shared hit=4
 Planning Time: 0.098 ms
 Execution Time: 0.076 ms
(10 rows)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 like '%中国';
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test001  (cost=6.58..19.42 rows=10 width=61) (actual time=0.023..0.023 rows=0 loops=1)
   Output: c1
   Recheck Cond: (test001.c1 ~~ '%中国'::text)
   Rows Removed by Index Recheck: 1
   Heap Blocks: exact=1
   Buffers: shared hit=5
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..6.58 rows=10 width=0) (actual time=0.015..0.015 rows=1 loops=1)
         Index Cond: (test001.c1 ~~ '%中国'::text)
         Buffers: shared hit=4
 Planning Time: 0.248 ms
 Execution Time: 0.045 ms
(11 rows)

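Leading and trailing wildcards (%abc%):
The pg_trgm extension also supports queries with wildcards on both sides, using the same GIN index.
Again, the database's lc_ctype must not be "C", and the index and the query condition must use the same collation for the index to be used.
Search strings should have at least 3 characters; with fewer, the index does not help.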

bill=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 like '%中国%';
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on public.test001  (cost=0.00..2387.00 rows=10 width=61) (actual time=21.432..21.432 rows=0 loops=1)
   Output: c1
   Filter: (test001.c1 ~~ '%中国%'::text)
   Rows Removed by Filter: 100000
   Buffers: shared hit=1137
 Planning Time: 0.097 ms
 Execution Time: 21.451 ms
(7 rows)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 like '%我是程序员%';
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test001  (cost=16.98..29.82 rows=10 width=61) (actual time=0.041..0.042 rows=0 loops=1)
   Output: c1
   Recheck Cond: (test001.c1 ~~ '%我是程序员%'::text)
   Buffers: shared hit=10
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..16.98 rows=10 width=0) (actual time=0.039..0.039 rows=0 loops=1)
         Index Cond: (test001.c1 ~~ '%我是程序员%'::text)
         Buffers: shared hit=10
 Planning Time: 0.112 ms
 Execution Time: 0.066 ms
(9 rows)

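With only two characters ('%中国%'), the index cannot narrow the search, so the planner falls back to a sequential scan and the query takes about 21 ms instead of about 0.07 ms.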
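The slowdown comes from how pg_trgm extracts trigrams from a LIKE pattern: each literal word is padded with spaces as usual, except on a side that touches a % or _ wildcard, and a fragment shorter than three characters after padding contributes nothing. '你%' still yields one trigram thanks to the leading padding, while '%中国%' yields none. A simplified Python sketch (like_pattern_trigrams is a hypothetical helper; it ignores escape characters and uses Python's Unicode-aware str.isalnum() in place of a non-"C" lc_ctype):

```python
from typing import Set

WILDCARDS = "%_"

def like_pattern_trigrams(pattern: str) -> Set[str]:
    """Approximate the trigrams pg_trgm can use to search for a LIKE pattern."""
    text = pattern.lower()
    result: Set[str] = set()
    i, n = 0, len(text)
    while i < n:
        if not text[i].isalnum():  # wildcard or separator
            i += 1
            continue
        start = i
        while i < n and text[i].isalnum():
            i += 1
        word = text[start:i]
        # Pad like a word boundary, except on a side that touches a wildcard.
        left = "" if start > 0 and text[start - 1] in WILDCARDS else "  "
        right = "" if i < n and text[i] in WILDCARDS else " "
        padded = left + word + right
        for j in range(len(padded) - 2):  # fewer than 3 chars: no trigram
            result.add(padded[j:j + 3])
    return result

for p in ["你%", "%中国", "%中国%", "%我是程序员%"]:
    print(p, sorted(like_pattern_trigrams(p)))
```

A pattern with wildcards on both sides needs at least three literal characters to produce any trigram, which matches the advice above.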

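Summary:

For trailing-wildcard (abc%) queries only, use a B-tree index with collate "C"; when the collation is not "C", build the B-tree index with the type's pattern operator class (e.g. text_pattern_ops).
For leading-wildcard (%abc) queries only, use a B-tree index on the reverse() expression with collate "C"; when the collation is not "C", build it with the type's pattern operator class (e.g. text_pattern_ops).
For queries with wildcards on both sides (%abc%), especially on Chinese text, use a database whose lc_ctype is not "C" together with a GIN index from the pg_trgm extension.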
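The summary can be condensed into a small decision function. This is a hypothetical helper for illustration (col stands for the searched column); it only checks whether the pattern starts and/or ends with %, rejects other wildcards, and returns the index definitions used in the examples above.

```python
def recommend_index(pattern: str, c_collation: bool) -> str:
    """Index suggested by the summary for a LIKE pattern on a column `col`."""
    core = pattern.strip("%")
    if "%" in core or "_" in core:
        raise ValueError("this sketch only handles '%' at the start and/or end")
    leading, trailing = pattern.startswith("%"), pattern.endswith("%")
    opclass = 'collate "C"' if c_collation else "text_pattern_ops"
    if leading and trailing:
        return "gin (col gin_trgm_ops)"           # requires the pg_trgm extension
    if leading:
        return f"btree (reverse(col) {opclass})"  # query with reverse(col) LIKE 'cba%'
    if trailing:
        return f"btree (col {opclass})"
    return "btree (col)"                           # no wildcard: plain equality

print(recommend_index("abc%", c_collation=True))    # btree (col collate "C")
print(recommend_index("%abc", c_collation=False))   # btree (reverse(col) text_pattern_ops)
print(recommend_index("%abc%", c_collation=False))  # gin (col gin_trgm_ops)
```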
