Greenplum 类型一致性使用规范 - 索引条件、JOIN的类型一致性限制

背景

在查询时，有很多用户会犯浑，发现建立了索引，但是查询偏偏不走索引。

怎么不走索引啊？

这里做容易混淆的是类型的差异，例如字段类型为字符串，但是输入的是INT类型，这样就可能不走索引。（除非创建了自动的CAST，自动进行类型转换）

查询的输入类型与索引的类型一致是非常有必要的。

例子

1、建表（使用变长、定长字符串类型），写入测试数据

create table tbl3(id int, c1 text, c2 varchar(64), c3 char(64));  postgres=# insert into tbl3 select id, md5, md5, md5 from (select id, md5(random()::text) as md5 from generate_series(1,1000000) t(id)) t;
INSERT 0 1000000

2、创建索引

postgres=# create index idx_tbl3_1 on tbl3 (c1);
CREATE INDEX
postgres=# create index idx_tbl3_2 on tbl3 (c2);
CREATE INDEX
postgres=# create index idx_tbl3_3 on tbl3 (c3);
CREATE INDEX

3、结构如下

postgres=# \d+ tbl3  Table "public.tbl3"  Column |         Type          | Modifiers | Storage  | Description
--------+-----------------------+-----------+----------+-------------  id     | integer               |           | plain    |   c1     | text                  |           | extended |   c2     | character varying(64) |           | extended |   c3     | character(64)         |           | extended |
Indexes:  "idx_tbl3_1" btree (c1)  "idx_tbl3_2" btree (c2)  "idx_tbl3_3" btree (c3)
Has OIDs: no
Distributed by: (id)

4、走索引，强制转换为bpchar。

postgres=# explain analyze select * from tbl3 where c3='abc';  QUERY PLAN
------------------------------------------------------------------------------------  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..201.23 rows=1 width=135)  Rows out:  0 rows at destination with 3.081 ms to end, start offset by 0.294 ms.  ->  Index Scan using idx_tbl3_3 on tbl3  (cost=0.00..201.23 rows=1 width=135)  Index Cond: c3 = 'abc'::bpchar  Rows out:  0 rows (seg0) with 0.138 ms to end, start offset by 3.211 ms.  Slice statistics:  (slice0)    Executor memory: 147K bytes.  (slice1)    Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).  Statement statistics:  Memory used: 128000K bytes  Total runtime: 3.484 ms
(11 rows)

5、不走索引，因为输入的为text变长，与char定长不一致。

postgres=# explain analyze select * from tbl3 where c3='abc'::text;  QUERY PLAN
-------------------------------------------------------------------------------------  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..20032.60 rows=1001 width=135)  Rows out:  0 rows at destination with 305 ms to end, start offset by 0.335 ms.  ->  Seq Scan on tbl3  (cost=0.00..20032.60 rows=334 width=135)  Filter: c3::text = 'abc'::text  Rows out:  0 rows (seg0) with 302 ms to end, start offset by 4.023 ms.  Slice statistics:  (slice0)    Executor memory: 147K bytes.  (slice1)    Executor memory: 195K bytes avg x 3 workers, 195K bytes max (seg0).  Statement statistics:  Memory used: 128000K bytes  Total runtime: 305.985 ms
(11 rows)

6、走索引，输入的类型为变长类型，字段也是变长类型，与TEXT兼容。

postgres=# explain analyze select * from tbl3 where c2='abc'::text;  QUERY PLAN
------------------------------------------------------------------------------------  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..200.86 rows=1 width=135)  Rows out:  0 rows at destination with 2.744 ms to end, start offset by 0.275 ms.  ->  Index Scan using idx_tbl3_2 on tbl3  (cost=0.00..200.86 rows=1 width=135)  Index Cond: c2::text = 'abc'::text  Rows out:  0 rows (seg0) with 0.031 ms to end, start offset by 2.868 ms.  Slice statistics:  (slice0)    Executor memory: 147K bytes.  (slice1)    Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).  Statement statistics:  Memory used: 128000K bytes  Total runtime: 3.120 ms
(11 rows)

7、不走索引，输入的类型为定长类型，但是字段为变长类型。

postgres=# explain analyze select * from tbl3 where c2='abc'::bpchar;  QUERY PLAN
-----------------------------------------------------------------------------------  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..17531.66 rows=1 width=135)  Rows out:  0 rows at destination with 60 ms to end, start offset by 0.276 ms.  ->  Seq Scan on tbl3  (cost=0.00..17531.66 rows=1 width=135)  Filter: c2::bpchar = 'abc'::bpchar  Rows out:  0 rows (seg0) with 57 ms to end, start offset by 2.864 ms.  Slice statistics:  (slice0)    Executor memory: 147K bytes.  (slice1)    Executor memory: 131K bytes avg x 3 workers, 131K bytes max (seg0).  Statement statistics:  Memory used: 128000K bytes  Total runtime: 60.320 ms
(11 rows)

8、内表不走索引，因为JOIN字段类型不匹配。

postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c3) and t1.c1='abc';  QUERY PLAN
-----------------------------------------------------------------------------------------------------------  Aggregate  (cost=25235.81..25235.82 rows=1 width=8)  ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=25235.74..25235.79 rows=1 width=8)  ->  Aggregate  (cost=25235.74..25235.75 rows=1 width=8)  ->  Nested Loop  (cost=0.00..25235.66 rows=11 width=0)  ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..200.88 rows=1 width=33)  ->  Index Scan using idx_tbl3_1 on tbl3 t1  (cost=0.00..200.86 rows=1 width=33)  Index Cond: c1 = 'abc'::text AND 'abc'::text = c1  ->  Seq Scan on tbl3 t2  (cost=0.00..25034.46 rows=11 width=65)  Filter: 'abc'::text = c3::text AND c3::text = 'abc'::text
(9 rows)

9、内表走索引，因为JOIN字段类型都是变长，匹配。

postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c2) and t1.c1='abc';  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------  Aggregate  (cost=401.86..401.87 rows=1 width=8)  ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=401.80..401.85 rows=1 width=8)  ->  Aggregate  (cost=401.80..401.81 rows=1 width=8)  ->  Nested Loop  (cost=200.88..401.79 rows=2 width=0)  ->  Index Scan using idx_tbl3_1 on tbl3 t1  (cost=0.00..200.86 rows=1 width=33)  Index Cond: c1 = 'abc'::text AND 'abc'::text = c1  ->  Materialize  (cost=200.88..200.89 rows=1 width=33)  ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..200.88 rows=1 width=33)  ->  Index Scan using idx_tbl3_2 on tbl3 t2  (cost=0.00..200.86 rows=1 width=33)  Index Cond: 'abc'::text = c2::text AND c2::text = 'abc'::text
(10 rows)

分析

实际上就是要求索引的表达式与条件的表达式一致。

例子，使用表达式，强制转换为变长类型。

postgres=# create index idx_tbl3_4 on tbl3 ((c3::text));
CREATE INDEX

查询时，只要表达式的类型与条件类型匹配即可，走索引，同时会使用Implicit转换。

postgres=# explain analyze select * from tbl3 where (c3::text)='abc';  QUERY PLAN
----------------------------------------------------------------------------------------------------------------  Gather Motion 3:1  (slice1; segments: 3)  (cost=108.60..52793.25 rows=1001 width=135)  Rows out:  0 rows at destination with 3.796 ms to end, start offset by 0.309 ms.  ->  Bitmap Heap Scan on tbl3  (cost=108.60..52793.25 rows=334 width=135)  Recheck Cond: c3::text = 'abc'::text  Rows out:  0 rows (seg0) with 0.163 ms to end, start offset by 3.095 ms.  ->  Bitmap Index Scan on idx_tbl3_4  (cost=0.00..108.35 rows=334 width=0)  Index Cond: c3::text = 'abc'::text  Bitmaps out:  Avg 1.0 x 3 workers.  Max 1 (seg0) with 0.135 ms to end, start offset by 3.119 ms.  Slice statistics:  (slice0)    Executor memory: 155K bytes.  (slice1)    Executor memory: 321K bytes avg x 3 workers, 321K bytes max (seg0).  Statement statistics:  Memory used: 128000K bytes  Settings:  enable_seqscan=off  Total runtime: 4.219 ms
(15 rows)

postgres=# explain analyze select * from tbl3 where (c3::text) = '123'::int;  QUERY PLAN
----------------------------------------------------------------------------------------------------------------  Gather Motion 3:1  (slice1; segments: 3)  (cost=108.60..52793.25 rows=1001 width=135)  Rows out:  0 rows at destination with 2.926 ms to end, start offset by 0.298 ms.  ->  Bitmap Heap Scan on tbl3  (cost=108.60..52793.25 rows=334 width=135)  Recheck Cond: c3::text = '123'::text  Rows out:  0 rows (seg0) with 0.110 ms to end, start offset by 3.057 ms.  ->  Bitmap Index Scan on idx_tbl3_4  (cost=0.00..108.35 rows=334 width=0)  Index Cond: c3::text = '123'::text  Bitmaps out:  Avg 1.0 x 3 workers.  Max 1 (seg0) with 0.064 ms to end, start offset by 3.095 ms.  Slice statistics:  (slice0)    Executor memory: 155K bytes.  (slice1)    Executor memory: 289K bytes avg x 3 workers, 289K bytes max (seg0).  Statement statistics:  Memory used: 128000K bytes  Settings:  enable_seqscan=off  Total runtime: 3.327 ms
(15 rows)

postgres=# \dC  List of casts  Source type         |         Target type         |      Function      |   Implicit?
-----------------------------+-----------------------------+--------------------+---------------  abstime                     | date                        | date               | in assignment  abstime                     | integer                     | (binary coercible) | no  abstime                     | timestamp without time zone | timestamp          | yes  abstime                     | timestamp with time zone    | timestamptz        | yes  abstime                     | time without time zone      | time               | in assignment  bigint                      | bit                         | bit                | no  bigint                      | character                   | text               | in assignment  bigint                      | character varying           | text               | in assignment  bigint                      | double precision            | float8             | yes  bigint                      | integer                     | int4               | in assignment  bigint                      | numeric                     | numeric            | yes  bigint                      | oid                         | oid                | yes  bigint                      | real                        | float4             | yes  ...............

小结

要让索引起作用，最起码应该关注一下几点。

1、注意变长字符串和定长字符串，属于两种数据类型，不要混淆。

2、JOIN时操作符两边的字段类型一致。

3、查询时，WHERE条件与索引表达式一致。

4、查询时，WHERE条件与索引字段的类型一致。

5、查询时，WHERE条件的collate与索引的collate一致。(本地化相关)

6、操作符本身支持对应的索引访问方法。例如大多数的=操作符，支持b-tree的索引访问方法。

7、PG是CBO优化，因此索引访问的成本更低时，才会选择索引扫描。（设置了特定的优化器开关，或者使用了HINT除外）。

参考

《PostgreSQL 自定义自动类型转换(CAST)》

Greenplum 类型一致性使用规范 - 索引条件、JOIN的类型一致性限制相关推荐

Greenplum 优化CASE - 对齐JOIN字段类型，使用数组代替字符串，降低字符串处理开销，列存降低扫描开销...
标签 PostgreSQL , 数组 , 字符串 , 字符串处理 , JOIN , where , 类型一致性背景 Greenplum通常被用作OLAP,在一些用户使用过程中,可能因为数据结构设计, ...
oracle窗帘位图索引,Greenplum数据库设计开发规范参考.docx
Greenplum数据库设计开发规范参考 Greenplum数据库设计开发规范参考文档2016年7月目录Greenplum数据库设计开发规范1V1.511 前言41.1 文档目的41.2 文档范围4 ...
mysql join 索引无效_ORACLE MYSQL中join 字段类型不同索引失效的情况-阿里云开发者社区...
ORACLE MYSQL中join 字段类型不同索引失效的情况重庆八怪 2016-12-29 780浏览量简介: 关于JOIN使用不同类型的字段类型,数据库可能进行隐士转换,MYSQL ORACL ...
MySQL5.6之Index Condition Pushdown(ICP,索引条件下推)
ICP(index condition pushdown)是mysql利用索引(二级索引)元组和筛字段在索引中的where条件从表中提取数据记录的一种优化操作.ICP的思想是:存储引擎在访问索引的时候 ...
mysql索引条件下推_MySQL 索引条件下推优化
一什么是"索引条件下推" "索引条件下推",称为Index Condition Pushdown (ICP),这是MySQL提供的用某一个索引对一个特定的表从 ...
mysql索引4种结构_mysql索引：四种类型，两种方法
1. 普通索引:最基本的索引,它没有任何限制,用于加速查询. 2. 唯一索引unique:索引列的值必须唯一,但允许有空值.如果是组合索引,则列值的组合必须唯一. 3. 主键索引: 是一种特殊的唯一索 ...
mysql默认join是什么类型_MySQL：join语句类型
MySQL:join语句类型 join从句的类型有以下几种: 内链接(inner) 右外连接(right outer) 左外连接(left outer) 全外连接(full outer) 交叉链接(c ...
Elasticsearch学习笔记2：ES核心概念 -- 索引、倒排索引、类型、文档
一.ES和关系型数据库的对比 Elasticsearch Relational DB 索引(index) 数据库(database) 类型(types) 表(tables) 文档(documents) ...
go 类型 value 不支持索引_10分钟掌握PostgreSQL 5种索引的应用场景
索引是增强数据库性能的常用方法.索引使得数据库在查找和检索数据库的特定行的时候比没有索引快的多.但索引也增加了整个数据库系统的开销,所以应该合理使用. PostgreSQL提供了几种索引类型:B-tr ...

Greenplum 类型一致性使用规范 - 索引条件、JOIN的类型一致性限制

标签

背景

例子

分析

小结

参考

Greenplum 类型一致性使用规范 - 索引条件、JOIN的类型一致性限制相关推荐

最新文章

热门文章