Tags

PostgreSQL, Oracle, index skip scan, non-leading column predicate, recursive query, subtree


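Background

When a query's predicate is on a non-leading column of a composite index, how can the index still be scanned efficiently?

Oracle handles this kind of case efficiently with an index skip scan:

An index skip scan is typically used when the WHERE clause does not reference the leading column of the index but does reference other indexed columns, and the leading column has few distinct values.

In that situation the database logically splits the composite index into multiple subindexes, one per leading-column value, and searches each subindex in turn for the values that the WHERE clause specifies on the non-leading columns.

Usage: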

/*+ INDEX_SS ( [ @ qb_name ] tablespec [ indexspec [ indexspec ]... ] ) */

The INDEX_SS hint instructs the optimizer to perform an index skip scan for the specified table. If the statement uses an index range scan, then Oracle scans the index entries in ascending order of their indexed values. In a partitioned index, the results are in ascending order within each partition. Each parameter serves the same purpose as in "INDEX Hint". For example:

SELECT /*+ INDEX_SS(e emp_name_ix) */ last_name FROM employees e WHERE first_name = 'Steven';

The following is the original text from the Oracle performance tuning documentation:

Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks.

Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped.

The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.

Example 13-5 Index Skip Scan

Consider, for example, a table

employees(
sex,
employee_id,
address
)

with a composite index on

(sex, employee_id).

Splitting this composite index would result in two logical subindexes, one for M and one for F.

For this example, suppose you have the following index data:

('F',98)('F',100)('F',102)('F',104)('M',101)('M',103)('M',105)

The index is split logically into the following two subindexes:

The first subindex has the keys with the value F.

The second subindex has the keys with the value M.

The column sex is skipped in the following query:

SELECT * FROM employees WHERE employee_id = 101;

A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the value M.
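The subindex-by-subindex search in this example can be modeled in a few lines. The following is a minimal sketch, not Oracle's implementation: it assumes the B-tree can be represented as a sorted Python list of (sex, employee_id) tuples, and the function name `skip_scan` is hypothetical.

```python
from bisect import bisect_left, bisect_right

# Index entries from the example above, kept in (sex, employee_id) key order,
# the same order a B-tree on (sex, employee_id) would store them in.
index = [("F", 98), ("F", 100), ("F", 102), ("F", 104),
         ("M", 101), ("M", 103), ("M", 105)]

def skip_scan(index, employee_id):
    """Return (matches, subindexes_probed) for a predicate on employee_id only."""
    matches, probed, pos = [], 0, 0
    while pos < len(index):
        sex = index[pos][0]                    # leading value of this subindex
        probed += 1
        # Seek inside the subindex for the requested employee_id.
        lo = bisect_left(index, (sex, employee_id))
        hi = bisect_right(index, (sex, employee_id))
        matches.extend(index[lo:hi])
        # Skip to the first entry whose leading value is greater than `sex`.
        pos = bisect_left(index, (sex, float("inf")))
    return matches, probed

print(skip_scan(index, 101))
```

For employee_id = 101 the F subindex is probed first and yields nothing, then the M subindex yields ('M', 101), so two seeks replace a walk over all seven entries.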

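PostgreSQL without skip scan

PostgreSQL can use an index for a predicate on a non-leading column, but in the tests below it has to scan the entire index to do so.

Example

1. Create a test table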

postgres=# create table t(id int, c1 int);
CREATE TABLE

2. Insert 10 million rows of test data

postgres=# insert into t select random()*1 , id from generate_series(1,10000000) id;
INSERT 0 10000000

3. Create a multicolumn index

postgres=# create index idx_t on t(id,c1);
CREATE INDEX

4. Test a query on the non-leading column with each scan type

index only scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_t on public.t  (cost=10000000000.43..10000105164.89 rows=1 width=8) (actual time=0.043..152.288 rows=1 loops=1)
   Output: id, c1
   Index Cond: (t.c1 = 1)
   Heap Fetches: 0
   Buffers: shared hit=27326
 Execution time: 152.328 ms
(6 rows)

index scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t on public.t  (cost=0.43..105165.99 rows=1 width=8) (actual time=0.022..151.845 rows=1 loops=1)
   Output: id, c1
   Index Cond: (t.c1 = 1)
   Buffers: shared hit=27326
 Execution time: 151.881 ms
(5 rows)

bitmap scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.t  (cost=105164.88..105166.00 rows=1 width=8) (actual time=151.731..151.732 rows=1 loops=1)
   Output: id, c1
   Recheck Cond: (t.c1 = 1)
   Heap Blocks: exact=1
   Buffers: shared hit=27326
   ->  Bitmap Index Scan on idx_t  (cost=0.00..105164.88 rows=1 width=0) (actual time=151.721..151.721 rows=1 loops=1)
         Index Cond: (t.c1 = 1)
         Buffers: shared hit=27325
 Execution time: 151.777 ms
(9 rows)

seq scan (full table scan)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on public.t  (cost=0.00..169248.41 rows=1 width=8) (actual time=0.014..594.535 rows=1 loops=1)
   Output: id, c1
   Filter: (t.c1 = 1)
   Rows Removed by Filter: 9999999
   Buffers: shared hit=44248
 Execution time: 594.568 ms
(6 rows)

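The index-based plans need no filter step and read fewer blocks (27,326 versus 44,248), so they beat the sequential scan (about 152 ms versus 595 ms). They still read every page of the index, however, so none of them counts as a skip scan.

So how can PostgreSQL be made to perform an index skip scan?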

PostgreSQL skip scan

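The principle is the same as in Oracle: supply conditions on the leading column, then scan once per leading-column value. This gives the effect of a skip scan (that is, a scan of several subtrees of the index).

Likewise, it works best when the leading column has few distinct values.

PostgreSQL's recursive query syntax can deliver this speedup. The same technique is also used to compute count(distinct) and to list distinct values.

"An extreme recursive optimization for distinct xx and count(distinct xx) - index convergence (skip scan) scanning"

For example, this method quickly returns the distinct values of the leading column: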

with recursive skip as (
  (
    select min(t.id) as id from t where t.id is not null
  )
  union all
  (
    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)
      from skip s where s.id is not null
  )  -- the "where s.id is not null" here is required; without it the recursion never terminates.
)
select id from skip ;
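The recursion is easier to follow once it is seen as a loop: take the smallest id, then repeatedly seek the smallest id greater than the current one. The sketch below models that loop under the assumption that the index on (id, c1) is a sorted Python list of tuples; `distinct_leading` is a hypothetical name, and NULL ids are not modeled.

```python
from bisect import bisect_left

def distinct_leading(index):
    """Distinct leading-column values of a sorted list of (id, c1) tuples."""
    values = []
    pos = 0                                   # anchor term: select min(id)
    while pos < len(index):
        current = index[pos][0]
        values.append(current)
        # Recursive term: select min(id) where id > current.
        # Seeking just past (current, +infinity) lands on the next id, if any.
        pos = bisect_left(index, (current, float("inf")))
    return values

# 1,000 rows but only three distinct ids, so the loop performs three seeks.
sample = sorted((i % 3, i) for i in range(1, 1001))
print(distinct_leading(sample))
```

The loop ends when the seek runs off the end of the list. In the SQL version the same moment produces a NULL row, which the `where s.id is not null` condition then uses to stop the recursion.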

Then wrap it in the following SQL to get the effect of a skip scan:

explain (analyze,verbose,timing,costs,buffers) select * from t where id in
(
with recursive skip as (
  (
    select min(t.id) as id from t where t.id is not null
  )
  union all
  (
    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)
      from skip s where s.id is not null
  )  -- the "where s.id is not null" here is required; without it the recursion never terminates.
)
select id from skip
) and c1=1
union all
select * from t where id is null and c1=1;

Or

explain (analyze,verbose,timing,costs,buffers) select * from t where id = any(array
(
with recursive skip as (
  (
    select min(t.id) as id from t where t.id is not null
  )
  union all
  (
    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)
      from skip s where s.id is not null
  )  -- the "where s.id is not null" here is required; without it the recursion never terminates.
)
select id from skip
)) and c1=1
union all
select * from t where id is null and c1=1;

The execution plan is much better:

                                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=55.00..215.22 rows=2 width=8) (actual time=0.127..0.138 rows=1 loops=1)
   Buffers: shared hit=21
   ->  Nested Loop  (cost=55.00..213.64 rows=1 width=8) (actual time=0.126..0.127 rows=1 loops=1)
         Output: t.id, t.c1
         Buffers: shared hit=18
         ->  HashAggregate  (cost=54.57..55.58 rows=101 width=4) (actual time=0.108..0.109 rows=3 loops=1)
               Output: skip.id
               Group Key: skip.id
               Buffers: shared hit=11
               ->  CTE Scan on skip  (cost=51.29..53.31 rows=101 width=4) (actual time=0.052..0.102 rows=3 loops=1)
                     Output: skip.id
                     Buffers: shared hit=11
                     CTE skip
                       ->  Recursive Union  (cost=0.46..51.29 rows=101 width=4) (actual time=0.050..0.099 rows=3 loops=1)
                             Buffers: shared hit=11
                             ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.049..0.049 rows=1 loops=1)
                                   Output: $1
                                   Buffers: shared hit=4
                                   InitPlan 3 (returns $1)
                                     ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.045..0.046 rows=1 loops=1)
                                           Output: t_3.id
                                           Buffers: shared hit=4
                                           ->  Index Only Scan using idx_t on public.t t_3  (cost=0.43..205165.21 rows=10000033 width=4) (actual time=0.045..0.045 rows=1 loops=1)
                                                 Output: t_3.id
                                                 Index Cond: (t_3.id IS NOT NULL)
                                                 Heap Fetches: 0
                                                 Buffers: shared hit=4
                             ->  WorkTable Scan on skip s  (cost=0.00..4.88 rows=10 width=4) (actual time=0.015..0.015 rows=1 loops=3)
                                   Output: (SubPlan 2)
                                   Filter: (s.id IS NOT NULL)
                                   Rows Removed by Filter: 0
                                   Buffers: shared hit=7
                                   SubPlan 2
                                     ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=2)
                                           Output: $3
                                           Buffers: shared hit=7
                                           InitPlan 1 (returns $3)
                                             ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=2)
                                                   Output: t_2.id
                                                   Buffers: shared hit=7
                                                   ->  Index Only Scan using idx_t on public.t t_2  (cost=0.43..76722.42 rows=3333344 width=4) (actual time=0.017..0.017 rows=0 loops=2)
                                                         Output: t_2.id
                                                         Index Cond: ((t_2.id > s.id) AND (t_2.id IS NOT NULL))
                                                         Heap Fetches: 0
                                                         Buffers: shared hit=7
         ->  Index Only Scan using idx_t on public.t  (cost=0.43..1.56 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3)
               Output: t.id, t.c1
               Index Cond: ((t.id = skip.id) AND (t.c1 = 1))
               Heap Fetches: 0
               Buffers: shared hit=7
   ->  Index Only Scan using idx_t on public.t t_1  (cost=0.43..1.56 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
         Output: t_1.id, t_1.c1
         Index Cond: ((t_1.id IS NULL) AND (t_1.c1 = 1))
         Heap Fetches: 0
         Buffers: shared hit=3
 Execution time: 0.256 ms
(56 rows)

Execution time drops from more than 150 ms to 0.256 ms.
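The size of the gap follows from how much of the index each strategy touches. The sketch below is a rough model only, assuming a sorted Python list stands in for idx_t and counting list entries or seeks rather than real pages; both function names are hypothetical.

```python
from bisect import bisect_left

def full_index_scan(index, c1):
    """Examine every entry, as a scan without a leading-column condition must."""
    rows = [entry for entry in index if entry[1] == c1]
    return rows, len(index)                   # work = entries examined

def emulated_skip_scan(index, c1):
    """Per distinct id: one seek for (id, c1), one seek to reach the next id."""
    rows, seeks, pos = [], 0, 0
    while pos < len(index):
        lead = index[pos][0]
        hit = bisect_left(index, (lead, c1))
        seeks += 1
        while hit < len(index) and index[hit] == (lead, c1):
            rows.append(index[hit])
            hit += 1
        pos = bisect_left(index, (lead, float("inf")))
        seeks += 1
    return rows, seeks                        # work = number of seeks

# 100,000 entries with two distinct leading values, loosely like table t.
index = sorted((i % 2, i) for i in range(1, 100001))
print(full_index_scan(index, 1))
print(emulated_skip_scan(index, 1))
```

Both functions return the same row, but the first examines all 100,000 entries while the second performs four seeks. Each seek in a real B-tree reads only a handful of pages, which is consistent with the plan above reporting 21 shared buffers instead of 27,326.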

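Optimization inside the database kernel

The approach would be similar to Oracle's, or equivalently to the recursive rewrite above.

Improving the optimizer along these lines would deliver the effect of an index skip scan without any need to rewrite the SQL.

References

"An extreme recursive optimization for distinct xx and count(distinct xx) - index convergence (skip scan) scanning"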
