本节简单介绍了PG查询优化中对Having和Group By子句的简化处理。

一、基本概念

简化Having语句
把Having中的约束条件,如满足可以提升到Where条件中的,则移动到Where子句中,否则仍保留在Having语句中.这样做的目的是因为Having过滤在Group by之后执行,如能把Having中的过滤提升到Where中,则可以提前执行"选择"运算,减少Group by的开销.
以下语句,条件dwbh='1002'提升到Where中执行:

testdb=# explain verbose select a.dwbh,a.xb,count(*)
testdb-# from t_grxx a
testdb-# group by a.dwbh,a.xb
testdb-# having count(*) >= 1 and dwbh = '1002';QUERY PLAN
-----------------------------------------------------------------------------GroupAggregate  (cost=15.01..15.06 rows=1 width=84)Output: dwbh, xb, count(*)Group Key: a.dwbh, a.xbFilter: (count(*) >= 1) -- count(*) >= 1 仍保留在Having中->  Sort  (cost=15.01..15.02 rows=2 width=76)Output: dwbh, xbSort Key: a.xb->  Seq Scan on public.t_grxx a  (cost=0.00..15.00 rows=2 width=76)Output: dwbh, xbFilter: ((a.dwbh)::text = '1002'::text) -- 提升到Where中,扫描时过滤Tuple
(10 rows)

如存在Group by & Grouping sets则不作处理:

testdb=# explain verbose
testdb-# select a.dwbh,a.xb,count(*)
testdb-# from t_grxx a
testdb-# group by
testdb-# grouping sets ((a.dwbh),(a.xb),())
testdb-# having count(*) >= 1 and dwbh = '1002'
testdb-# order by a.dwbh,a.xb;QUERY PLAN
-------------------------------------------------------------------------------Sort  (cost=28.04..28.05 rows=3 width=84)Output: dwbh, xb, (count(*))Sort Key: a.dwbh, a.xb->  MixedAggregate  (cost=0.00..28.02 rows=3 width=84)Output: dwbh, xb, count(*)Hash Key: a.dwbhHash Key: a.xbGroup Key: ()Filter: ((count(*) >= 1) AND ((a.dwbh)::text = '1002'::text)) -- 扫描数据表后再过滤->  Seq Scan on public.t_grxx a  (cost=0.00..14.00 rows=400 width=76)Output: dwbh, grbh, xm, xb, nl
(11 rows)

简化Group by语句
如Group by中的字段列表已包含某个表主键的所有列,则该表在Group by语句中的其他列可以删除,这样的做法有利于提升在Group by过程中排序或Hash的性能,减少不必要的开销.

testdb=# explain verbose select a.dwbh,a.dwmc,count(*)
testdb-# from t_dwxx a
testdb-# group by a.dwbh,a.dwmc
testdb-# having count(*) >= 1;QUERY PLAN
--------------------------------------------------------------------------HashAggregate  (cost=13.20..15.20 rows=53 width=264)Output: dwbh, dwmc, count(*)Group Key: a.dwbh, a.dwmc -- 分组键为dwbh & dwmcFilter: (count(*) >= 1)->  Seq Scan on public.t_dwxx a  (cost=0.00..11.60 rows=160 width=256)Output: dwmc, dwbh, dwdz
(6 rows)testdb=# alter table t_dwxx add primary key(dwbh); -- 添加主键
ALTER TABLE
testdb=# explain verbose select a.dwbh,a.dwmc,count(*)
from t_dwxx a
group by a.dwbh,a.dwmc
having count(*) >= 1;QUERY PLAN
-----------------------------------------------------------------------HashAggregate  (cost=1.05..1.09 rows=1 width=264)Output: dwbh, dwmc, count(*)Group Key: a.dwbh -- 分组键只保留dwbhFilter: (count(*) >= 1)->  Seq Scan on public.t_dwxx a  (cost=0.00..1.03 rows=3 width=256)Output: dwmc, dwbh, dwdz
(6 rows)

二、源码解读

相关处理的源码位于文件subquery_planner.c中,主函数为subquery_planner,代码片段如下:

     /** In some cases we may want to transfer a HAVING clause into WHERE. We* cannot do so if the HAVING clause contains aggregates (obviously) or* volatile functions (since a HAVING clause is supposed to be executed* only once per group).  We also can't do this if there are any nonempty* grouping sets; moving such a clause into WHERE would potentially change* the results, if any referenced column isn't present in all the grouping* sets.  (If there are only empty grouping sets, then the HAVING clause* must be degenerate as discussed below.)** Also, it may be that the clause is so expensive to execute that we're* better off doing it only once per group, despite the loss of* selectivity.  This is hard to estimate short of doing the entire* planning process twice, so we use a heuristic: clauses containing* subplans are left in HAVING.  Otherwise, we move or copy the HAVING* clause into WHERE, in hopes of eliminating tuples before aggregation* instead of after.** If the query has explicit grouping then we can simply move such a* clause into WHERE; any group that fails the clause will not be in the* output because none of its tuples will reach the grouping or* aggregation stage.  Otherwise we must have a degenerate (variable-free)* HAVING clause, which we put in WHERE so that query_planner() can use it* in a gating Result node, but also keep in HAVING to ensure that we* don't emit a bogus aggregated row. (This could be done better, but it* seems not worth optimizing.)** Note that both havingQual and parse->jointree->quals are in* implicitly-ANDed-list form at this point, even though they are declared* as Node *.*/newHaving = NIL;foreach(l, (List *) parse->havingQual)//存在Having条件语句{Node       *havingclause = (Node *) lfirst(l);//获取谓词if ((parse->groupClause && parse->groupingSets) ||contain_agg_clause(havingclause) ||contain_volatile_functions(havingclause) ||contain_subplans(havingclause)){/* keep it in HAVING *///如果有Group&&Group Sets语句//保持不变newHaving = lappend(newHaving, havingclause);}else if (parse->groupClause && !parse->groupingSets){/* move it to WHERE *///只有group语句,可以加入到jointree的条件中parse->jointree->quals = (Node *)lappend((List *) parse->jointree->quals, havingclause);}else//既没有group也没有grouping set,拷贝一份到jointree的条件中{/* put a copy in WHERE, keep it in HAVING */parse->jointree->quals = (Node *)lappend((List *) parse->jointree->quals,copyObject(havingclause));newHaving = lappend(newHaving, havingclause);}}parse->havingQual = (Node *) newHaving;//调整having子句/* Remove any redundant GROUP BY columns */remove_useless_groupby_columns(root);//去掉group by中无用的数据列

remove_useless_groupby_columns

 /** remove_useless_groupby_columns*      Remove any columns in the GROUP BY clause that are redundant due to*      being functionally dependent on other GROUP BY columns.** Since some other DBMSes do not allow references to ungrouped columns, it's* not unusual to find all columns listed in GROUP BY even though listing the* primary-key columns would be sufficient.  Deleting such excess columns* avoids redundant sorting work, so it's worth doing.  When we do this, we* must mark the plan as dependent on the pkey constraint (compare the* parser's check_ungrouped_columns() and check_functional_grouping()).** In principle, we could treat any NOT-NULL columns appearing in a UNIQUE* index as the determining columns.  But as with check_functional_grouping(),* there's currently no way to represent dependency on a NOT NULL constraint,* so we consider only the pkey for now.*/static voidremove_useless_groupby_columns(PlannerInfo *root){Query      *parse = root->parse;//查询树Bitmapset **groupbyattnos;//位图集合Bitmapset **surplusvars;//位图集合ListCell   *lc;int         relid;/* No chance to do anything if there are less than two GROUP BY items */if (list_length(parse->groupClause) < 2)//如果只有1个ITEMS,无需处理return;/* Don't fiddle with the GROUP BY clause if the query has grouping sets */if (parse->groupingSets)//存在Grouping sets,不作处理return;/** Scan the GROUP BY clause to find GROUP BY items that are simple Vars.* Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k* that are GROUP BY items.*///用于分组的属性 groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) *(list_length(parse->rtable) + 1));foreach(lc, parse->groupClause){SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);Var        *var = (Var *) tle->expr;/** Ignore non-Vars and Vars from other query levels.** XXX in principle, stable expressions containing Vars could also be* removed, if all the Vars are functionally dependent on other GROUP* BY items.  But it's not clear that such cases occur often enough to* be worth troubling over.*/if (!IsA(var, Var) ||var->varlevelsup > 0)continue;/* OK, remember we have this Var */relid = var->varno;Assert(relid <= list_length(parse->rtable));groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],var->varattno - FirstLowInvalidHeapAttributeNumber);}/** Consider each relation and see if it is possible to remove some of its* Vars from GROUP BY.  For simplicity and speed, we do the actual removal* in a separate pass.  Here, we just fill surplusvars[k] with a bitmapset* of the column attnos of RTE k that are removable GROUP BY items.*/surplusvars = NULL;         /* don't allocate array unless required */relid = 0;//如某个Relation的分组键中已含主键列,去掉其他列foreach(lc, parse->rtable){RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);Bitmapset  *relattnos;Bitmapset  *pkattnos;Oid         constraintOid;relid++;/* Only plain relations could have primary-key constraints */if (rte->rtekind != RTE_RELATION)continue;/* Nothing to do unless this rel has multiple Vars in GROUP BY */relattnos = groupbyattnos[relid];if (bms_membership(relattnos) != BMS_MULTIPLE)continue;/** Can't remove any columns for this rel if there is no suitable* (i.e., nondeferrable) primary key constraint.*/pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid);if (pkattnos == NULL)continue;/** If the primary key is a proper subset of relattnos then we have* some items in the GROUP BY that can be removed.*/if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1){/** To easily remember whether we've found anything to do, we don't* allocate the surplusvars[] array until we find something.*/if (surplusvars == NULL)surplusvars = (Bitmapset **) palloc0(sizeof(Bitmapset *) *(list_length(parse->rtable) + 1));/* Remember the attnos of the removable columns */surplusvars[relid] = bms_difference(relattnos, pkattnos);/* Also, mark the resulting plan as dependent on this constraint */parse->constraintDeps = lappend_oid(parse->constraintDeps,constraintOid);}}/** If we found any surplus Vars, build a new GROUP BY clause without them.* (Note: this may leave some TLEs with unreferenced ressortgroupref* markings, but that's harmless.)*/if (surplusvars != NULL){List       *new_groupby = NIL;foreach(lc, parse->groupClause){SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);Var        *var = (Var *) tle->expr;/** New list must include non-Vars, outer Vars, and anything not* marked as surplus.*/if (!IsA(var, Var) ||var->varlevelsup > 0 ||!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,surplusvars[var->varno]))new_groupby = lappend(new_groupby, sgc);}parse->groupClause = new_groupby;}}

三、参考资料

planner.c

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/6906/viewspace-2374877/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/6906/viewspace-2374877/

PostgreSQL 源码解读(35)- 查询语句#20(查询优化-简化Having和Grou...相关推荐

  1. PostgreSQL 源码解读(32)- 查询语句#17(查询优化-表达式预处理#2)

    本节简单介绍了PG查询优化表达式预处理中常量的简化过程.表达式预处理主要的函数主要有preprocess_expression和preprocess_qual_conditions(调用preproc ...

  2. PostgreSQL 源码解读(31)- 查询语句#16(查询优化-表达式预处理#1)

    本节简单介绍了PG查询优化对表达式预处理中连接Var(RTE中的Var,其中RTE_KIND=RTE_JOIN)溯源的过程.处理逻辑在主函数subquery_planner中通过调用flatten_j ...

  3. PostgreSQL 源码解读(160)- 查询#80(如何实现表达式解析)

    本节介绍了PostgreSQL如何解析查询语句中的表达式列并计算得出该列的值.表达式列是指除关系定义中的系统列/定义列之外的其他投影列.比如: testdb=# create table t_expr ...

  4. PostgreSQL 源码解读(203)- 查询#116(类型转换实现)

    本节简单介绍了PostgreSQL中的类型转换的具体实现. 解析表达式,涉及不同数据类型时: 1.如有相应类型的Operator定义(pg_operator),则尝试进行类型转换,否则报错; 2.如有 ...

  5. PostgreSQL 源码解读(153)- 后台进程#5(walsender#1)

    本节简单介绍了PostgreSQL的后台进程walsender,该进程实质上是streaming replication环境中master节点上普通的backend进程,在standby节点启动时,s ...

  6. PostgreSQL 源码解读(147)- Storage Manager#3(fsm_search函数)

    本节简单介绍了PostgreSQL在执行插入过程中与存储相关的函数RecordAndGetPageWithFreeSpace->fsm_search,该函数搜索FSM,找到有足够空闲空间(min ...

  7. PostgreSQL 源码解读(156)- 后台进程#8(walsender#4)

    上节介绍了PostgreSQL的后台进程walsender中的函数WalSndLoop->WaitLatchOrSocket->WaitEventSetWait->WaitEvent ...

  8. PostgreSQL 源码解读(154)- 后台进程#6(walsender#2)

    本节继续介绍PostgreSQL的后台进程walsender,重点介绍的是调用栈中的exec_replication_command和StartReplication函数. 调用栈如下: (gdb) ...

  9. PostgreSQL 源码解读(155)- 后台进程#7(walsender#3)

    本节继续介绍PostgreSQL的后台进程walsender,重点介绍的是调用栈中的函数WalSndLoop->WaitLatchOrSocket->WaitEventSetWait-&g ...

最新文章

  1. Datawhale专访 | 周涛:从窄门进最终走出宽路来
  2. Ruby on Rails路径穿越与任意文件读取漏洞分析(CVE-2019-5418)
  3. java 函数参数 返回值_java中如何用函数返回值作为post提交的参数?
  4. Everyday English
  5. C# Http方式下载文件到本地类改进版
  6. vue里面怎么删除部分页面_基于VUE选择上传图片并页面显示(图片可删除)
  7. 博弈论笔记:不完全信息与声誉
  8. 虹桥地铁站附近沿线的有房源出租的社区和村落
  9. 变局之际,聊聊物联网的过去、现在和未来
  10. 再见Spring Security!推荐一款功能强大的权限认证框架,用起来够优雅!
  11. discard python_Python学习第三天
  12. DNW启动异常的问题
  13. python 摄像头采集_Python+OpenCV采集本地摄像头的视频
  14. 如何防止CSRF攻击
  15. python爬取站酷海洛图片_站酷海洛图片爬取
  16. 第07节 C++类的组合
  17. C++ 二维vector使用
  18. 学习python多久?该如何学习python?
  19. Jquery 弹出对话框插件xcConfirm.js
  20. Apache Log4j Server 反序列化漏洞(CVE-2017-5645)

热门文章

  1. Open judge 1.8.3
  2. 如何设置excel中一部分表格显示但是不打印?
  3. 基于对话框的MFC程序加载位图为背景图案
  4. 控制器分析-绘制伯德图
  5. 路径中的“./“,“../“,“/“ 代表的含义
  6. 如何下载blob:https://www.bilibili.com/的视频
  7. 【工具使用】Word 排版
  8. 关于星巴克在故宫开店
  9. B站总结某up主面试题(持续等待更新......)
  10. Java精品项目源码第53期流浪动物管理系统