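Kettle performance and efficiency tuning: once developers master this technique, SQL can run hundreds of times faster...
SQL statements that implement the same business logic can differ in execution time by a factor of hundreds or even thousands, depending on how they are written. Today we look at a few cases that illustrate this:
Case 1:
The original SQL is as follows (execution time: 1.2 minutes):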
with holder_clear_temp as (
    select distinct t.principal_holder_account
    from ch_member.holder_account s, ch_member.clear_agency_relation t
    where s.holder_account = t.principal_holder_account
      and s.holder_account_status = '1'
      and t.agency_status = '1'
      and t.agency_type in ('1','2')
      and t.agency_holder_account = :1
      and t.principal_holder_account != :2
),
holder_settle_temp as (
    select t.principal_holder_account, t.product_category
    from ch_member.holder_account s, ch_member.settle_agency_rel t
    where s.holder_account = t.principal_holder_account
      and s.holder_account_status = '1'
      and t.agency_status = '1'
      and t.agency_type in ('1','2')
      and t.agency_holder_account = :3
      and t.principal_holder_account != :4
      and not exists (
          select 1 from holder_clear_temp c
          where c.principal_holder_account = t.principal_holder_account
      )
),
temp as (
    select jour.BALANCE_CHG_SN
    from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour
    inner join ch_stock.product_info info
        on (info.product_code = jour.product_code
            or (info.pub_product_code = jour.product_code and info.has_distribution_flag = '1'))
    where 1=1
      and (exists (
               select 1 from holder_clear_temp c
               where jour.holder_account = c.principal_holder_account
           )
           or exists (
               select 1 from holder_settle_temp s
               where jour.holder_account = s.principal_holder_account
                 and info.product_Category = s.product_category
           ))
      and jour.init_date >= :5
      and jour.init_date <= :6
    union all
    select jour.BALANCE_CHG_SN
    from ch_stock.ACCOUNT_CHG_BALANCE_JOUR jour
    inner join ch_stock.product_info info
        on (info.product_code = jour.product_code
            or (info.pub_product_code = jour.product_code and info.has_distribution_flag = '1'))
    where 1=1
      and (exists (
               select 1 from holder_clear_temp c
               where jour.holder_account = c.principal_holder_account
           )
           or exists (
               select 1 from holder_settle_temp s
               where jour.holder_account = s.principal_holder_account
                 and info.product_Category = s.product_category
           ))
      and jour.init_date >= :7
      and jour.init_date <= :8
)
select count(1) from temp;
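This SQL is fairly complex, but the execution plan shown by SQL Monitor makes the bottleneck obvious: because the predicate joins two EXISTS subqueries with OR, the optimizer can only use a FILTER operation, and since the main query returns a large number of rows, the statement takes a long time to run. Based on how the SQL is written and what the execution plan tells us, we can optimize it by rewriting it.
SQL Monitor output (partial):
If you are interested, download the SQL Monitor file and try to work out for yourself how to optimize this SQL first. I shared it earlier in my WeChat and QQ groups, and one group member found the core of the optimization, although with a few small flaws.
Note: the SQL Monitor file can be downloaded from QQ group 16778072. The execution plan shown on the "Plan Statistics" page of this SQL Monitor report is not quite correct, but that does not affect the overall picture. The "Plan" page shows it correctly.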
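For readers who want to produce this kind of report on their own system, the following is a minimal sketch and not the author's own procedure. It assumes access to the standard DBMS_SQLTUNE and DBMS_XPLAN packages (SQL Monitor requires the Tuning Pack license), and the sql_id shown is a made-up placeholder.

```sql
-- Text-format SQL Monitor report for a single statement.
-- 'abcd1234efgh5' is a hypothetical sql_id; look up the real one in v$sql.
select dbms_sqltune.report_sql_monitor(
           sql_id => 'abcd1234efgh5',
           type   => 'TEXT') as report
from dual;

-- Alternative without SQL Monitor: run the statement with runtime
-- statistics enabled (the query below is only a stand-in) ...
select /*+ gather_plan_statistics */ count(*) from dual;

-- ... then display the plan of the last statement executed in this session,
-- including the Starts, A-Rows, A-Time and Buffers columns.
select *
from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

Either output shows how many times each plan step was started and how many rows it actually returned, which is the information used in the analysis here.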
Rewritten SQL:
with holder_clear_temp as (
    select distinct t.principal_holder_account
    from ch_member.holder_account s, ch_member.clear_agency_relation t
    where s.holder_account = t.principal_holder_account
      and s.holder_account_status = '1'
      and t.agency_status = '1'
      and t.agency_type in ('1','2')
      and t.agency_holder_account = '2110348'
      and t.principal_holder_account != '2110348'
),
holder_settle_temp as (
    select t.principal_holder_account, t.product_category
    from ch_member.holder_account s, ch_member.settle_agency_rel t
    where s.holder_account = t.principal_holder_account
      and s.holder_account_status = '1'
      and t.agency_status = '1'
      and t.agency_type in ('1','2')
      and t.agency_holder_account = '2110348'
      and t.principal_holder_account != '2110348'
      and not exists (
          select 1 from holder_clear_temp c
          where c.principal_holder_account = t.principal_holder_account
      )
),
exists_temp as (
    select principal_holder_account, 'xx' as product_category
    from holder_clear_temp
    union
    select principal_holder_account, product_category
    from holder_settle_temp
),
temp as (
    select jour.BALANCE_CHG_SN
    from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour,
         ch_stock.product_info info,
         exists_temp uuu
    where info.product_code = jour.product_code
      and jour.holder_account = uuu.principal_holder_account
      and (uuu.product_category = 'xx' or info.product_Category = uuu.product_category)
      and jour.init_date >= 20190205
      and jour.init_date <= 20190505
    union all
    select jour.BALANCE_CHG_SN
    from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour,
         ch_stock.product_info info,
         exists_temp uuu
    where (info.pub_product_code = jour.product_code and info.has_distribution_flag = '1')
      and jour.holder_account = uuu.principal_holder_account
      and (uuu.product_category = 'xx' or info.product_Category = uuu.product_category)
      and jour.init_date >= 20190205
      and jour.init_date <= 20190505
      and lnnvl(info.product_code = jour.product_code)
    ---------- the branch below is unchanged from the original SQL ----------
    union all
    select jour.BALANCE_CHG_SN
    from ch_stock.ACCOUNT_CHG_BALANCE_JOUR jour
    inner join ch_stock.product_info info
        on (info.product_code = jour.product_code
            or (info.pub_product_code = jour.product_code and info.has_distribution_flag = '1'))
    where 1=1
      and (exists (
               select 1 from holder_clear_temp c
               where jour.holder_account = c.principal_holder_account
           )
           or exists (
               select 1 from holder_settle_temp s
               where jour.holder_account = s.principal_holder_account
                 and info.product_Category = s.product_category
           ))
      and jour.init_date >= 20190205
      and jour.init_date <= 20190505
)
select count(1) from temp;
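Result of the rewrite:
After the rewrite, the SQL that originally ran for 1.2 minutes took only 0.6 seconds in on-site testing. (This test rewrote only the upper branch of the UNION ALL, which was the slow part. If the lower branch is rewritten the same way, the final execution time is expected to be under 0.3 seconds, a speedup of more than 200x.)
Notes on the rewrite:
The two EXISTS subqueries joined by OR in the original SQL share the same join condition on the holder account. We merge them with a UNION (note: not UNION ALL) and define the result as the CTE (WITH ... AS) exists_temp, which can then be joined to the two tables of the main query instead of being applied as a FILTER. The placeholder value 'xx' marks the rows that come from holder_clear_temp, which only need to match on the account, and the UNION guarantees that each (account, category) pair appears once, so the join does not multiply rows. Because the join condition between the two tables of the main query also contains an OR, the optimizer would inevitably use CONCATENATION, and the statement would be split into 4 branches combined with UNION ALL. I only wanted the main query's join to be concatenated, so I did the concatenation by hand and split the main query into UNION ALL branches myself.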
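Stripped of the business details, the pattern described above can be sketched as follows. The tables main_t, list_a and list_b and their columns are hypothetical and exist only for illustration; the sketch also assumes that cat is a character column and that 'xx' never occurs as a real cat value.

```sql
-- Hypothetical tables for illustration only:
--   main_t(acct, cat), list_a(acct), list_b(acct, cat)

-- Before: two EXISTS subqueries joined by OR tend to be evaluated as a FILTER,
-- so the subqueries are probed once per candidate row of main_t.
select count(*)
from main_t m
where exists (select 1 from list_a a where a.acct = m.acct)
   or exists (select 1 from list_b b where b.acct = m.acct and b.cat = m.cat);

-- After: merge both lookup sets into one deduplicated set and join to it.
with lookup as (
    select acct, 'xx' as cat            -- 'xx' means: match on acct alone
    from list_a
    union                               -- UNION, not UNION ALL: each key appears once
    select b.acct, b.cat
    from list_b b
    where not exists (select 1 from list_a a where a.acct = b.acct)
)
select count(*)
from main_t m, lookup u
where m.acct = u.acct
  and (u.cat = 'xx' or m.cat = u.cat);
```

The NOT EXISTS in the second branch plays the role that the same condition plays inside holder_settle_temp in the article's SQL: an account that is already covered by the first set must not contribute additional rows, otherwise a row of main_t could match more than once and the count would change.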
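Note: many SQL tuning experts online rewrite OR almost exclusively as UNION. That is not an equivalent rewrite, because UNION also removes duplicate rows that the original query would have returned. For the standard rewrite, refer to the combination of UNION ALL and LNNVL used in this example.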
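The difference can be seen on a minimal example. The table t(id, a, b) below is hypothetical and not part of the article's schema; LNNVL is the built-in Oracle function that returns TRUE when its condition is FALSE or UNKNOWN (NULL).

```sql
-- Hypothetical table for illustration only: t(id, a, b)

-- Original: every row for which at least one predicate is true,
-- including rows that are genuine duplicates of each other.
select id from t where a = 1 or b = 2;

-- NOT equivalent: UNION removes duplicates from the result, so two distinct
-- rows of t with the same id are collapsed into one.
select id from t where a = 1
union
select id from t where b = 2;

-- Equivalent: the second branch keeps only rows the first branch did not
-- already return. LNNVL(a = 1) is true when a = 1 is false or a is NULL,
-- so rows with a NULL value in a and b = 2 are still returned.
select id from t where a = 1
union all
select id from t where b = 2 and lnnvl(a = 1);
```

Writing the exclusion as "and a != 1" instead of LNNVL would silently drop the rows where a is NULL, which is why LNNVL is used for this kind of manual OR expansion.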
Case 2:
Original SQL:
SELECT A.FLOW_INID,
       A.CURR_STEP,
       A.FLOW_NAME,
       A.FINS_NAME,
       TO_CHAR(A.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA,
       'manual_rel' RELA_TYPE
FROM FLOW_INST A
WHERE EXISTS (SELECT 1
              FROM FLOW_RELATE_INFO B
              WHERE A.FLOW_INID = B.RELATE_FLOW_INID
                AND B.FLOW_INID = :1)
   OR EXISTS (SELECT 1
              FROM FLOW_RELATE_INFO B
              WHERE A.FLOW_INID = B.FLOW_INID
                AND B.RELATE_FLOW_INID = :2)
UNION ALL
SELECT S.FLOW_INID,
       S.CURR_STEP,
       S.FLOW_NAME,
       S.FINS_NAME,
       TO_CHAR(S.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA,
       'auto_rel' RELA_TYPE
FROM FLOW_INST S,
     (SELECT FI.FLOW_INID, FI.PARA_INID
      FROM FLOW_INST FI
      WHERE FI.FLOW_INID = :3) F
WHERE ((F.FLOW_INID = S.PARA_INID AND S.IF_SUB = 1)
       OR F.PARA_INID = S.FLOW_INID)
  AND S.DEL_FLAG = 0;

Execution plan:
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name              | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                   |      1 |        |       |   194K(100)|          |      0 |00:00:01.83 |    352K |
|   1 |  UNION-ALL                     |                   |      1 |        |       |            |          |      0 |00:00:01.83 |    352K |
|*  2 |   FILTER                       |                   |      1 |        |       |            |          |      0 |00:00:01.83 |    352K |
|   3 |    TABLE ACCESS FULL           | FLOW_INST         |      1 |  57564 |  5115K|  1094   (1)| 00:00:14 |  58003 |00:00:00.05 |    4156 |
|*  4 |    TABLE ACCESS FULL           | FLOW_RELATE_INFO  |  58003 |      1 |    11 |     4   (0)| 00:00:01 |      0 |00:00:01.75 |    348K |
|   5 |   CONCATENATION                |                   |      1 |        |       |            |          |      0 |00:00:00.01 |       8 |
|   6 |    NESTED LOOPS                |                   |      1 |      1 |   106 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       3 |
|*  7 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |     8 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       3 |
|*  8 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|*  9 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      0 |      1 |    98 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 10 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      0 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       0 |
|  11 |    NESTED LOOPS                |                   |      1 |      1 |   106 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       5 |
|  12 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |     8 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
|* 13 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|* 14 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |    98 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       2 |
|* 15 |      INDEX RANGE SCAN          | IDX_SUBFLOW_CHECK |      1 |      1 |       |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------------------------------------
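Analysis:
This is a business SQL from an OA system, and it takes nearly 2 seconds to run. The FLOW_RELATE_INFO table has only 480 rows in 8 blocks, but the plan shows that the FILTER at Id 2 starts the full scan of FLOW_RELATE_INFO at Id 4 58003 times, once per row of FLOW_INST, which accounts for 348K of the 352K buffer gets.
Without rewriting the SQL, we can create a composite index on the two columns (FLOW_INID, RELATE_FLOW_INID) of FLOW_RELATE_INFO, which brings the execution time down to 0.37 seconds (a response time that is reasonably acceptable for an OA system).
That creating an index on a small table improves performance here is also a counterexample to the claim that small tables do not need indexes.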
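The index described above could be created as sketched here. The index name is hypothetical, since the article does not give one; the table and column names are taken from the text.

```sql
-- IDX_FLOW_RELATE_INFO_1 is a hypothetical index name.
-- Both EXISTS subqueries use equality predicates on FLOW_INID and
-- RELATE_FLOW_INID, so each probe can be answered from this index
-- instead of a full scan of the table.
create index IDX_FLOW_RELATE_INFO_1
    on FLOW_RELATE_INFO (FLOW_INID, RELATE_FLOW_INID);
```

The table is tiny, but the subquery is still executed once per row of FLOW_INST, so reducing the cost of every single probe adds up.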
If we rewrite the SQL instead, no index is needed and the improvement is even better: under 0.01 seconds.
The rewritten SQL is as follows:
SELECT A.FLOW_INID,
       A.CURR_STEP,
       A.FLOW_NAME,
       A.FINS_NAME,
       TO_CHAR(A.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA,
       'manual_rel' RELA_TYPE
FROM FLOW_INST A
WHERE FLOW_INID in (SELECT RELATE_FLOW_INID
                    FROM FLOW_RELATE_INFO
                    WHERE FLOW_INID = '77913'
                    union
                    SELECT FLOW_INID
                    FROM FLOW_RELATE_INFO B
                    WHERE RELATE_FLOW_INID = '77913')
UNION ALL
SELECT S.FLOW_INID,
       S.CURR_STEP,
       S.FLOW_NAME,
       S.FINS_NAME,
       TO_CHAR(S.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA,
       'auto_rel' RELA_TYPE
FROM FLOW_INST S,
     (SELECT FI.FLOW_INID, FI.PARA_INID
      FROM FLOW_INST FI
      WHERE FI.FLOW_INID = '77913') F
WHERE ((F.FLOW_INID = S.PARA_INID AND S.IF_SUB = 1)
       OR F.PARA_INID = S.FLOW_INID)
  AND S.DEL_FLAG = 0;

Execution plan:
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name              | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                   |      1 |        |       |    19 (100)|          |      0 |00:00:00.01 |      20 |
|   1 |  UNION-ALL                     |                   |      1 |        |       |            |          |      0 |00:00:00.01 |      20 |
|   2 |   NESTED LOOPS                 |                   |      1 |      2 |   198 |    12  (17)| 00:00:01 |      0 |00:00:00.01 |      12 |
|   3 |    NESTED LOOPS                |                   |      1 |      2 |   198 |    12  (17)| 00:00:01 |      0 |00:00:00.01 |      12 |
|   4 |     VIEW                       | VW_NSO_1          |      1 |      2 |    16 |    10  (20)| 00:00:01 |      0 |00:00:00.01 |      12 |
|   5 |      SORT UNIQUE               |                   |      1 |      2 |    22 |    10  (20)| 00:00:01 |      0 |00:00:00.01 |      12 |
|   6 |       UNION-ALL                |                   |      1 |        |       |            |          |      0 |00:00:00.01 |      12 |
|*  7 |        TABLE ACCESS FULL       | FLOW_RELATE_INFO  |      1 |      1 |    11 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       6 |
|*  8 |        TABLE ACCESS FULL       | FLOW_RELATE_INFO  |      1 |      1 |    11 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       6 |
|*  9 |     INDEX UNIQUE SCAN          | PK_FLOW_INST      |      0 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       0 |
|  10 |    TABLE ACCESS BY INDEX ROWID | FLOW_INST         |      0 |      1 |    91 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|  11 |   CONCATENATION                |                   |      1 |        |       |            |          |      0 |00:00:00.01 |       8 |
|  12 |    NESTED LOOPS                |                   |      1 |      1 |   106 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       3 |
|* 13 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |     8 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       3 |
|* 14 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|* 15 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      0 |      1 |    98 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 16 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      0 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       0 |
|  17 |    NESTED LOOPS                |                   |      1 |      1 |   106 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       5 |
|  18 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |     8 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
|* 19 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|* 20 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |    98 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       2 |
|* 21 |      INDEX RANGE SCAN          | IDX_SUBFLOW_CHECK |      1 |      1 |       |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------------------------------------
Case 3:
Original SQL:
SELECT C.fd_txtname As "文件名称",
       a.fd_start_time AS "开始时间",
       a.fd_end_time AS "结束时间",
       c.N AS "数据量"
FROM dapdw.tb_dapetl_log_proc a
join dapdw.tb_dapetl_distribute_spool B on a.fd_proc_name = b.fd_id
join (SELECT 'YCCTOEAL_INSTALMENT_DYYMMDD.DAT' as fd_txtname, COUNT(1) as N
      FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id = 'YCC01'
      UNION ALL
      SELECT 'YCCTOEAL_UNDRAWN_DYYMMDD.DAT', COUNT(1)
      FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id = 'YCC03'
      UNION ALL
      SELECT 'YCCTOEAL_FEE_DYYMMDD.DAT', COUNT(1)
      FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id = 'YCC05'
      UNION ALL
      SELECT 'NDSTOEAL_FXSPOT_DYYMMDD.DAT', COUNT(1)
      FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id = 'NDS04'
      UNION ALL
      SELECT 'YI2TOEAL_LOAN_DYYMMDD.DAT', COUNT(1)
      FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id = 'YI201'
      UNION ALL
      SELECT 'YRLTOEAL_CCFD_DYYMMDD.DAT', COUNT(1)
      FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id = 'YRL01'
     ) C ON C.fd_txtname = B.fd_txtname
WHERE A.FD_DATE = 20190217;
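SQL analysis:
The C_EAL_LOANDEPO_HIS table in the UNION ALL part occupies tens of GB. It is partitioned by day on the data_Dt column and has 50 partitions; data_Dt is of type VARCHAR2.
There are two problems:
1. The type of the value compared with data_Dt does not match the column, so an implicit type conversion takes place and partition pruning is not possible. With matching types only one partition has to be accessed, but with a NUMBER value all 50 partitions are accessed.
2. C_EAL_LOANDEPO_HIS is accessed 6 times. Written with CASE WHEN, it only needs to be accessed once.
With both problems fixed, the rewritten SQL is expected to be about 50*6=300 times as efficient as the original. All that is needed is to change data_Dt=20190217 to data_Dt='20190217' and combine that with CASE WHEN. No UNION ALL is required, and a single access to C_EAL_LOANDEPO_HIS implements the business logic of the original SQL. The rewrite is fairly simple, so I will not go into more detail here.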
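The article leaves this rewrite to the reader. One possible shape, offered as a sketch under the assumptions stated in the comments rather than as the author's actual statement, is:

```sql
-- With a VARCHAR2 column, data_Dt = 20190217 is evaluated as
-- TO_NUMBER(data_Dt) = 20190217, which prevents partition pruning.
-- Comparing with the string literal '20190217' avoids the conversion.
SELECT C.fd_txtname AS "文件名称",
       a.fd_start_time AS "开始时间",
       a.fd_end_time AS "结束时间",
       C.N AS "数据量"
FROM dapdw.tb_dapetl_log_proc a
JOIN dapdw.tb_dapetl_distribute_spool B ON a.fd_proc_name = B.fd_id
JOIN (SELECT CASE source_id
                 WHEN 'YCC01' THEN 'YCCTOEAL_INSTALMENT_DYYMMDD.DAT'
                 WHEN 'YCC03' THEN 'YCCTOEAL_UNDRAWN_DYYMMDD.DAT'
                 WHEN 'YCC05' THEN 'YCCTOEAL_FEE_DYYMMDD.DAT'
                 WHEN 'NDS04' THEN 'NDSTOEAL_FXSPOT_DYYMMDD.DAT'
                 WHEN 'YI201' THEN 'YI2TOEAL_LOAN_DYYMMDD.DAT'
                 WHEN 'YRL01' THEN 'YRLTOEAL_CCFD_DYYMMDD.DAT'
             END AS fd_txtname,
             COUNT(1) AS N
      FROM C_EAL_LOANDEPO_HIS
      WHERE data_Dt = '20190217'          -- string literal: one partition only
        AND source_id IN ('YCC01', 'YCC03', 'YCC05', 'NDS04', 'YI201', 'YRL01')
      GROUP BY source_id                  -- one pass over the table for all six counts
     ) C ON C.fd_txtname = B.fd_txtname
WHERE a.FD_DATE = 20190217;
```

One difference is worth checking before adopting this form: the original returns a row with a count of 0 for a source_id that has no data on that day, while the GROUP BY version returns no row for it. If those zero rows matter, the grouped result would need to be outer-joined to the list of expected file names.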
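Summary:
Today's three cases show how much the way SQL is written matters: for the same logic, different formulations can differ in performance by a factor of hundreds or thousands.
Only by mastering execution plan analysis, and by knowing the various inefficient ways of writing SQL, can you get SQL to meet the business requirement with the fewest resources and in the shortest time.
Thanks for reading!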