Oracle Dimension in DWH

重读 Tom 大师的 Expert One-on-One Oracle, 发现有对dimension的精彩讲解，并在强大的网络中找到了如下资料，最宝贵的是有sample：

在数据仓库环境中，我们通常利用物化视图强大的查询重写功能来提升统计查询的性能，但是物化视图的查询重写功能有时候无法智能地判断查询中一些相关联的条件，以至于影响性能。

比如我们有一张销售表sales，用于存储订单的详细信息，包含交易日期、顾客编号和销售量。我们创建一张物化视图，按月存储累计销量信息，假如这时候我们要查询按季度或者按年度统计销量信息，Oracle是否能够智能地转换查询重写呢？我们知道交易日期中的日期意味着月，月意味着所处的季度，季度意味着年度，但是Oracle却是无法智能地判断这其中的关系，因此无法利用物化视图查询重写来返回我们季度或年度的销量信息，而是直接查询基表，导致性能产生问题。

这时候Dimension就派上用场了。Dimension用于说明列之间的父子对应关系，以使优化器能够自动转换不同列的关系，利用物化视图的查询功能来提升查询统计性能。下面我们首先创建一张销售交易表sales，包含交易日期、顾客编号和销售量这几个列，用于保存销售订单信息，整个表有42万多条记录；创建另一张表time_hierarchy用于存储交易日期中时间的关系，包含交易日期及其对应的月、季度及年度等信息，然后我们将体验Dimension的强大功能。

Roby@XUE> create table sales

2 (trans_date date, cust_id int, sales_amount number );

Table created.

Roby@XUE> insert /*+ APPEND */ into sales

2 select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,

3 mod(rownum,100) CUST_ID,

4 abs(dbms_random.random)/100 SALES_AMOUNT

5 from all_objects

6 /

5926 rows created.

Roby@XUE> commit;

Commit complete.

Roby@XUE> begin

2 for i in 1 .. 6

3 loop

4 insert /*+ APPEND */ into sales

5 select trans_date, cust_id, abs(dbms_random.random)/100 SALES_AMOUNT

6 from sales;

7 commit;

8 end loop;

9 end;

10 /

PL/SQL procedure successfully completed.

Roby@XUE> select count(*) from sales;

COUNT(*)

----------

426672

创建索引组织表time_hierarchy，里面生成了交易日期中日期DAY、月MMYYYY、季度QTY_YYYY、年度YYYY的关系。

Roby@XUE> create table time_hierarchy

2 (day primary key, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)

3 organization index

4 as

5 select distinct

6 trans_date DAY,

7 cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,

8 to_char(trans_date,'mon-yyyy') MON_YYYY,

9 'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'

10 || to_char(trans_date,'yyyy') QTR_YYYY,

11 cast( to_char( trans_date, 'yyyy' ) as number ) YYYY

12 from sales

13 /

Table created.

接下我们创建一张物化视图mv_sales，用于存储每个客户对应每个月的销量统计信息。

Roby@XUE> create materialized view mv_sales

2 build immediate

3 refresh on demand

4 enable query rewrite

5 as

6 select sales.cust_id, sum(sales.sales_amount) sales_amount,

7 time_hierarchy.mmyyyy

8 from sales, time_hierarchy

9 where sales.trans_date = time_hierarchy.day

10 group by sales.cust_id, time_hierarchy.mmyyyy

11 /

Materialized view created.

我们对基表进行分析，以使优化器能够物化视图的查询重写功能：

Roby@XUE> analyze table sales compute statistics;

Table analyzed.

Roby@XUE> analyze table time_hierarchy compute statistics;

Table analyzed.

设置会话的查询重写功能：

Roby@XUE> alter session set query_rewrite_enabled=true;

Session altered.

Roby@XUE> alter session set query_rewrite_integrity=trusted;

Session altered.

接下来我们按月统计总的销量：

Roby@XUE> select time_hierarchy.mmyyyy, sum(sales_amount)

2 from sales, time_hierarchy

3 where sales.trans_date = time_hierarchy.day

4 group by time_hierarchy.mmyyyy

5 /

MMYYYY SUM(SALES_AMOUNT)

---------- -----------------

12006 4.0574E+11

12007 1.2297E+10

22006 3.6875E+11

32006 3.9507E+11

42006 3.7621E+11

52006 3.8549E+11

62006 3.6641E+11

72006 3.8110E+11

82006 3.8502E+11

92006 3.7278E+11

102006 3.7983E+11

112006 3.7210E+11

122006 3.8364E+11

13 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=4 Card=327 Bytes=8502)

1 0 SORT (GROUP BY) (Cost=4 Card=327 Bytes=8502)

2 1 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=327 Bytes=8502)

Statistics

----------------------------------------------------------

17 recursive calls

0 db block gets

25 consistent gets

4 physical reads

我们可以看到查询使用了查询重写的功能，直接查询物化视图中的查询方案，而不是查询其表，逻辑IO只有25个，性能相当良好。

假如这时候我们要按季度来查询统计销量信息，结果又会是怎样呢？

Roby@XUE> select time_hierarchy.qtr_yyyy, sum(sales_amount)

2 from sales, time_hierarchy

3 where sales.trans_date = time_hierarchy.day

4 group by time_hierarchy.qtr_yyyy

5 /

QTR_YYYY SUM(SALES_AMOUNT)

------------------------------------------------ -----------------

Q1 FY2006 1.1696E+12

Q1 FY2007 1.2297E+10

Q2 FY2006 1.1281E+12

Q3 FY2006 1.1389E+12

Q4 FY2006 1.1356E+12

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=1681 Card=5 Bytes=145)

1 0 SORT (GROUP BY) (Cost=1681 Card=5 Bytes=145)

2 1 NESTED LOOPS (Cost=35 Card=426672 Bytes=12373488)

3 2 TABLE ACCESS (FULL) OF 'SALES' (Cost=35 Card=426672

4 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)

Statistics

----------------------------------------------------------

14 recursive calls

0 db block gets

428048 consistent gets

599 physical reads

可以看到查询将直接查询基表产生了将近428048个逻辑IO，性能无法满足需求。

接下我们创建一个Dimension表time_hierarchy_dim，用于提醒优化器time_hierarchy表中的DAY列暗示着MMYYYY，MMYYYY又意味着QTY_YYYY，QTY_YYYY又意味着YYYY。然后我们将重新运行上面那个查询，看执行计划发生了怎样的变更。

Roby@XUE> create dimension time_hierarchy_dim

2 level day is time_hierarchy.day

3 level mmyyyy is time_hierarchy.mmyyyy

4 level qtr_yyyy is time_hierarchy.qtr_yyyy

5 level yyyy is time_hierarchy.yyyy

6 hierarchy time_rollup

7 (

8 day child of

9 mmyyyy child of

10 qtr_yyyy child of

11 yyyy

12 )

13 attribute mmyyyy

14 determines mon_yyyy;

Dimension created.

Roby@XUE> select time_hierarchy.qtr_yyyy, sum(sales_amount)

2 from sales, time_hierarchy

3 where sales.trans_date = time_hierarchy.day

4 group by time_hierarchy.qtr_yyyy

5 /

QTR_YYYY SUM(SALES_AMOUNT)

------------------------------------------------ -----------------

Q1 FY2006 1.1696E+12

Q1 FY2007 1.2297E+10

Q2 FY2006 1.1281E+12

Q3 FY2006 1.1389E+12

Q4 FY2006 1.1356E+12

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=14 Card=5 Bytes=195)

1 0 SORT (GROUP BY) (Cost=14 Card=5 Bytes=195)

2 1 HASH JOIN (Cost=7 Card=1157 Bytes=45123)

3 2 VIEW (Cost=4 Card=46 Bytes=598)

4 3 SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)

5 4 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)

6 2 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=327

Statistics

----------------------------------------------------------

193 recursive calls

0 db block gets

49 consistent gets

2 physical reads

可以看到创建Dimension后，Oracle已经能够智能地理解交易日期中月度和季度的转换关系，查询使用到物化视图，逻辑IO由原来的428048个减少到49个，性能有了大幅的提升。

同样我们再来统计一下年度的销量信息：

Roby@XUE> select time_hierarchy.yyyy, sum(sales_amount)

2 from sales, time_hierarchy

3 where sales.trans_date = time_hierarchy.day

4 group by time_hierarchy.yyyy

5 /

YYYY SUM(SALES_AMOUNT)

---------- -----------------

2006 4.5721E+12

2007 1.2297E+10

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=10 Card=2 Bytes=66)

1 0 SORT (GROUP BY) (Cost=10 Card=2 Bytes=66)

2 1 HASH JOIN (Cost=7 Card=478 Bytes=15774)

我们再创建一张customer_hierarchy表，用于存储客户代码、邮政编码和地区的关系，然后我们将按不同邮编或地区来查询各自的月度、季度或者年度销量信息。

Roby@XUE> create table customer_hierarchy

2 ( cust_id primary key, zip_code, region )

3 organization index

4 as

5 select cust_id,

6 mod( rownum, 6 ) || to_char(mod( rownum, 1000 ), 'fm0000') zip_code,

7 mod( rownum, 6 ) region

8 from ( select distinct cust_id from sales)

9 /

Table created.

Roby@XUE> analyze table customer_hierarchy compute statistics;

Table analyzed.

改写物化视图，查询方案中添加按不同邮编的月度统计销量。

Roby@XUE> drop materialized view mv_sales;

Materialized view dropped.

Roby@XUE> create materialized view mv_sales

2 build immediate

3 refresh on demand

4 enable query rewrite

5 as

6 select customer_hierarchy.zip_code,

7 time_hierarchy.mmyyyy,

8 sum(sales.sales_amount) sales_amount

9 from sales, time_hierarchy, customer_hierarchy

10 where sales.trans_date = time_hierarchy.day

11 and sales.cust_id = customer_hierarchy.cust_id

12 group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy

13 /

Materialized view created.

Roby@XUE> set autotrace traceonly

Roby@XUE> select customer_hierarchy.zip_code,

2 time_hierarchy.mmyyyy,

3 sum(sales.sales_amount) sales_amount

4 from sales, time_hierarchy, customer_hierarchy

5 where sales.trans_date = time_hierarchy.day

6 and sales.cust_id = customer_hierarchy.cust_id

7 group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy

8 /

1216 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=409 Bytes=20450)

1 0 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 Bytes=20450)

Statistics

----------------------------------------------------------

28 recursive calls

0 db block gets

116 consistent gets

5 physical reads

可以看到如果按不同邮编、不同月度来统计查询的话，优化器将会查询物化视图中的查询方案，性能也是比较可观的。假如我们查不同地区年度的统计销量信息，结果又会是怎样？

Roby@XUE> select customer_hierarchy.region,

2 time_hierarchy.yyyy,

3 sum(sales.sales_amount) sales_amount

4 from sales, time_hierarchy, customer_hierarchy

5 where sales.trans_date = time_hierarchy.day

6 and sales.cust_id = customer_hierarchy.cust_id

7 group by customer_hierarchy.region, time_hierarchy.yyyy

8 /

9 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=1681 Card=9 Bytes=261)

1 0 SORT (GROUP BY) (Cost=1681 Card=9 Bytes=261)

2 1 NESTED LOOPS (Cost=35 Card=426672 Bytes=12373488)

3 2 NESTED LOOPS (Cost=35 Card=426672 Bytes=8106768)

4 3 TABLE ACCESS (FULL) OF 'SALES' (Cost=35 Card=426672

5 3 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)

6 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

428047 consistent gets

745 physical reads

可以看到查询性能大有影响。接下我们同样创建dimension sales_dimension，用于说明客户代码和邮编、地区间的关系：

Roby@XUE> drop dimension time_hierarchy_dim

2 /

Dimension dropped.

Roby@XUE> create dimension sales_dimension

2 level cust_id is customer_hierarchy.cust_id

3 level zip_code is customer_hierarchy.zip_code

4 level region is customer_hierarchy.region

5 level day is time_hierarchy.day

6 level mmyyyy is time_hierarchy.mmyyyy

7 level qtr_yyyy is time_hierarchy.qtr_yyyy

8 level yyyy is time_hierarchy.yyyy

9 hierarchy cust_rollup

10 (

11 cust_id child of

12 zip_code child of

13 region

14 )

15 hierarchy time_rollup

16 (

17 day child of

18 mmyyyy child of

19 qtr_yyyy child of

20 yyyy

21 )

22 attribute mmyyyy

23 determines mon_yyyy;

Dimension created.

再回到原来的查询，我们可以看到查询性能有了大幅的提升：

Roby@XUE> set autotrace on

Roby@XUE> select customer_hierarchy.region,

2 time_hierarchy.yyyy,

3 sum(sales.sales_amount) sales_amount

4 from sales, time_hierarchy, customer_hierarchy

5 where sales.trans_date = time_hierarchy.day

6 and sales.cust_id = customer_hierarchy.cust_id

7 group by customer_hierarchy.region, time_hierarchy.yyyy

8 /

REGION YYYY SALES_AMOUNT

---------- ---------- ------------

0 2006 7.3144E+11

0 2007 4484956329

1 2006 7.8448E+11

2 2006 7.7257E+11

2 2007 4684418980

3 2006 7.7088E+11

4 2006 7.8004E+11

4 2007 3127953246

5 2006 7.3273E+11

9 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=15 Card=9 Bytes=576)

1 0 SORT (GROUP BY) (Cost=15 Card=9 Bytes=576)

2 1 HASH JOIN (Cost=10 Card=598 Bytes=38272)

3 2 VIEW (Cost=3 Card=100 Bytes=700)

4 3 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)

5 4 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)

6 2 HASH JOIN (Cost=7 Card=598 Bytes=34086)

7 6 VIEW (Cost=4 Card=19 Bytes=133)

8 7 SORT (UNIQUE) (Cost=4 Card=19 Bytes=133)

9 8 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828'

10 6 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409

Statistics

----------------------------------------------------------

364 recursive calls

0 db block gets

88 consistent gets

0 physical reads

Roby@XUE> set autot trace

Roby@XUE> select customer_hierarchy.region,

2 time_hierarchy.qtr_yyyy,

3 sum(sales.sales_amount) sales_amount

4 from sales, time_hierarchy, customer_hierarchy

5 where sales.trans_date = time_hierarchy.day

6 and sales.cust_id = customer_hierarchy.cust_id

7 group by customer_hierarchy.region, time_hierarchy.qtr_yyyy;

27 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=23 Card=22 Bytes=154

1 0 SORT (GROUP BY) (Cost=23 Card=22 Bytes=1540)

2 1 HASH JOIN (Cost=11 Card=1447 Bytes=101290)

3 2 VIEW (Cost=3 Card=100 Bytes=700)

4 3 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)

5 4 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE) (

6 2 HASH JOIN (Cost=7 Card=1447 Bytes=91161)

7 6 VIEW (Cost=4 Card=46 Bytes=598)

8 7 SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)

9 8 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UN

10 6 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 B

Statistics

----------------------------------------------------------

10 recursive calls

0 db block gets

19 consistent gets

0 physical reads

Roby@XUE> select customer_hierarchy.region,

2 time_hierarchy.mon_yyyy,

3 sum(sales.sales_amount) sales_amount

4 from sales, time_hierarchy, customer_hierarchy

5 where sales.trans_date = time_hierarchy.day

6 and sales.cust_id = customer_hierarchy.cust_id

7 group by customer_hierarchy.region, time_hierarchy.mon_yyyy;

75 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT ptimizer=CHOOSE (Cost=41 Card=56 Bytes=386

1 0 SORT (GROUP BY) (Cost=41 Card=56 Bytes=3864)

2 1 HASH JOIN (Cost=11 Card=3775 Bytes=260475)

3 2 VIEW (Cost=4 Card=120 Bytes=1440)

4 3 SORT (UNIQUE) (Cost=4 Card=120 Bytes=1440)

5 4 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UNIQ

6 2 HASH JOIN (Cost=6 Card=409 Bytes=23313)

7 6 VIEW (Cost=3 Card=100 Bytes=700)

8 7 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)

9 8 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)

10 6 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 B

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

14 consistent gets

0 physical reads

源帖地址：
http://blog.csdn.net/wzy0623/archive/2009/11/13/4805860.aspx

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/781883/viewspace-664844/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/781883/viewspace-664844/

Oracle Dimension in DWH相关推荐

oracle dimension的探究（维度）
query rewrite和dimension配合物化视图的巨大作用,可以很大程度的提高查询的性能,在数据仓库中,应用很显著. 首先介绍一下dimension(维度),下面是concept中的概念: ...
Oracle优化器:星型转换
Oracle 8i中引入了星型转换(star transformation)的优化器新特性以便更有效地处理星型查询.星型查询语句多用于基于星型模型设计的数据仓库应用中.星型模型的称谓源于该种模型以图形 ...
Oracle Schema Objects（Schema Object Storage And Type）
One characteristic of an RDBMS is the independence of physical data storage from logical data struct ...
Oracle逻辑备份与恢复
1. 备份的类型按照备份方式的不同,可以把备份分为两类: 1.1 逻辑备份:指通过逻辑导出对数据进行备份.将数据库中的用户对象导出到一个二进制文件中,逻辑备份使用导入导出工具:EXPDP/IMPDP ...
关于Oracle数据库19c中的关键字和保留字的说明
关于Oracle数据库中的关键字和保留字的说明官方文档节选: You cannot use Oracle SQL reserved words as nonquoted identifiers. ...
oracle 24365,深入解析：DBA_OBJECTS中的OBJECT_ID与DATA_OBJECT_ID的区别
墨墨导读:在Oracle dba_objects视图中存在object_id和data_oubject两个列,已经存在object_id列了为什么还会有一个data_object_id列呢,这两个列有 ...
Oracle面试过程中常见的二十个问题
1.冷备份和热备份的不同点以及各自的优点解答:热备份针对归档模式的数据库,在数据库仍旧处于工作状态时进行备份.而冷备份指在数据库关闭后,进行备份,适用于所有模式的数据库.热备份的优点在于当备份时,数 ...
Oracle导出excel
oracle导出excel(非csv)的方法有两种,1.使用sqlplus spool,2.使用包体现将网上相关代码整理后贴出以备不时之需: 使用sqlplus: 使用sqlplus需要两个文件: ...
oracle sqlcode 多条,SQL查询以连接Oracle中多个行的列值
是否有可能构造SQL来连接列值多行? 以下是一个示例: 表A PID A B C 表B PID SEQ DESC A 1 Have A 2 a nice ...

Oracle Dimension in DWH

Oracle Dimension in DWH相关推荐

最新文章

热门文章