11. Building the Data Warehouse: the DWS Layer
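When building this layer, keep the following in mind:
1) The design of this layer is driven mainly by the metric system.
2) DWS tables are stored as ORC (columnar storage) with Snappy compression.
3) DWS table names follow the convention dws_<data domain>_<statistical granularity>_<business process>_<statistical period>, where the period is 1d, nd, or td.
Here 1d denotes an aggregate over the most recent day, nd an aggregate over the most recent n days, and td an aggregate from the start of history to date.
Whether to build an aggregate table in the DWS layer depends on actual demand. If a metric is used by several requirements, aggregating it once in advance avoids recomputing it for every requirement and speeds up the downstream queries.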
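The naming convention can be checked mechanically. The following is a minimal sketch, assuming every DWS table name ends in its period suffix; the helper name dws_period is hypothetical and not part of the original project.

```shell
# Hypothetical helper (not from the original project): print the statistical
# period suffix (1d, nd or td) of a DWS table name, or fail if the name does
# not follow the dws_..._<period> convention.
dws_period() {
    local name="$1"
    local period="${name##*_}"    # text after the last underscore
    case "$name" in
        dws_*_*) ;;
        *) echo "not a DWS table name: $name" >&2; return 1 ;;
    esac
    case "$period" in
        1d|nd|td) echo "$period" ;;
        *) echo "unknown period suffix: $period" >&2; return 1 ;;
    esac
}

dws_period dws_trade_user_sku_order_1d    # prints 1d
```

A check like this could run before a table is created, so that a name such as dws_trade_user_order_7d is rejected early.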
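1. Determining the metric system
During the earlier warehouse design we organized the existing statistical requirements into a metric system, and then extracted all derived metrics from it.
Metrics that share the same business process, the same statistical period, and the same statistical granularity are placed in the same DWS table.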
2. DWS layer: 1-day summary tables
2.1 Trade domain: user-SKU granularity order (placement) 1-day summary table
2.1.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_sku_order_1d;
CREATE EXTERNAL TABLE dws_trade_user_sku_order_1d
(
    `user_id` STRING COMMENT 'User id',
    `sku_id` STRING COMMENT 'SKU id',
    `sku_name` STRING COMMENT 'SKU name',
    `category1_id` STRING COMMENT 'Level-1 category id',
    `category1_name` STRING COMMENT 'Level-1 category name',
    `category2_id` STRING COMMENT 'Level-2 category id',
    `category2_name` STRING COMMENT 'Level-2 category name',
    `category3_id` STRING COMMENT 'Level-3 category id',
    `category3_name` STRING COMMENT 'Level-3 category name',
    `tm_id` STRING COMMENT 'Brand id',
    `tm_name` STRING COMMENT 'Brand name',
    `order_count_1d` BIGINT COMMENT 'Number of orders in the last 1 day',
    `order_num_1d` BIGINT COMMENT 'Number of items ordered in the last 1 day',
    `order_original_amount_1d` DECIMAL(16, 2) COMMENT 'Original order amount in the last 1 day',
    `activity_reduce_amount_1d` DECIMAL(16, 2) COMMENT 'Activity discount amount in the last 1 day',
    `coupon_reduce_amount_1d` DECIMAL(16, 2) COMMENT 'Coupon discount amount in the last 1 day',
    `order_total_amount_1d` DECIMAL(16, 2) COMMENT 'Final order amount in the last 1 day'
) COMMENT 'Trade domain user-SKU granularity order 1-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_sku_order_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.1.2 Data loading
1) First-day load:
set hive.exec.dynamic.partition.mode=nonstrict;
-- sku_id is taken from the fact side so it stays non-NULL when a SKU has no match in dim_sku_full
insert overwrite table dws_trade_user_sku_order_1d partition(dt)
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    order_count_1d,
    order_num_1d,
    order_original_amount_1d,
    activity_reduce_amount_1d,
    coupon_reduce_amount_1d,
    order_total_amount_1d,
    dt
from
(
    select
        dt,
        user_id,
        sku_id,
        count(*) order_count_1d,
        sum(sku_num) order_num_1d,
        sum(split_original_amount) order_original_amount_1d,
        sum(nvl(split_activity_amount,0.0)) activity_reduce_amount_1d,
        sum(nvl(split_coupon_amount,0.0)) coupon_reduce_amount_1d,
        sum(split_total_amount) order_total_amount_1d
    from dwd_trade_order_detail_inc
    group by dt,user_id,sku_id
)od
left join
(
    select
        id, sku_name,
        category1_id, category1_name,
        category2_id, category2_name,
        category3_id, category3_name,
        tm_id, tm_name
    from dim_sku_full
    where dt='2022-05-01'
)sku
on od.sku_id=sku.id;
2) Daily load:
insert overwrite table dws_trade_user_sku_order_1d partition(dt='2022-05-02')
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    order_count,
    order_num,
    order_original_amount,
    activity_reduce_amount,
    coupon_reduce_amount,
    order_total_amount
from
(
    select
        user_id,
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(split_original_amount) order_original_amount,
        sum(nvl(split_activity_amount,0)) activity_reduce_amount,
        sum(nvl(split_coupon_amount,0)) coupon_reduce_amount,
        sum(split_total_amount) order_total_amount
    from dwd_trade_order_detail_inc
    where dt='2022-05-02'
    group by user_id,sku_id
)od
left join
(
    select
        id, sku_name,
        category1_id, category1_name,
        category2_id, category2_name,
        category3_id, category3_name,
        tm_id, tm_name
    from dim_sku_full
    where dt='2022-05-02'
)sku
on od.sku_id=sku.id;
2.2 Trade domain: user-SKU granularity order refund 1-day summary table
2.2.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_sku_order_refund_1d;
CREATE EXTERNAL TABLE dws_trade_user_sku_order_refund_1d
(
    `user_id` STRING COMMENT 'User id',
    `sku_id` STRING COMMENT 'SKU id',
    `sku_name` STRING COMMENT 'SKU name',
    `category1_id` STRING COMMENT 'Level-1 category id',
    `category1_name` STRING COMMENT 'Level-1 category name',
    `category2_id` STRING COMMENT 'Level-2 category id',
    `category2_name` STRING COMMENT 'Level-2 category name',
    `category3_id` STRING COMMENT 'Level-3 category id',
    `category3_name` STRING COMMENT 'Level-3 category name',
    `tm_id` STRING COMMENT 'Brand id',
    `tm_name` STRING COMMENT 'Brand name',
    `order_refund_count_1d` BIGINT COMMENT 'Number of refunds in the last 1 day',
    `order_refund_num_1d` BIGINT COMMENT 'Number of items refunded in the last 1 day',
    `order_refund_amount_1d` DECIMAL(16, 2) COMMENT 'Refund amount in the last 1 day'
) COMMENT 'Trade domain user-SKU granularity order refund 1-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_sku_order_refund_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.2.2 Data loading
1) First-day load
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trade_user_sku_order_refund_1d partition(dt)
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    order_refund_count,
    order_refund_num,
    order_refund_amount,
    dt
from
(
    select
        dt,
        user_id,
        sku_id,
        count(*) order_refund_count,
        sum(refund_num) order_refund_num,
        sum(refund_amount) order_refund_amount
    from dwd_trade_order_refund_inc
    group by dt,user_id,sku_id
)od
left join
(
    select
        id, sku_name,
        category1_id, category1_name,
        category2_id, category2_name,
        category3_id, category3_name,
        tm_id, tm_name
    from dim_sku_full
    where dt='2022-05-01'
)sku
on od.sku_id=sku.id;
2) Daily load
insert overwrite table dws_trade_user_sku_order_refund_1d partition(dt='2022-05-02')
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    order_refund_count,
    order_refund_num,
    order_refund_amount
from
(
    select
        user_id,
        sku_id,
        count(*) order_refund_count,
        sum(refund_num) order_refund_num,
        sum(refund_amount) order_refund_amount
    from dwd_trade_order_refund_inc
    where dt='2022-05-02'
    group by user_id,sku_id
)od
left join
(
    select
        id, sku_name,
        category1_id, category1_name,
        category2_id, category2_name,
        category3_id, category3_name,
        tm_id, tm_name
    from dim_sku_full
    where dt='2022-05-02'
)sku
on od.sku_id=sku.id;
2.3 Trade domain: user granularity order (placement) 1-day summary table
2.3.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_order_1d;
CREATE EXTERNAL TABLE dws_trade_user_order_1d
(
    `user_id` STRING COMMENT 'User id',
    `order_count_1d` BIGINT COMMENT 'Number of orders in the last 1 day',
    `order_num_1d` BIGINT COMMENT 'Number of items ordered in the last 1 day',
    `order_original_amount_1d` DECIMAL(16, 2) COMMENT 'Original order amount in the last 1 day',
    `activity_reduce_amount_1d` DECIMAL(16, 2) COMMENT 'Activity discount amount on orders in the last 1 day',
    `coupon_reduce_amount_1d` DECIMAL(16, 2) COMMENT 'Coupon discount amount on orders in the last 1 day',
    `order_total_amount_1d` DECIMAL(16, 2) COMMENT 'Final order amount in the last 1 day'
) COMMENT 'Trade domain user granularity order 1-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_order_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.3.2 Data loading
1) First-day load
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trade_user_order_1d partition(dt)
select
    user_id,
    count(distinct(order_id)),
    sum(sku_num),
    sum(split_original_amount),
    sum(nvl(split_activity_amount,0)),
    sum(nvl(split_coupon_amount,0)),
    sum(split_total_amount),
    dt
from dwd_trade_order_detail_inc
group by user_id,dt;
2) Daily load
insert overwrite table dws_trade_user_order_1d partition(dt='2022-05-02')
select
    user_id,
    count(distinct(order_id)),
    sum(sku_num),
    sum(split_original_amount),
    sum(nvl(split_activity_amount,0)),
    sum(nvl(split_coupon_amount,0)),
    sum(split_total_amount)
from dwd_trade_order_detail_inc
where dt='2022-05-02'
group by user_id;
2.4 Trade domain: user granularity cart-add 1-day summary table
2.4.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_cart_add_1d;
CREATE EXTERNAL TABLE dws_trade_user_cart_add_1d
(
    `user_id` STRING COMMENT 'User id',
    `cart_add_count_1d` BIGINT COMMENT 'Number of cart-add actions in the last 1 day',
    `cart_add_num_1d` BIGINT COMMENT 'Number of items added to the cart in the last 1 day'
) COMMENT 'Trade domain user granularity cart-add 1-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_cart_add_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.4.2 Data loading
1) First-day load
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trade_user_cart_add_1d partition(dt)
select
    user_id,
    count(*),
    sum(sku_num),
    dt
from dwd_trade_cart_add_inc
group by user_id,dt;
2) Daily load
insert overwrite table dws_trade_user_cart_add_1d partition(dt='2022-05-02')
select
    user_id,
    count(*),
    sum(sku_num)
from dwd_trade_cart_add_inc
where dt='2022-05-02'
group by user_id;
2.5 Trade domain: user granularity payment 1-day summary table
2.5.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_payment_1d;
CREATE EXTERNAL TABLE dws_trade_user_payment_1d
(
    `user_id` STRING COMMENT 'User id',
    `payment_count_1d` BIGINT COMMENT 'Number of payments in the last 1 day',
    `payment_num_1d` BIGINT COMMENT 'Number of items paid for in the last 1 day',
    `payment_amount_1d` DECIMAL(16, 2) COMMENT 'Payment amount in the last 1 day'
) COMMENT 'Trade domain user granularity payment 1-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_payment_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.5.2 Data loading
1) First-day load
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trade_user_payment_1d partition(dt)
select
    user_id,
    count(distinct(order_id)),
    sum(sku_num),
    sum(split_payment_amount),
    dt
from dwd_trade_pay_detail_suc_inc
group by user_id,dt;
2) Daily load
insert overwrite table dws_trade_user_payment_1d partition(dt='2022-05-02')
select
    user_id,
    count(distinct(order_id)),
    sum(sku_num),
    sum(split_payment_amount)
from dwd_trade_pay_detail_suc_inc
where dt='2022-05-02'
group by user_id;
2.6 Trade domain: province granularity order 1-day summary table
2.6.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_province_order_1d;
CREATE EXTERNAL TABLE dws_trade_province_order_1d
(
    `province_id` STRING COMMENT 'Province id',
    `province_name` STRING COMMENT 'Province name',
    `area_code` STRING COMMENT 'Area code',
    `iso_code` STRING COMMENT 'Legacy ISO-3166-2 code',
    `iso_3166_2` STRING COMMENT 'Current ISO-3166-2 code',
    `order_count_1d` BIGINT COMMENT 'Number of orders in the last 1 day',
    `order_original_amount_1d` DECIMAL(16, 2) COMMENT 'Original order amount in the last 1 day',
    `activity_reduce_amount_1d` DECIMAL(16, 2) COMMENT 'Activity discount amount on orders in the last 1 day',
    `coupon_reduce_amount_1d` DECIMAL(16, 2) COMMENT 'Coupon discount amount on orders in the last 1 day',
    `order_total_amount_1d` DECIMAL(16, 2) COMMENT 'Final order amount in the last 1 day'
) COMMENT 'Trade domain province granularity order 1-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_province_order_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.6.2 Data loading
1) First-day load
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trade_province_order_1d partition(dt)
select
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count_1d,
    order_original_amount_1d,
    activity_reduce_amount_1d,
    coupon_reduce_amount_1d,
    order_total_amount_1d,
    dt
from
(
    select
        province_id,
        count(distinct(order_id)) order_count_1d,
        sum(split_original_amount) order_original_amount_1d,
        sum(nvl(split_activity_amount,0)) activity_reduce_amount_1d,
        sum(nvl(split_coupon_amount,0)) coupon_reduce_amount_1d,
        sum(split_total_amount) order_total_amount_1d,
        dt
    from dwd_trade_order_detail_inc
    group by province_id,dt
)o
left join
(
    select
        id,
        province_name,
        area_code,
        iso_code,
        iso_3166_2
    from dim_province_full
    where dt='2022-05-01'
)p
on o.province_id=p.id;
2) Daily load
insert overwrite table dws_trade_province_order_1d partition(dt='2022-05-02')
select
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count_1d,
    order_original_amount_1d,
    activity_reduce_amount_1d,
    coupon_reduce_amount_1d,
    order_total_amount_1d
from
(
    select
        province_id,
        count(distinct(order_id)) order_count_1d,
        sum(split_original_amount) order_original_amount_1d,
        sum(nvl(split_activity_amount,0)) activity_reduce_amount_1d,
        sum(nvl(split_coupon_amount,0)) coupon_reduce_amount_1d,
        sum(split_total_amount) order_total_amount_1d
    from dwd_trade_order_detail_inc
    where dt='2022-05-02'
    group by province_id
)o
left join
(
    select
        id,
        province_name,
        area_code,
        iso_code,
        iso_3166_2
    from dim_province_full
    where dt='2022-05-02'
)p
on o.province_id=p.id;
2.7 Trade domain: user granularity order refund 1-day summary table
2.7.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_order_refund_1d;
CREATE EXTERNAL TABLE dws_trade_user_order_refund_1d
(
    `user_id` STRING COMMENT 'User id',
    `order_refund_count_1d` BIGINT COMMENT 'Number of refunds in the last 1 day',
    `order_refund_num_1d` BIGINT COMMENT 'Number of items refunded in the last 1 day',
    `order_refund_amount_1d` DECIMAL(16, 2) COMMENT 'Refund amount in the last 1 day'
) COMMENT 'Trade domain user granularity order refund 1-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_order_refund_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.7.2 Data loading
1) First-day load
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trade_user_order_refund_1d partition(dt)
select
    user_id,
    count(*) order_refund_count,
    sum(refund_num) order_refund_num,
    sum(refund_amount) order_refund_amount,
    dt
from dwd_trade_order_refund_inc
group by user_id,dt;
2) Daily load
insert overwrite table dws_trade_user_order_refund_1d partition(dt='2022-05-02')
select
    user_id,
    count(*),
    sum(refund_num),
    sum(refund_amount)
from dwd_trade_order_refund_inc
where dt='2022-05-02'
group by user_id;
2.8 Traffic domain: session granularity page view 1-day summary table
2.8.1 Table creation statement
DROP TABLE IF EXISTS dws_traffic_session_page_view_1d;
CREATE EXTERNAL TABLE dws_traffic_session_page_view_1d
(
    `session_id` STRING COMMENT 'Session id',
    `mid_id` STRING COMMENT 'Device id',
    `brand` STRING COMMENT 'Phone brand',
    `model` STRING COMMENT 'Phone model',
    `operate_system` STRING COMMENT 'Operating system',
    `version_code` STRING COMMENT 'App version number',
    `channel` STRING COMMENT 'Channel',
    `during_time_1d` BIGINT COMMENT 'Visit duration in the last 1 day',
    `page_count_1d` BIGINT COMMENT 'Number of pages visited in the last 1 day'
) COMMENT 'Traffic domain session granularity page view 1-day summary table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_traffic_session_page_view_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.8.2 Data loading
insert overwrite table dws_traffic_session_page_view_1d partition(dt='2022-05-01')
select
    session_id,
    mid_id,
    brand,
    model,
    operate_system,
    version_code,
    channel,
    sum(during_time),
    count(*)
from dwd_traffic_page_view_inc
where dt='2022-05-01'
group by session_id,mid_id,brand,model,operate_system,version_code,channel;
2.9 Traffic domain: visitor-page granularity page view 1-day summary table
2.9.1 Table creation statement
DROP TABLE IF EXISTS dws_traffic_page_visitor_page_view_1d;
CREATE EXTERNAL TABLE dws_traffic_page_visitor_page_view_1d
(
    `mid_id` STRING COMMENT 'Visitor id',
    `brand` STRING COMMENT 'Phone brand',
    `model` STRING COMMENT 'Phone model',
    `operate_system` STRING COMMENT 'Operating system',
    `page_id` STRING COMMENT 'Page id',
    `during_time_1d` BIGINT COMMENT 'Viewing duration in the last 1 day',
    `view_count_1d` BIGINT COMMENT 'Number of views in the last 1 day'
) COMMENT 'Traffic domain visitor-page granularity page view 1-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_traffic_page_visitor_page_view_1d'
    TBLPROPERTIES ('orc.compress' = 'snappy');
2.9.2 Data loading
insert overwrite table dws_traffic_page_visitor_page_view_1d partition(dt='2022-05-01')
select
    mid_id,
    brand,
    model,
    operate_system,
    page_id,
    sum(during_time),
    count(*)
from dwd_traffic_page_view_inc
where dt='2022-05-01'
group by mid_id,brand,model,operate_system,page_id;
2.10 Writing the data loading scripts
2.10.1 First-day load script
(1) On hadoop102, create dwd_to_dws_1d_init.sh in the /home/hadoop/bin directory.
The script content is as follows:
#!/bin/bash
APP=gmall

# The date to load is required as the second argument.
if [ -n "$2" ]; then
    do_date=$2
else
    echo "Please pass in a date argument"
    exit 1
fi

# hive -e starts a fresh session, so each dynamic-partition insert
# enables nonstrict mode itself.
dws_trade_province_order_1d="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dws_trade_province_order_1d partition(dt)
select
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count_1d,
    order_original_amount_1d,
    activity_reduce_amount_1d,
    coupon_reduce_amount_1d,
    order_total_amount_1d,
    dt
from
(
    select
        province_id,
        count(distinct(order_id)) order_count_1d,
        sum(split_original_amount) order_original_amount_1d,
        sum(nvl(split_activity_amount,0)) activity_reduce_amount_1d,
        sum(nvl(split_coupon_amount,0)) coupon_reduce_amount_1d,
        sum(split_total_amount) order_total_amount_1d,
        dt
    from ${APP}.dwd_trade_order_detail_inc
    group by province_id,dt
)o
left join
(
    select
        id,
        province_name,
        area_code,
        iso_code,
        iso_3166_2
    from ${APP}.dim_province_full
    where dt='$do_date'
)p
on o.province_id=p.id;
"

dws_trade_user_cart_add_1d="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dws_trade_user_cart_add_1d partition(dt)
select
    user_id,
    count(*),
    sum(sku_num),
    dt
from ${APP}.dwd_trade_cart_add_inc
group by user_id,dt;
"

dws_trade_user_order_1d="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dws_trade_user_order_1d partition(dt)
select
    user_id,
    count(distinct(order_id)),
    sum(sku_num),
    sum(split_original_amount),
    sum(nvl(split_activity_amount,0)),
    sum(nvl(split_coupon_amount,0)),
    sum(split_total_amount),
    dt
from ${APP}.dwd_trade_order_detail_inc
group by user_id,dt;
"

dws_trade_user_order_refund_1d="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dws_trade_user_order_refund_1d partition(dt)
select
    user_id,
    count(*) order_refund_count,
    sum(refund_num) order_refund_num,
    sum(refund_amount) order_refund_amount,
    dt
from ${APP}.dwd_trade_order_refund_inc
group by user_id,dt;
"

dws_trade_user_payment_1d="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dws_trade_user_payment_1d partition(dt)
select
    user_id,
    count(distinct(order_id)),
    sum(sku_num),
    sum(split_payment_amount),
    dt
from ${APP}.dwd_trade_pay_detail_suc_inc
group by user_id,dt;
"

dws_trade_user_sku_order_1d="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dws_trade_user_sku_order_1d partition(dt)
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    order_count_1d,
    order_num_1d,
    order_original_amount_1d,
    activity_reduce_amount_1d,
    coupon_reduce_amount_1d,
    order_total_amount_1d,
    dt
from
(
    select
        dt,
        user_id,
        sku_id,
        count(*) order_count_1d,
        sum(sku_num) order_num_1d,
        sum(split_original_amount) order_original_amount_1d,
        sum(nvl(split_activity_amount,0.0)) activity_reduce_amount_1d,
        sum(nvl(split_coupon_amount,0.0)) coupon_reduce_amount_1d,
        sum(split_total_amount) order_total_amount_1d
    from ${APP}.dwd_trade_order_detail_inc
    group by dt,user_id,sku_id
)od
left join
(
    select
        id, sku_name,
        category1_id, category1_name,
        category2_id, category2_name,
        category3_id, category3_name,
        tm_id, tm_name
    from ${APP}.dim_sku_full
    where dt='$do_date'
)sku
on od.sku_id=sku.id;
"

dws_trade_user_sku_order_refund_1d="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dws_trade_user_sku_order_refund_1d partition(dt)
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    order_refund_count,
    order_refund_num,
    order_refund_amount,
    dt
from
(
    select
        dt,
        user_id,
        sku_id,
        count(*) order_refund_count,
        sum(refund_num) order_refund_num,
        sum(refund_amount) order_refund_amount
    from ${APP}.dwd_trade_order_refund_inc
    group by dt,user_id,sku_id
)od
left join
(
    select
        id, sku_name,
        category1_id, category1_name,
        category2_id, category2_name,
        category3_id, category3_name,
        tm_id, tm_name
    from ${APP}.dim_sku_full
    where dt='$do_date'
)sku
on od.sku_id=sku.id;
"

dws_traffic_page_visitor_page_view_1d="
insert overwrite table ${APP}.dws_traffic_page_visitor_page_view_1d partition(dt='$do_date')
select
    mid_id,
    brand,
    model,
    operate_system,
    page_id,
    sum(during_time),
    count(*)
from ${APP}.dwd_traffic_page_view_inc
where dt='$do_date'
group by mid_id,brand,model,operate_system,page_id;
"

dws_traffic_session_page_view_1d="
insert overwrite table ${APP}.dws_traffic_session_page_view_1d partition(dt='$do_date')
select
    session_id,
    mid_id,
    brand,
    model,
    operate_system,
    version_code,
    channel,
    sum(during_time),
    count(*)
from ${APP}.dwd_traffic_page_view_inc
where dt='$do_date'
group by session_id,mid_id,brand,model,operate_system,version_code,channel;
"

# The first argument selects a single table, or "all" to load every table.
case $1 in
    "dws_trade_province_order_1d" )
        hive -e "$dws_trade_province_order_1d"
    ;;
    "dws_trade_user_cart_add_1d" )
        hive -e "$dws_trade_user_cart_add_1d"
    ;;
    "dws_trade_user_order_1d" )
        hive -e "$dws_trade_user_order_1d"
    ;;
    "dws_trade_user_order_refund_1d" )
        hive -e "$dws_trade_user_order_refund_1d"
    ;;
    "dws_trade_user_payment_1d" )
        hive -e "$dws_trade_user_payment_1d"
    ;;
    "dws_trade_user_sku_order_1d" )
        hive -e "$dws_trade_user_sku_order_1d"
    ;;
    "dws_trade_user_sku_order_refund_1d" )
        hive -e "$dws_trade_user_sku_order_refund_1d"
    ;;
    "dws_traffic_page_visitor_page_view_1d" )
        hive -e "$dws_traffic_page_visitor_page_view_1d"
    ;;
    "dws_traffic_session_page_view_1d" )
        hive -e "$dws_traffic_session_page_view_1d"
    ;;
    "all" )
        hive -e "$dws_trade_province_order_1d$dws_trade_user_cart_add_1d$dws_trade_user_order_1d$dws_trade_user_order_refund_1d$dws_trade_user_payment_1d$dws_trade_user_sku_order_1d$dws_trade_user_sku_order_refund_1d$dws_traffic_page_visitor_page_view_1d$dws_traffic_session_page_view_1d"
    ;;
esac
(2) Make the script executable
[root@hadoop102 bin]$ chmod +x dwd_to_dws_1d_init.sh
(3) Use the script to run the first-day load
[root@hadoop102 bin]$ dwd_to_dws_1d_init.sh all 2022-05-01
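The init script only checks that a second argument is present, so a mistyped date would still be passed into the SQL. A stricter check could look like the sketch below. It assumes GNU date, as found on typical Linux hosts, and is_valid_date is a hypothetical helper name that does not appear in the original script.

```shell
# Hypothetical helper: succeed only if $1 is a real calendar date written as
# YYYY-MM-DD. Assumes GNU date, which rejects dates such as 2022-02-30.
is_valid_date() {
    local d="$1"
    case "$d" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) return 1 ;;
    esac
    # Round-trip through date: an invalid date prints nothing and fails the test.
    [ "$(date -u -d "$d" +%F 2>/dev/null)" = "$d" ]
}

if is_valid_date "2022-05-01"; then
    echo "date accepted"
fi
```

In the script, a call such as is_valid_date "$2" could take the place of the plain -n test before do_date is assigned.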
2.10.2 Daily load script
(1) On hadoop102, create dwd_to_dws_1d.sh in the /home/hadoop/bin directory.
The script content is as follows:
#!/bin/bash
APP=gmall

# Use the date passed as the second argument if there is one;
# otherwise default to the day before the current date.
if [ -n "$2" ]; then
    do_date=$2
else
    do_date=$(date -d "-1 day" +%F)
fi

dws_trade_province_order_1d="
insert overwrite table ${APP}.dws_trade_province_order_1d partition(dt='$do_date')
select
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count_1d,
    order_original_amount_1d,
    activity_reduce_amount_1d,
    coupon_reduce_amount_1d,
    order_total_amount_1d
from
(
    select
        province_id,
        count(distinct(order_id)) order_count_1d,
        sum(split_original_amount) order_original_amount_1d,
        sum(nvl(split_activity_amount,0)) activity_reduce_amount_1d,
        sum(nvl(split_coupon_amount,0)) coupon_reduce_amount_1d,
        sum(split_total_amount) order_total_amount_1d
    from ${APP}.dwd_trade_order_detail_inc
    where dt='$do_date'
    group by province_id
)o
left join
(
    select
        id,
        province_name,
        area_code,
        iso_code,
        iso_3166_2
    from ${APP}.dim_province_full
    where dt='$do_date'
)p
on o.province_id=p.id;
"

dws_trade_user_cart_add_1d="
insert overwrite table ${APP}.dws_trade_user_cart_add_1d partition(dt='$do_date')
select
    user_id,
    count(*),
    sum(sku_num)
from ${APP}.dwd_trade_cart_add_inc
where dt='$do_date'
group by user_id;
"

dws_trade_user_order_1d="
insert overwrite table ${APP}.dws_trade_user_order_1d partition(dt='$do_date')
select
    user_id,
    count(distinct(order_id)),
    sum(sku_num),
    sum(split_original_amount),
    sum(nvl(split_activity_amount,0)),
    sum(nvl(split_coupon_amount,0)),
    sum(split_total_amount)
from ${APP}.dwd_trade_order_detail_inc
where dt='$do_date'
group by user_id;
"

dws_trade_user_order_refund_1d="
insert overwrite table ${APP}.dws_trade_user_order_refund_1d partition(dt='$do_date')
select
    user_id,
    count(*),
    sum(refund_num),
    sum(refund_amount)
from ${APP}.dwd_trade_order_refund_inc
where dt='$do_date'
group by user_id;
"

dws_trade_user_payment_1d="
insert overwrite table ${APP}.dws_trade_user_payment_1d partition(dt='$do_date')
select
    user_id,
    count(distinct(order_id)),
    sum(sku_num),
    sum(split_payment_amount)
from ${APP}.dwd_trade_pay_detail_suc_inc
where dt='$do_date'
group by user_id;
"

dws_trade_user_sku_order_1d="
insert overwrite table ${APP}.dws_trade_user_sku_order_1d partition(dt='$do_date')
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    order_count,
    order_num,
    order_original_amount,
    activity_reduce_amount,
    coupon_reduce_amount,
    order_total_amount
from
(
    select
        user_id,
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(split_original_amount) order_original_amount,
        sum(nvl(split_activity_amount,0)) activity_reduce_amount,
        sum(nvl(split_coupon_amount,0)) coupon_reduce_amount,
        sum(split_total_amount) order_total_amount
    from ${APP}.dwd_trade_order_detail_inc
    where dt='$do_date'
    group by user_id,sku_id
)od
left join
(
    select
        id, sku_name,
        category1_id, category1_name,
        category2_id, category2_name,
        category3_id, category3_name,
        tm_id, tm_name
    from ${APP}.dim_sku_full
    where dt='$do_date'
)sku
on od.sku_id=sku.id;
"

dws_trade_user_sku_order_refund_1d="
insert overwrite table ${APP}.dws_trade_user_sku_order_refund_1d partition(dt='$do_date')
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    order_refund_count,
    order_refund_num,
    order_refund_amount
from
(
    select
        user_id,
        sku_id,
        count(*) order_refund_count,
        sum(refund_num) order_refund_num,
        sum(refund_amount) order_refund_amount
    from ${APP}.dwd_trade_order_refund_inc
    where dt='$do_date'
    group by user_id,sku_id
)od
left join
(
    select
        id, sku_name,
        category1_id, category1_name,
        category2_id, category2_name,
        category3_id, category3_name,
        tm_id, tm_name
    from ${APP}.dim_sku_full
    where dt='$do_date'
)sku
on od.sku_id=sku.id;
"

dws_traffic_page_visitor_page_view_1d="
insert overwrite table ${APP}.dws_traffic_page_visitor_page_view_1d partition(dt='$do_date')
select
    mid_id,
    brand,
    model,
    operate_system,
    page_id,
    sum(during_time),
    count(*)
from ${APP}.dwd_traffic_page_view_inc
where dt='$do_date'
group by mid_id,brand,model,operate_system,page_id;
"

dws_traffic_session_page_view_1d="
insert overwrite table ${APP}.dws_traffic_session_page_view_1d partition(dt='$do_date')
select
    session_id,
    mid_id,
    brand,
    model,
    operate_system,
    version_code,
    channel,
    sum(during_time),
    count(*)
from ${APP}.dwd_traffic_page_view_inc
where dt='$do_date'
group by session_id,mid_id,brand,model,operate_system,version_code,channel;
"

# The first argument selects a single table, or "all" to load every table.
case $1 in
    "dws_trade_province_order_1d" )
        hive -e "$dws_trade_province_order_1d"
    ;;
    "dws_trade_user_cart_add_1d" )
        hive -e "$dws_trade_user_cart_add_1d"
    ;;
    "dws_trade_user_order_1d" )
        hive -e "$dws_trade_user_order_1d"
    ;;
    "dws_trade_user_order_refund_1d" )
        hive -e "$dws_trade_user_order_refund_1d"
    ;;
    "dws_trade_user_payment_1d" )
        hive -e "$dws_trade_user_payment_1d"
    ;;
    "dws_trade_user_sku_order_1d" )
        hive -e "$dws_trade_user_sku_order_1d"
    ;;
    "dws_trade_user_sku_order_refund_1d" )
        hive -e "$dws_trade_user_sku_order_refund_1d"
    ;;
    "dws_traffic_page_visitor_page_view_1d" )
        hive -e "$dws_traffic_page_visitor_page_view_1d"
    ;;
    "dws_traffic_session_page_view_1d" )
        hive -e "$dws_traffic_session_page_view_1d"
    ;;
    "all" )
        hive -e "$dws_trade_province_order_1d$dws_trade_user_cart_add_1d$dws_trade_user_order_1d$dws_trade_user_order_refund_1d$dws_trade_user_payment_1d$dws_trade_user_sku_order_1d$dws_trade_user_sku_order_refund_1d$dws_traffic_page_visitor_page_view_1d$dws_traffic_session_page_view_1d"
    ;;
esac
(2) Make the script executable
[root@hadoop102 bin]$ chmod +x dwd_to_dws_1d.sh
(3) Use the script to run the daily load
[root@hadoop102 bin]$ dwd_to_dws_1d.sh all 2022-05-02
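Both scripts spell out every table name three times: as a variable, as a case label, and again in the "all" branch. One possible way to reduce that repetition, offered as a sketch and not as the original author's approach, is to print each statement from a function and dispatch by name. The two sql_ functions below hold placeholder statements, and TABLES, build_sql and the sql_ prefix are names invented for this example.

```shell
# Placeholder statements; in a real script each function would print the
# full insert statement for its table.
sql_dws_trade_user_cart_add_1d() { echo "select 1;"; }
sql_dws_trade_user_order_1d()    { echo "select 2;"; }

# Table names in load order (hypothetical variable name).
TABLES="dws_trade_user_cart_add_1d dws_trade_user_order_1d"

# Print the SQL for one table, or for every table when $1 is "all".
build_sql() {
    local target="$1"
    local matched=0
    local t
    for t in $TABLES; do
        if [ "$target" = "all" ] || [ "$target" = "$t" ]; then
            "sql_$t"      # call the function named after the table
            matched=1
        fi
    done
    if [ "$matched" -eq 0 ]; then
        echo "unknown table: $target" >&2
        return 1
    fi
}

build_sql all
```

With this layout the case statement would shrink to a single hive -e "$(build_sql "$1")" call, and adding a table means adding one function and one entry in TABLES.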
3. DWS layer: n-day summary tables
3.1 Trade domain: user-SKU granularity order (placement) n-day summary table
3.1.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_sku_order_nd;
CREATE EXTERNAL TABLE dws_trade_user_sku_order_nd
(
    `user_id` STRING COMMENT 'User id',
    `sku_id` STRING COMMENT 'SKU id',
    `sku_name` STRING COMMENT 'SKU name',
    `category1_id` STRING COMMENT 'Level-1 category id',
    `category1_name` STRING COMMENT 'Level-1 category name',
    `category2_id` STRING COMMENT 'Level-2 category id',
    `category2_name` STRING COMMENT 'Level-2 category name',
    `category3_id` STRING COMMENT 'Level-3 category id',
    `category3_name` STRING COMMENT 'Level-3 category name',
    `tm_id` STRING COMMENT 'Brand id',
    `tm_name` STRING COMMENT 'Brand name',
    `order_count_7d` BIGINT COMMENT 'Number of orders in the last 7 days',
    `order_num_7d` BIGINT COMMENT 'Number of items ordered in the last 7 days',
    `order_original_amount_7d` DECIMAL(16, 2) COMMENT 'Original order amount in the last 7 days',
    `activity_reduce_amount_7d` DECIMAL(16, 2) COMMENT 'Activity discount amount in the last 7 days',
    `coupon_reduce_amount_7d` DECIMAL(16, 2) COMMENT 'Coupon discount amount in the last 7 days',
    `order_total_amount_7d` DECIMAL(16, 2) COMMENT 'Final order amount in the last 7 days',
    `order_count_30d` BIGINT COMMENT 'Number of orders in the last 30 days',
    `order_num_30d` BIGINT COMMENT 'Number of items ordered in the last 30 days',
    `order_original_amount_30d` DECIMAL(16, 2) COMMENT 'Original order amount in the last 30 days',
    `activity_reduce_amount_30d` DECIMAL(16, 2) COMMENT 'Activity discount amount in the last 30 days',
    `coupon_reduce_amount_30d` DECIMAL(16, 2) COMMENT 'Coupon discount amount in the last 30 days',
    `order_total_amount_30d` DECIMAL(16, 2) COMMENT 'Final order amount in the last 30 days'
) COMMENT 'Trade domain user-SKU granularity order n-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_sku_order_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');
3.1.2 Data loading
-- Read the last 30 daily partitions once; the 7-day columns only count rows from the last 7 of them.
insert overwrite table dws_trade_user_sku_order_nd partition(dt='2022-05-01')
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    sum(if(dt>=date_add('2022-05-01',-6),order_count_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_num_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_original_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),activity_reduce_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),coupon_reduce_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_total_amount_1d,0)),
    sum(order_count_1d),
    sum(order_num_1d),
    sum(order_original_amount_1d),
    sum(activity_reduce_amount_1d),
    sum(coupon_reduce_amount_1d),
    sum(order_total_amount_1d)
from dws_trade_user_sku_order_1d
where dt>=date_add('2022-05-01',-29)
and dt<='2022-05-01'
group by user_id,sku_id,sku_name,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name,tm_id,tm_name;
3.2 Trade domain: user-SKU granularity order refund n-day summary table
3.2.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_sku_order_refund_nd;
CREATE EXTERNAL TABLE dws_trade_user_sku_order_refund_nd
(
    `user_id` STRING COMMENT 'User id',
    `sku_id` STRING COMMENT 'SKU id',
    `sku_name` STRING COMMENT 'SKU name',
    `category1_id` STRING COMMENT 'Level-1 category id',
    `category1_name` STRING COMMENT 'Level-1 category name',
    `category2_id` STRING COMMENT 'Level-2 category id',
    `category2_name` STRING COMMENT 'Level-2 category name',
    `category3_id` STRING COMMENT 'Level-3 category id',
    `category3_name` STRING COMMENT 'Level-3 category name',
    `tm_id` STRING COMMENT 'Brand id',
    `tm_name` STRING COMMENT 'Brand name',
    `order_refund_count_7d` BIGINT COMMENT 'Number of refunds in the last 7 days',
    `order_refund_num_7d` BIGINT COMMENT 'Number of items refunded in the last 7 days',
    `order_refund_amount_7d` DECIMAL(16, 2) COMMENT 'Refund amount in the last 7 days',
    `order_refund_count_30d` BIGINT COMMENT 'Number of refunds in the last 30 days',
    `order_refund_num_30d` BIGINT COMMENT 'Number of items refunded in the last 30 days',
    `order_refund_amount_30d` DECIMAL(16, 2) COMMENT 'Refund amount in the last 30 days'
) COMMENT 'Trade domain user-SKU granularity order refund n-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_sku_order_refund_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');
3.2.2 Data loading
insert overwrite table dws_trade_user_sku_order_refund_nd partition(dt='2022-05-01')
select
    user_id,
    sku_id,
    sku_name,
    category1_id, category1_name,
    category2_id, category2_name,
    category3_id, category3_name,
    tm_id, tm_name,
    sum(if(dt>=date_add('2022-05-01',-6),order_refund_count_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_refund_num_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_refund_amount_1d,0)),
    sum(order_refund_count_1d),
    sum(order_refund_num_1d),
    sum(order_refund_amount_1d)
from dws_trade_user_sku_order_refund_1d
where dt>=date_add('2022-05-01',-29)
and dt<='2022-05-01'
group by user_id,sku_id,sku_name,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name,tm_id,tm_name;
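The offsets -6 and -29 are easy to get off by one: a window of n days that includes the partition date starts n-1 days earlier. The sketch below checks the boundaries from the shell. It assumes GNU date, and window_start is a hypothetical helper name introduced only for this example.

```shell
# Hypothetical helper: print the first day of an n-day window that ends on
# (and includes) the given date. Assumes GNU date.
#   $1 = end date as YYYY-MM-DD, $2 = window length in days
window_start() {
    local end_date="$1"
    local days="$2"
    # An n-day window that includes the end date starts n-1 days earlier.
    date -u -d "$end_date $(( days - 1 )) days ago" +%F
}

window_start 2022-05-01 7     # prints 2022-04-25, same as date_add('2022-05-01',-6)
window_start 2022-05-01 30    # prints 2022-04-02, same as date_add('2022-05-01',-29)
```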
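3.3 Trade domain: user granularity order (placement) n-day summary table
3.3.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_order_nd;
CREATE EXTERNAL TABLE dws_trade_user_order_nd
(
    `user_id` STRING COMMENT 'User id',
    `order_count_7d` BIGINT COMMENT 'Number of orders in the last 7 days',
    `order_num_7d` BIGINT COMMENT 'Number of items ordered in the last 7 days',
    `order_original_amount_7d` DECIMAL(16, 2) COMMENT 'Original order amount in the last 7 days',
    `activity_reduce_amount_7d` DECIMAL(16, 2) COMMENT 'Activity discount amount on orders in the last 7 days',
    `coupon_reduce_amount_7d` DECIMAL(16, 2) COMMENT 'Coupon discount amount on orders in the last 7 days',
    `order_total_amount_7d` DECIMAL(16, 2) COMMENT 'Final order amount in the last 7 days',
    `order_count_30d` BIGINT COMMENT 'Number of orders in the last 30 days',
    `order_num_30d` BIGINT COMMENT 'Number of items ordered in the last 30 days',
    `order_original_amount_30d` DECIMAL(16, 2) COMMENT 'Original order amount in the last 30 days',
    `activity_reduce_amount_30d` DECIMAL(16, 2) COMMENT 'Activity discount amount on orders in the last 30 days',
    `coupon_reduce_amount_30d` DECIMAL(16, 2) COMMENT 'Coupon discount amount on orders in the last 30 days',
    `order_total_amount_30d` DECIMAL(16, 2) COMMENT 'Final order amount in the last 30 days'
) COMMENT 'Trade domain user granularity order n-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_order_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');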
3.3.2 Data loading
insert overwrite table dws_trade_user_order_nd partition(dt='2022-05-01')
select
    user_id,
    sum(if(dt>=date_add('2022-05-01',-6),order_count_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_num_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_original_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),activity_reduce_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),coupon_reduce_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_total_amount_1d,0)),
    sum(order_count_1d),
    sum(order_num_1d),
    sum(order_original_amount_1d),
    sum(activity_reduce_amount_1d),
    sum(coupon_reduce_amount_1d),
    sum(order_total_amount_1d)
from dws_trade_user_order_1d
where dt>=date_add('2022-05-01',-29)
and dt<='2022-05-01'
group by user_id;
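The sum(if(...)) pattern fills both windows in a single pass over the 30-day range. To see the logic in isolation, the sketch below replays it with awk on a few made-up rows of the form dt, user_id, order_count_1d. The sample rows and the function name rollup_nd are illustrative assumptions, not data or code from the warehouse.

```shell
# Hypothetical helper: read "dt user_id order_count_1d" rows on stdin and
# print "user_id count_7d count_30d" per user.
#   $1 = end date, $2 = first day of the 7-day window,
#   $3 = first day of the 30-day window (all as YYYY-MM-DD)
rollup_nd() {
    awk -v end="$1" -v s7="$2" -v s30="$3" '
        # Same filter as the where clause: keep only the 30-day range.
        # YYYY-MM-DD strings compare correctly as plain text.
        $1 >= s30 && $1 <= end {
            c30[$2] += $3
            # Same condition as the if() inside the 7-day sums.
            if ($1 >= s7) c7[$2] += $3
        }
        END { for (u in c30) print u, c7[u] + 0, c30[u] }
    ' | sort
}

printf '%s\n' \
    "2022-05-01 u1 2" \
    "2022-04-26 u1 1" \
    "2022-04-10 u1 3" \
    "2022-04-01 u1 5" \
    "2022-04-20 u2 4" |
    rollup_nd 2022-05-01 2022-04-25 2022-04-02
# prints:
# u1 3 6
# u2 0 4
```

The row dated 2022-04-01 falls outside the 30-day range and is dropped, and user u2 gets 0 in the 7-day column, which is why the SQL uses 0 as the else branch of if().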
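3.4 Trade domain: user granularity cart-add n-day summary table
3.4.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_cart_add_nd;
CREATE EXTERNAL TABLE dws_trade_user_cart_add_nd
(
    `user_id` STRING COMMENT 'User id',
    `cart_add_count_7d` BIGINT COMMENT 'Number of cart-add actions in the last 7 days',
    `cart_add_num_7d` BIGINT COMMENT 'Number of items added to the cart in the last 7 days',
    `cart_add_count_30d` BIGINT COMMENT 'Number of cart-add actions in the last 30 days',
    `cart_add_num_30d` BIGINT COMMENT 'Number of items added to the cart in the last 30 days'
) COMMENT 'Trade domain user granularity cart-add n-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_cart_add_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');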
3.4.2 Data loading
insert overwrite table dws_trade_user_cart_add_nd partition(dt='2022-05-01')
select
    user_id,
    sum(if(dt>=date_add('2022-05-01',-6),cart_add_count_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),cart_add_num_1d,0)),
    sum(cart_add_count_1d),
    sum(cart_add_num_1d)
from dws_trade_user_cart_add_1d
where dt>=date_add('2022-05-01',-29)
and dt<='2022-05-01'
group by user_id;
3.5 Trade domain: user granularity payment n-day summary table
3.5.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_payment_nd;
CREATE EXTERNAL TABLE dws_trade_user_payment_nd
(
    `user_id` STRING COMMENT 'User id',
    `payment_count_7d` BIGINT COMMENT 'Number of payments in the last 7 days',
    `payment_num_7d` BIGINT COMMENT 'Number of items paid for in the last 7 days',
    `payment_amount_7d` DECIMAL(16, 2) COMMENT 'Payment amount in the last 7 days',
    `payment_count_30d` BIGINT COMMENT 'Number of payments in the last 30 days',
    `payment_num_30d` BIGINT COMMENT 'Number of items paid for in the last 30 days',
    `payment_amount_30d` DECIMAL(16, 2) COMMENT 'Payment amount in the last 30 days'
) COMMENT 'Trade domain user granularity payment n-day summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_payment_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');
3.5.2数据装载
insert overwrite table dws_trade_user_payment_nd partition (dt = '2022-05-01')
select user_id,sum(if(dt >= date_add('2022-05-01', -6), payment_count_1d, 0)),sum(if(dt >= date_add('2022-05-01', -6), payment_num_1d, 0)),sum(if(dt >= date_add('2022-05-01', -6), payment_amount_1d, 0)),sum(payment_count_1d),sum(payment_num_1d),sum(payment_amount_1d)
from dws_trade_user_payment_1d
where dt >= date_add('2022-05-01', -29)and dt <= '2022-05-01'
group by user_id;
3.6 Trade domain: province-grain order last-n-days summary table
3.6.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_province_order_nd;
CREATE EXTERNAL TABLE dws_trade_province_order_nd
(
    `province_id`                STRING COMMENT 'Province id',
    `province_name`              STRING COMMENT 'Province name',
    `area_code`                  STRING COMMENT 'Area code',
    `iso_code`                   STRING COMMENT 'Legacy ISO-3166-2 code',
    `iso_3166_2`                 STRING COMMENT 'Current ISO-3166-2 code',
    `order_count_7d`             BIGINT COMMENT 'Order count in the last 7 days',
    `order_original_amount_7d`   DECIMAL(16, 2) COMMENT 'Original order amount in the last 7 days',
    `activity_reduce_amount_7d`  DECIMAL(16, 2) COMMENT 'Activity discount amount on orders in the last 7 days',
    `coupon_reduce_amount_7d`    DECIMAL(16, 2) COMMENT 'Coupon discount amount on orders in the last 7 days',
    `order_total_amount_7d`      DECIMAL(16, 2) COMMENT 'Final order amount in the last 7 days',
    `order_count_30d`            BIGINT COMMENT 'Order count in the last 30 days',
    `order_original_amount_30d`  DECIMAL(16, 2) COMMENT 'Original order amount in the last 30 days',
    `activity_reduce_amount_30d` DECIMAL(16, 2) COMMENT 'Activity discount amount on orders in the last 30 days',
    `coupon_reduce_amount_30d`   DECIMAL(16, 2) COMMENT 'Coupon discount amount on orders in the last 30 days',
    `order_total_amount_30d`     DECIMAL(16, 2) COMMENT 'Final order amount in the last 30 days'
) COMMENT 'Trade domain province-grain order last-n-days summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_province_order_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');
3.6.2 Data loading
insert overwrite table dws_trade_province_order_nd partition(dt='2022-05-01')
select
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    sum(if(dt>=date_add('2022-05-01',-6),order_count_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_original_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),activity_reduce_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),coupon_reduce_amount_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_total_amount_1d,0)),
    sum(order_count_1d),
    sum(order_original_amount_1d),
    sum(activity_reduce_amount_1d),
    sum(coupon_reduce_amount_1d),
    sum(order_total_amount_1d)
from dws_trade_province_order_1d
where dt>=date_add('2022-05-01',-29)
and dt<='2022-05-01'
group by province_id,province_name,area_code,iso_code,iso_3166_2;
3.7 Trade domain: coupon-grain order last-n-days summary table
3.7.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_coupon_order_nd;
CREATE EXTERNAL TABLE dws_trade_coupon_order_nd
(
    `coupon_id`                STRING COMMENT 'Coupon id',
    `coupon_name`              STRING COMMENT 'Coupon name',
    `coupon_type_code`         STRING COMMENT 'Coupon type code',
    `coupon_type_name`         STRING COMMENT 'Coupon type name',
    `coupon_rule`              STRING COMMENT 'Coupon rule',
    `start_date`               STRING COMMENT 'Release date',
    `original_amount_30d`      DECIMAL(16, 2) COMMENT 'Original amount of orders that used the coupon',
    `coupon_reduce_amount_30d` DECIMAL(16, 2) COMMENT 'Coupon discount amount on orders that used the coupon'
) COMMENT 'Trade domain coupon-grain order last-n-days summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_coupon_order_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');
3.7.2 Data loading
insert overwrite table dws_trade_coupon_order_nd partition(dt='2022-05-01')
select
    id,
    coupon_name,
    coupon_type_code,
    coupon_type_name,
    benefit_rule,
    start_date,
    sum(split_original_amount),
    sum(split_coupon_amount)
from
(
    select
        id,coupon_name,coupon_type_code,coupon_type_name,benefit_rule,
        date_format(start_time,'yyyy-MM-dd') start_date
    from dim_coupon_full
    where dt='2022-05-01'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('2022-05-01',-29)
)cou
left join
(
    select
        coupon_id,order_id,split_original_amount,split_coupon_amount
    from dwd_trade_order_detail_inc
    where dt>=date_add('2022-05-01',-29)
    and dt<='2022-05-01'
    and coupon_id is not null
)od
on cou.id=od.coupon_id
group by id,coupon_name,coupon_type_code,coupon_type_name,benefit_rule,start_date;
3.8 Trade domain: activity-grain order last-n-days summary table
3.8.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_activity_order_nd;
CREATE EXTERNAL TABLE dws_trade_activity_order_nd
(
    `activity_id`                STRING COMMENT 'Activity id',
    `activity_name`              STRING COMMENT 'Activity name',
    `activity_type_code`         STRING COMMENT 'Activity type code',
    `activity_type_name`         STRING COMMENT 'Activity type name',
    `start_date`                 STRING COMMENT 'Release date',
    `original_amount_30d`        DECIMAL(16, 2) COMMENT 'Original amount of orders that took part in the activity',
    `activity_reduce_amount_30d` DECIMAL(16, 2) COMMENT 'Discount amount on orders that took part in the activity'
) COMMENT 'Trade domain activity-grain order last-n-days summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_activity_order_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');
3.8.2 Data loading
insert overwrite table dws_trade_activity_order_nd partition(dt='2022-05-01')
select
    act.activity_id,
    activity_name,
    activity_type_code,
    activity_type_name,
    date_format(start_time,'yyyy-MM-dd'),
    sum(split_original_amount),
    sum(split_activity_amount)
from
(
    select
        activity_id,activity_name,activity_type_code,activity_type_name,start_time
    from dim_activity_full
    where dt='2022-05-01'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('2022-05-01',-29)
    group by activity_id,activity_name,activity_type_code,activity_type_name,start_time
)act
left join
(
    select
        activity_id,order_id,split_original_amount,split_activity_amount
    from dwd_trade_order_detail_inc
    where dt>=date_add('2022-05-01',-29)
    and dt<='2022-05-01'
    and activity_id is not null
)od
on act.activity_id=od.activity_id
group by act.activity_id,activity_name,activity_type_code,activity_type_name,start_time;
3.9 Trade domain: user-grain order refund last-n-days summary table
3.9.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_order_refund_nd;
CREATE EXTERNAL TABLE dws_trade_user_order_refund_nd
(
    `user_id`                 STRING COMMENT 'User id',
    `order_refund_count_7d`   BIGINT COMMENT 'Order refund count in the last 7 days',
    `order_refund_num_7d`     BIGINT COMMENT 'Number of items refunded in the last 7 days',
    `order_refund_amount_7d`  DECIMAL(16, 2) COMMENT 'Order refund amount in the last 7 days',
    `order_refund_count_30d`  BIGINT COMMENT 'Order refund count in the last 30 days',
    `order_refund_num_30d`    BIGINT COMMENT 'Number of items refunded in the last 30 days',
    `order_refund_amount_30d` DECIMAL(16, 2) COMMENT 'Order refund amount in the last 30 days'
) COMMENT 'Trade domain user-grain order refund last-n-days summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_order_refund_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');
3.9.2 Data loading
insert overwrite table dws_trade_user_order_refund_nd partition(dt='2022-05-01')
select
    user_id,
    sum(if(dt>=date_add('2022-05-01',-6),order_refund_count_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_refund_num_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),order_refund_amount_1d,0)),
    sum(order_refund_count_1d),
    sum(order_refund_num_1d),
    sum(order_refund_amount_1d)
from dws_trade_user_order_refund_1d
where dt>=date_add('2022-05-01',-29)
and dt<='2022-05-01'
group by user_id;
3.10 Traffic domain: visitor-page-grain page view last-n-days summary table
3.10.1 Table creation statement
DROP TABLE IF EXISTS dws_traffic_page_visitor_page_view_nd;
CREATE EXTERNAL TABLE dws_traffic_page_visitor_page_view_nd
(
    `mid_id`          STRING COMMENT 'Visitor id',
    `brand`           STRING COMMENT 'Phone brand',
    `model`           STRING COMMENT 'Phone model',
    `operate_system`  STRING COMMENT 'Operating system',
    `page_id`         STRING COMMENT 'Page id',
    `during_time_7d`  BIGINT COMMENT 'Browsing duration in the last 7 days',
    `view_count_7d`   BIGINT COMMENT 'Visit count in the last 7 days',
    `during_time_30d` BIGINT COMMENT 'Browsing duration in the last 30 days',
    `view_count_30d`  BIGINT COMMENT 'Visit count in the last 30 days'
) COMMENT 'Traffic domain visitor-page-grain page view last-n-days summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_traffic_page_visitor_page_view_nd'
    TBLPROPERTIES ('orc.compress' = 'snappy');
3.10.2 Data loading
insert overwrite table dws_traffic_page_visitor_page_view_nd partition(dt='2022-05-01')
select
    mid_id,
    brand,
    model,
    operate_system,
    page_id,
    sum(if(dt>=date_add('2022-05-01',-6),during_time_1d,0)),
    sum(if(dt>=date_add('2022-05-01',-6),view_count_1d,0)),
    sum(during_time_1d),
    sum(view_count_1d)
from dws_traffic_page_visitor_page_view_1d
where dt>=date_add('2022-05-01',-29)
and dt<='2022-05-01'
group by mid_id,brand,model,operate_system,page_id;
3.11 Writing the data loading script
(1) Create dws_1d_to_dws_nd.sh in the /home/hadoop/bin directory on hadoop102
The script content is as follows:
#!/bin/bash
APP=gmall

# Use the date passed as the second argument if there is one; otherwise use yesterday's date
if [ -n "$2" ]; then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

dws_trade_activity_order_nd="
insert overwrite table ${APP}.dws_trade_activity_order_nd partition(dt='$do_date')
select
    act.activity_id,
    activity_name,
    activity_type_code,
    activity_type_name,
    date_format(start_time,'yyyy-MM-dd'),
    sum(split_original_amount),
    sum(split_activity_amount)
from
(
    select
        activity_id,activity_name,activity_type_code,activity_type_name,start_time
    from ${APP}.dim_activity_full
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
    group by activity_id,activity_name,activity_type_code,activity_type_name,start_time
)act
left join
(
    select
        activity_id,order_id,split_original_amount,split_activity_amount
    from ${APP}.dwd_trade_order_detail_inc
    where dt>=date_add('$do_date',-29)
    and dt<='$do_date'
    and activity_id is not null
)od
on act.activity_id=od.activity_id
group by act.activity_id,activity_name,activity_type_code,activity_type_name,start_time;
"
dws_trade_coupon_order_nd="
insert overwrite table ${APP}.dws_trade_coupon_order_nd partition(dt='$do_date')
select
    id,
    coupon_name,
    coupon_type_code,
    coupon_type_name,
    benefit_rule,
    start_date,
    sum(split_original_amount),
    sum(split_coupon_amount)
from
(
    select
        id,coupon_name,coupon_type_code,coupon_type_name,benefit_rule,
        date_format(start_time,'yyyy-MM-dd') start_date
    from ${APP}.dim_coupon_full
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
)cou
left join
(
    select
        coupon_id,order_id,split_original_amount,split_coupon_amount
    from ${APP}.dwd_trade_order_detail_inc
    where dt>=date_add('$do_date',-29)
    and dt<='$do_date'
    and coupon_id is not null
)od
on cou.id=od.coupon_id
group by id,coupon_name,coupon_type_code,coupon_type_name,benefit_rule,start_date;
"
dws_trade_province_order_nd="
insert overwrite table ${APP}.dws_trade_province_order_nd partition(dt='$do_date')
select
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    sum(if(dt>=date_add('$do_date',-6),order_count_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_original_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),activity_reduce_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),coupon_reduce_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_total_amount_1d,0)),
    sum(order_count_1d),
    sum(order_original_amount_1d),
    sum(activity_reduce_amount_1d),
    sum(coupon_reduce_amount_1d),
    sum(order_total_amount_1d)
from ${APP}.dws_trade_province_order_1d
where dt>=date_add('$do_date',-29)
and dt<='$do_date'
group by province_id,province_name,area_code,iso_code,iso_3166_2;
"
dws_trade_user_cart_add_nd="
insert overwrite table ${APP}.dws_trade_user_cart_add_nd partition(dt='$do_date')
select
    user_id,
    sum(if(dt>=date_add('$do_date',-6),cart_add_count_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),cart_add_num_1d,0)),
    sum(cart_add_count_1d),
    sum(cart_add_num_1d)
from ${APP}.dws_trade_user_cart_add_1d
where dt>=date_add('$do_date',-29)
and dt<='$do_date'
group by user_id;
"
dws_trade_user_order_nd="
insert overwrite table ${APP}.dws_trade_user_order_nd partition(dt='$do_date')
select
    user_id,
    sum(if(dt>=date_add('$do_date',-6),order_count_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_num_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_original_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),activity_reduce_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),coupon_reduce_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_total_amount_1d,0)),
    sum(order_count_1d),
    sum(order_num_1d),
    sum(order_original_amount_1d),
    sum(activity_reduce_amount_1d),
    sum(coupon_reduce_amount_1d),
    sum(order_total_amount_1d)
from ${APP}.dws_trade_user_order_1d
where dt>=date_add('$do_date',-29)
and dt<='$do_date'
group by user_id;
"
dws_trade_user_order_refund_nd="
insert overwrite table ${APP}.dws_trade_user_order_refund_nd partition(dt='$do_date')
select
    user_id,
    sum(if(dt>=date_add('$do_date',-6),order_refund_count_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_refund_num_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_refund_amount_1d,0)),
    sum(order_refund_count_1d),
    sum(order_refund_num_1d),
    sum(order_refund_amount_1d)
from ${APP}.dws_trade_user_order_refund_1d
where dt>=date_add('$do_date',-29)
and dt<='$do_date'
group by user_id;
"
dws_trade_user_payment_nd="
insert overwrite table ${APP}.dws_trade_user_payment_nd partition (dt = '$do_date')
select
    user_id,
    sum(if(dt >= date_add('$do_date', -6), payment_count_1d, 0)),
    sum(if(dt >= date_add('$do_date', -6), payment_num_1d, 0)),
    sum(if(dt >= date_add('$do_date', -6), payment_amount_1d, 0)),
    sum(payment_count_1d),
    sum(payment_num_1d),
    sum(payment_amount_1d)
from ${APP}.dws_trade_user_payment_1d
where dt >= date_add('$do_date', -29)
and dt <= '$do_date'
group by user_id;
"
dws_trade_user_sku_order_nd="
insert overwrite table ${APP}.dws_trade_user_sku_order_nd partition(dt='$do_date')
select
    user_id,
    sku_id,
    sku_name,
    category1_id,
    category1_name,
    category2_id,
    category2_name,
    category3_id,
    category3_name,
    tm_id,
    tm_name,
    sum(if(dt>=date_add('$do_date',-6),order_count_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_num_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_original_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),activity_reduce_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),coupon_reduce_amount_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_total_amount_1d,0)),
    sum(order_count_1d),
    sum(order_num_1d),
    sum(order_original_amount_1d),
    sum(activity_reduce_amount_1d),
    sum(coupon_reduce_amount_1d),
    sum(order_total_amount_1d)
from ${APP}.dws_trade_user_sku_order_1d
where dt>=date_add('$do_date',-29)
and dt<='$do_date'
group by user_id,sku_id,sku_name,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name,tm_id,tm_name;
"
dws_trade_user_sku_order_refund_nd="
insert overwrite table ${APP}.dws_trade_user_sku_order_refund_nd partition(dt='$do_date')
select
    user_id,
    sku_id,
    sku_name,
    category1_id,
    category1_name,
    category2_id,
    category2_name,
    category3_id,
    category3_name,
    tm_id,
    tm_name,
    sum(if(dt>=date_add('$do_date',-6),order_refund_count_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_refund_num_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),order_refund_amount_1d,0)),
    sum(order_refund_count_1d),
    sum(order_refund_num_1d),
    sum(order_refund_amount_1d)
from ${APP}.dws_trade_user_sku_order_refund_1d
where dt>=date_add('$do_date',-29)
and dt<='$do_date'
group by user_id,sku_id,sku_name,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name,tm_id,tm_name;
"
dws_traffic_page_visitor_page_view_nd="
insert overwrite table ${APP}.dws_traffic_page_visitor_page_view_nd partition(dt='$do_date')
select
    mid_id,
    brand,
    model,
    operate_system,
    page_id,
    sum(if(dt>=date_add('$do_date',-6),during_time_1d,0)),
    sum(if(dt>=date_add('$do_date',-6),view_count_1d,0)),
    sum(during_time_1d),
    sum(view_count_1d)
from ${APP}.dws_traffic_page_visitor_page_view_1d
where dt>=date_add('$do_date',-29)
and dt<='$do_date'
group by mid_id,brand,model,operate_system,page_id;
"

case $1 in
    "dws_trade_activity_order_nd" )
        hive -e "$dws_trade_activity_order_nd"
    ;;
    "dws_trade_coupon_order_nd" )
        hive -e "$dws_trade_coupon_order_nd"
    ;;
    "dws_trade_province_order_nd" )
        hive -e "$dws_trade_province_order_nd"
    ;;
    "dws_trade_user_cart_add_nd" )
        hive -e "$dws_trade_user_cart_add_nd"
    ;;
    "dws_trade_user_order_nd" )
        hive -e "$dws_trade_user_order_nd"
    ;;
    "dws_trade_user_order_refund_nd" )
        hive -e "$dws_trade_user_order_refund_nd"
    ;;
    "dws_trade_user_payment_nd" )
        hive -e "$dws_trade_user_payment_nd"
    ;;
    "dws_trade_user_sku_order_nd" )
        hive -e "$dws_trade_user_sku_order_nd"
    ;;
    "dws_trade_user_sku_order_refund_nd" )
        hive -e "$dws_trade_user_sku_order_refund_nd"
    ;;
    "dws_traffic_page_visitor_page_view_nd" )
        hive -e "$dws_traffic_page_visitor_page_view_nd"
    ;;
    "all" )
        hive -e "$dws_trade_activity_order_nd$dws_trade_coupon_order_nd$dws_trade_province_order_nd$dws_trade_user_cart_add_nd$dws_trade_user_order_nd$dws_trade_user_order_refund_nd$dws_trade_user_payment_nd$dws_trade_user_sku_order_nd$dws_trade_user_sku_order_refund_nd$dws_traffic_page_visitor_page_view_nd"
    ;;
esac
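The script picks its load date from the second argument and falls back to yesterday when none is given. That rule can be isolated in a small function so it is easy to try out on its own. The sketch below assumes GNU date; resolve_do_date is a hypothetical name and does not appear in the script above.

```shell
#!/bin/bash
# resolve_do_date [DATE]   (hypothetical helper mirroring the script's if/else)
# Prints DATE when it is non-empty, otherwise yesterday's date as YYYY-MM-DD.
resolve_do_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d "-1 day" +%F
    fi
}

# Inside a loader script this could replace the inline if/else:
#   do_date=$(resolve_do_date "$2")
resolve_do_date 2022-05-01   # explicit date is passed through unchanged
resolve_do_date              # no argument: yesterday's date
```

Defaulting to yesterday suits a scheduler that starts the job shortly after midnight, when the previous day's partitions are the newest complete ones.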
(2) Make the script executable
[root@hadoop102 bin]$ chmod +x dws_1d_to_dws_nd.sh
(3) Use the script to load the data
[root@hadoop102 bin]$ dws_1d_to_dws_nd.sh all 2022-05-01
4. DWS layer: history-to-date summary table design
4.1 Trade domain: user-grain order (order placement) history-to-date summary table
4.1.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_order_td;
CREATE EXTERNAL TABLE dws_trade_user_order_td
(
    `user_id`                   STRING COMMENT 'User id',
    `order_date_first`          STRING COMMENT 'Date of first order',
    `order_date_last`           STRING COMMENT 'Date of most recent order',
    `order_count_td`            BIGINT COMMENT 'Order count',
    `order_num_td`              BIGINT COMMENT 'Number of items purchased',
    `original_amount_td`        DECIMAL(16, 2) COMMENT 'Original amount',
    `activity_reduce_amount_td` DECIMAL(16, 2) COMMENT 'Activity discount amount',
    `coupon_reduce_amount_td`   DECIMAL(16, 2) COMMENT 'Coupon discount amount',
    `total_amount_td`           DECIMAL(16, 2) COMMENT 'Final amount'
) COMMENT 'Trade domain user-grain order history-to-date summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_order_td'
    TBLPROPERTIES ('orc.compress' = 'snappy');
4.1.2 Data loading
1) First-day load:
insert overwrite table dws_trade_user_order_td partition(dt='2022-05-01')
select
    user_id,
    min(dt) order_date_first,
    max(dt) order_date_last,
    sum(order_count_1d) order_count,
    sum(order_num_1d) order_num,
    sum(order_original_amount_1d) original_amount,
    sum(activity_reduce_amount_1d) activity_reduce_amount,
    sum(coupon_reduce_amount_1d) coupon_reduce_amount,
    sum(order_total_amount_1d) total_amount
from dws_trade_user_order_1d
group by user_id;
2) Daily load:
insert overwrite table dws_trade_user_order_td partition(dt='2022-05-02')
select
    nvl(old.user_id,new.user_id),
    if(new.user_id is not null and old.user_id is null,'2022-05-02',old.order_date_first),
    if(new.user_id is not null,'2022-05-02',old.order_date_last),
    nvl(old.order_count_td,0)+nvl(new.order_count_1d,0),
    nvl(old.order_num_td,0)+nvl(new.order_num_1d,0),
    nvl(old.original_amount_td,0)+nvl(new.order_original_amount_1d,0),
    nvl(old.activity_reduce_amount_td,0)+nvl(new.activity_reduce_amount_1d,0),
    nvl(old.coupon_reduce_amount_td,0)+nvl(new.coupon_reduce_amount_1d,0),
    nvl(old.total_amount_td,0)+nvl(new.order_total_amount_1d,0)
from
(
    select
        user_id,order_date_first,order_date_last,order_count_td,order_num_td,
        original_amount_td,activity_reduce_amount_td,coupon_reduce_amount_td,total_amount_td
    from dws_trade_user_order_td
    where dt=date_add('2022-05-02',-1)
)old
full outer join
(
    select
        user_id,order_count_1d,order_num_1d,order_original_amount_1d,
        activity_reduce_amount_1d,coupon_reduce_amount_1d,order_total_amount_1d
    from dws_trade_user_order_1d
    where dt='2022-05-02'
)new
on old.user_id=new.user_id;
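The daily load merges yesterday's history-to-date partition with today's 1-day partition through a full outer join: users seen only in the old side are carried over, users seen only in the new side get today's date as both first and last date, and users on both sides have their counters added. The same rule can be tried on two small text files. This is an illustrative sketch only, using coreutils join and awk rather than Hive; merge_td, the file layouts, and the reduced column set (one counter instead of six) are assumptions made for the example.

```shell
#!/bin/bash
# merge_td OLD_FILE NEW_FILE DO_DATE   (hypothetical helper, not part of the project)
# OLD_FILE: user_id first_date last_date count_td   (yesterday's td partition)
# NEW_FILE: user_id count_1d                        (today's 1d partition)
# Both files are space-separated and sorted by user_id.
merge_td() {
    local old_file=$1 new_file=$2 do_date=$3
    # -a 1 -a 2 keeps unmatched rows from both inputs (a full outer join);
    # -e NULL together with -o fills fields of the missing side with NULL.
    join -a 1 -a 2 -e NULL -o 0,1.2,1.3,1.4,2.2 "$old_file" "$new_file" |
    awk -v d="$do_date" '{
        first   = ($2 == "NULL") ? d : $2   # user unseen before: first date is today
        last    = ($5 != "NULL") ? d : $3   # user active today: last date is today
        old_cnt = ($4 == "NULL") ? 0 : $4   # nvl(old.count_td, 0)
        new_cnt = ($5 == "NULL") ? 0 : $5   # nvl(new.count_1d, 0)
        print $1, first, last, old_cnt + new_cnt
    }'
}
```

With an old file containing u1 and u2 and a new file containing u2 and u3, the output has one row per user: u1 unchanged, u2 with an updated last date and summed counter, and u3 as a brand-new row dated with the load date.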
4.2 Trade domain: user-grain payment history-to-date summary table
4.2.1 Table creation statement
DROP TABLE IF EXISTS dws_trade_user_payment_td;
CREATE EXTERNAL TABLE dws_trade_user_payment_td
(
    `user_id`            STRING COMMENT 'User id',
    `payment_date_first` STRING COMMENT 'Date of first payment',
    `payment_date_last`  STRING COMMENT 'Date of most recent payment',
    `payment_count_td`   BIGINT COMMENT 'Cumulative payment count',
    `payment_num_td`     BIGINT COMMENT 'Cumulative number of items paid for',
    `payment_amount_td`  DECIMAL(16, 2) COMMENT 'Cumulative payment amount'
) COMMENT 'Trade domain user-grain payment history-to-date summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_payment_td'
    TBLPROPERTIES ('orc.compress' = 'snappy');
4.2.2 Data loading
1) First-day load:
insert overwrite table dws_trade_user_payment_td partition(dt='2022-05-01')
select
    user_id,
    min(dt) payment_date_first,
    max(dt) payment_date_last,
    sum(payment_count_1d) payment_count,
    sum(payment_num_1d) payment_num,
    sum(payment_amount_1d) payment_amount
from dws_trade_user_payment_1d
group by user_id;
2) Daily load:
insert overwrite table dws_trade_user_payment_td partition(dt='2022-05-02')
select
    nvl(old.user_id,new.user_id),
    if(old.user_id is null and new.user_id is not null,'2022-05-02',old.payment_date_first),
    if(new.user_id is not null,'2022-05-02',old.payment_date_last),
    nvl(old.payment_count_td,0)+nvl(new.payment_count_1d,0),
    nvl(old.payment_num_td,0)+nvl(new.payment_num_1d,0),
    nvl(old.payment_amount_td,0)+nvl(new.payment_amount_1d,0)
from
(
    select
        user_id,payment_date_first,payment_date_last,
        payment_count_td,payment_num_td,payment_amount_td
    from dws_trade_user_payment_td
    where dt=date_add('2022-05-02',-1)
)old
full outer join
(
    select
        user_id,payment_count_1d,payment_num_1d,payment_amount_1d
    from dws_trade_user_payment_1d
    where dt='2022-05-02'
)new
on old.user_id=new.user_id;
4.3 User domain: user-grain login history-to-date summary table
4.3.1 Table creation statement
DROP TABLE IF EXISTS dws_user_user_login_td;
CREATE EXTERNAL TABLE dws_user_user_login_td
(
    `user_id`         STRING COMMENT 'User id',
    `login_date_last` STRING COMMENT 'Date of most recent login',
    `login_count_td`  BIGINT COMMENT 'Cumulative login count'
) COMMENT 'User domain user-grain login history-to-date summary fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_user_user_login_td'
    TBLPROPERTIES ('orc.compress' = 'snappy');
4.3.2 Data loading
1) First-day load:
insert overwrite table dws_user_user_login_td partition(dt='2022-05-01')
select
    u.id,
    nvl(login_date_last,date_format(create_time,'yyyy-MM-dd')),
    nvl(login_count_td,1)
from
(
    select
        id,create_time
    from dim_user_zip
    where dt='9999-12-31'
)u
left join
(
    select
        user_id,
        max(dt) login_date_last,
        count(*) login_count_td
    from dwd_user_login_inc
    group by user_id
)l
on u.id=l.user_id;
2) Daily load:
insert overwrite table dws_user_user_login_td partition(dt='2022-05-02')
select
    nvl(old.user_id,new.user_id),
    if(new.user_id is null,old.login_date_last,'2022-05-02'),
    nvl(old.login_count_td,0)+nvl(new.login_count_1d,0)
from
(
    select
        user_id,login_date_last,login_count_td
    from dws_user_user_login_td
    where dt=date_add('2022-05-02',-1)
)old
full outer join
(
    select
        user_id,
        count(*) login_count_1d
    from dwd_user_login_inc
    where dt='2022-05-02'
    group by user_id
)new
on old.user_id=new.user_id;
4.4 Writing the data loading scripts
4.4.1 First-day load script
(1) Create dwd_to_dws_td_init.sh in the /home/hadoop/bin directory on hadoop102
The script content is as follows:
#!/bin/bash
APP=gmall

# The first-day load requires an explicit date as the second argument
if [ -n "$2" ]; then
    do_date=$2
else
    echo "Please pass in a date argument"
    exit 1
fi

dws_trade_user_order_td="
insert overwrite table ${APP}.dws_trade_user_order_td partition(dt='$do_date')
select
    user_id,
    min(dt) order_date_first,
    max(dt) order_date_last,
    sum(order_count_1d) order_count,
    sum(order_num_1d) order_num,
    sum(order_original_amount_1d) original_amount,
    sum(activity_reduce_amount_1d) activity_reduce_amount,
    sum(coupon_reduce_amount_1d) coupon_reduce_amount,
    sum(order_total_amount_1d) total_amount
from ${APP}.dws_trade_user_order_1d
group by user_id;
"

dws_trade_user_payment_td="
insert overwrite table ${APP}.dws_trade_user_payment_td partition(dt='$do_date')
select
    user_id,
    min(dt) payment_date_first,
    max(dt) payment_date_last,
    sum(payment_count_1d) payment_count,
    sum(payment_num_1d) payment_num,
    sum(payment_amount_1d) payment_amount
from ${APP}.dws_trade_user_payment_1d
group by user_id;
"

dws_user_user_login_td="
insert overwrite table ${APP}.dws_user_user_login_td partition(dt='$do_date')
select
    u.id,
    nvl(login_date_last,date_format(create_time,'yyyy-MM-dd')),
    nvl(login_count_td,1)
from
(
    select
        id,create_time
    from ${APP}.dim_user_zip
    where dt='9999-12-31'
)u
left join
(
    select
        user_id,
        max(dt) login_date_last,
        count(*) login_count_td
    from ${APP}.dwd_user_login_inc
    group by user_id
)l
on u.id=l.user_id;
"

case $1 in
    "dws_trade_user_order_td" )
        hive -e "$dws_trade_user_order_td"
    ;;
    "dws_trade_user_payment_td" )
        hive -e "$dws_trade_user_payment_td"
    ;;
    "dws_user_user_login_td" )
        hive -e "$dws_user_user_login_td"
    ;;
    "all" )
        hive -e "$dws_trade_user_order_td$dws_trade_user_payment_td$dws_user_user_login_td"
    ;;
esac
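The first-day script only checks that a date argument is present, and the value is then substituted directly into the partition spec. A stricter check that the argument is a real calendar date in YYYY-MM-DD form might look like the sketch below. It assumes bash and GNU date; is_valid_date is a hypothetical helper, not something the script above defines.

```shell
#!/bin/bash
# is_valid_date STRING   (hypothetical helper)
# Succeeds only if STRING looks like YYYY-MM-DD and names a real calendar date.
is_valid_date() {
    # Shape check first, so inputs such as "yesterday" are rejected
    # even though GNU date would happily parse them.
    [[ $1 =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || return 1
    # GNU date rejects impossible dates such as 2022-02-30 and prints nothing.
    [ "$(date -d "$1" +%F 2>/dev/null)" = "$1" ]
}

# Possible use at the top of a loader script:
#   is_valid_date "$2" || { echo "Please pass in a date argument (YYYY-MM-DD)"; exit 1; }
```

Rejecting a malformed value early avoids writing a partition under an unintended dt name.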
(2) Make the script executable
[root@hadoop102 bin]$ chmod +x dwd_to_dws_td_init.sh
(3) Use the script to run the first-day load
[root@hadoop102 bin]$ dwd_to_dws_td_init.sh all 2022-05-01
4.4.2 Daily load script
(1) Create dwd_to_dws_td.sh in the /home/hadoop/bin directory on hadoop102
The script content is as follows:
#!/bin/bash
APP=gmall

# Use the date passed as the second argument if there is one; otherwise use yesterday's date
if [ -n "$2" ]; then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

dws_trade_user_order_td="
insert overwrite table ${APP}.dws_trade_user_order_td partition(dt='$do_date')
select
    nvl(old.user_id,new.user_id),
    if(new.user_id is not null and old.user_id is null,'$do_date',old.order_date_first),
    if(new.user_id is not null,'$do_date',old.order_date_last),
    nvl(old.order_count_td,0)+nvl(new.order_count_1d,0),
    nvl(old.order_num_td,0)+nvl(new.order_num_1d,0),
    nvl(old.original_amount_td,0)+nvl(new.order_original_amount_1d,0),
    nvl(old.activity_reduce_amount_td,0)+nvl(new.activity_reduce_amount_1d,0),
    nvl(old.coupon_reduce_amount_td,0)+nvl(new.coupon_reduce_amount_1d,0),
    nvl(old.total_amount_td,0)+nvl(new.order_total_amount_1d,0)
from
(
    select
        user_id,order_date_first,order_date_last,order_count_td,order_num_td,
        original_amount_td,activity_reduce_amount_td,coupon_reduce_amount_td,total_amount_td
    from ${APP}.dws_trade_user_order_td
    where dt=date_add('$do_date',-1)
)old
full outer join
(
    select
        user_id,order_count_1d,order_num_1d,order_original_amount_1d,
        activity_reduce_amount_1d,coupon_reduce_amount_1d,order_total_amount_1d
    from ${APP}.dws_trade_user_order_1d
    where dt='$do_date'
)new
on old.user_id=new.user_id;
"

dws_trade_user_payment_td="
insert overwrite table ${APP}.dws_trade_user_payment_td partition(dt='$do_date')
select
    nvl(old.user_id,new.user_id),
    if(old.user_id is null and new.user_id is not null,'$do_date',old.payment_date_first),
    if(new.user_id is not null,'$do_date',old.payment_date_last),
    nvl(old.payment_count_td,0)+nvl(new.payment_count_1d,0),
    nvl(old.payment_num_td,0)+nvl(new.payment_num_1d,0),
    nvl(old.payment_amount_td,0)+nvl(new.payment_amount_1d,0)
from
(
    select
        user_id,payment_date_first,payment_date_last,
        payment_count_td,payment_num_td,payment_amount_td
    from ${APP}.dws_trade_user_payment_td
    where dt=date_add('$do_date',-1)
)old
full outer join
(
    select
        user_id,payment_count_1d,payment_num_1d,payment_amount_1d
    from ${APP}.dws_trade_user_payment_1d
    where dt='$do_date'
)new
on old.user_id=new.user_id;
"

dws_user_user_login_td="
insert overwrite table ${APP}.dws_user_user_login_td partition(dt='$do_date')
select
    nvl(old.user_id,new.user_id),
    if(new.user_id is null,old.login_date_last,'$do_date'),
    nvl(old.login_count_td,0)+nvl(new.login_count_1d,0)
from
(
    select
        user_id,login_date_last,login_count_td
    from ${APP}.dws_user_user_login_td
    where dt=date_add('$do_date',-1)
)old
full outer join
(
    select
        user_id,
        count(*) login_count_1d
    from ${APP}.dwd_user_login_inc
    where dt='$do_date'
    group by user_id
)new
on old.user_id=new.user_id;
"

case $1 in
    "dws_trade_user_order_td" )
        hive -e "$dws_trade_user_order_td"
    ;;
    "dws_trade_user_payment_td" )
        hive -e "$dws_trade_user_payment_td"
    ;;
    "dws_user_user_login_td" )
        hive -e "$dws_user_user_login_td"
    ;;
    "all" )
        hive -e "$dws_trade_user_order_td$dws_trade_user_payment_td$dws_user_user_login_td"
    ;;
esac
(2) Make the script executable
[root@hadoop102 bin]$ chmod +x dwd_to_dws_td.sh
(3) Use the script to run the daily load
[root@hadoop102 bin]$ dwd_to_dws_td.sh all 2022-05-02
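Each daily run reads the previous day's history-to-date partition (dt = do_date - 1), so if several days have been missed they need to be replayed in order, oldest first. A minimal sketch of such a replay loop follows. It assumes GNU date; date_range is a hypothetical helper, and the loop that calls dwd_to_dws_td.sh is left commented out because it needs a working Hive environment.

```shell
#!/bin/bash
# date_range START END   (hypothetical helper)
# Prints every date from START to END, both inclusive, oldest first.
date_range() {
    local cur=$1 end=$2
    # YYYY-MM-DD strings sort chronologically, so a string comparison is enough.
    while [[ ! "$cur" > "$end" ]]; do
        echo "$cur"
        # Stop instead of looping forever if date cannot parse the value.
        cur=$(TZ=UTC date -d "$cur +1 day" +%F) || return 1
    done
}

# Example (not executed here): replay three consecutive days, stopping at the
# first failure so that a later day never runs on top of a missing partition.
# for d in $(date_range 2022-05-02 2022-05-04); do
#     dwd_to_dws_td.sh all "$d" || break
# done
```

Stopping at the first failed day matters because the next day's full outer join would otherwise read an empty old side and treat every user as new.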