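This post covers only the DWS layer. For the other layers, see the <Project> column on my blog.

This post is based on the Atguigu (尚硅谷) big data project.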

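I. Business Terms

1) User

Users are identified by device: in mobile analytics, each distinct device counts as one distinct user. Android identifies a user by IMEI and iOS by OpenUDID, so each phone is one user.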

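2) New users

Users who connect to and use the app for the first time. A user who opens an app for the first time is counted as a new user; a device that uninstalls and reinstalls the app is not counted as new again. New users are reported as daily, weekly, and monthly new users.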

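3) Active users

Any user who opens the app is an active user, regardless of what they do in it. A device that opens the app several times in one day counts as one active user.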

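4) Weekly (monthly) active users

Users who launched the app in a given calendar week (month). Multiple launches within that week (month) count as one active user.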

5) Monthly active rate

The ratio of monthly active users to the cumulative total number of users up to the end of that month.
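The deduplication in items 3 and 4 and the ratio in item 5 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original pipeline: the `launches` data and the function names are hypothetical, weeks are assumed to be ISO weeks (Monday to Sunday), and "cumulative users" is read as every device seen up to the end of the month.

```python
from datetime import date

# Hypothetical launch events: (device_id, launch_date)
launches = [
    ("d1", date(2022, 5, 2)),
    ("d1", date(2022, 5, 3)),   # d1 launches twice in the same week: still one active user
    ("d2", date(2022, 5, 20)),
    ("d3", date(2022, 4, 11)),  # d3 was only active in April
]


def weekly_active(events, iso_year, iso_week):
    """Distinct devices with at least one launch in the given ISO week."""
    return {dev for dev, d in events if tuple(d.isocalendar()[:2]) == (iso_year, iso_week)}


def monthly_active(events, year, month):
    """Distinct devices with at least one launch in the given calendar month."""
    return {dev for dev, d in events if (d.year, d.month) == (year, month)}


def monthly_active_rate(events, year, month):
    """Monthly active users divided by all devices seen up to the end of that month."""
    cumulative = {dev for dev, d in events if (d.year, d.month) <= (year, month)}
    if not cumulative:
        return 0.0
    return len(monthly_active(events, year, month)) / len(cumulative)


print(weekly_active(launches, 2022, 18))       # {'d1'}
print(monthly_active_rate(launches, 2022, 5))  # 2/3: d1 and d2 out of d1, d2, d3
```

Using sets makes "several launches count as one user" fall out of the data structure rather than needing an explicit distinct step, which is the same role `count(distinct mid_id)` or `group by mid_id` plays in Hive.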

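6) Silent users

Users who launched the app only once, on the day of installation (or the following day), and never again. This metric reflects the quality of new users and how well users match the app.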

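7) Version distribution

For each app version, the number of new users, active users, and launches on each day of a week. This helps compare the app's versions and understand users' habits.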

8) Returning users this week

Users who did not launch the app last week but launched it this week.

9) Users active for n consecutive weeks

Users who launched the app at least once in each of n consecutive weeks.

10) Loyal users

Users who have been active for 5 or more consecutive weeks.

11) Continuously active users

Users who have been active for 2 or more consecutive weeks.
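Items 8 to 11 all depend on which weeks a user was active in. A minimal Python sketch, assuming each user's activity has already been reduced to a set of integer week indices; the `active_weeks` data, `THIS_WEEK`, and the helper names are hypothetical.

```python
# Hypothetical data: user -> set of week indices in which the user launched the app
active_weeks = {
    "u1": {1, 2, 3, 4, 5, 6},  # active six weeks in a row
    "u2": {5, 6},              # active the last two weeks
    "u3": {3, 6},              # skipped weeks 4 and 5, back in week 6
    "u4": {1, 2},              # not active this week
}
THIS_WEEK = 6


def is_returning(weeks, this_week):
    """Item 8: no launch last week, at least one launch this week."""
    return this_week in weeks and (this_week - 1) not in weeks


def consecutive_weeks(weeks, this_week):
    """Length of the unbroken run of active weeks ending at this_week (0 if inactive now)."""
    n = 0
    while (this_week - n) in weeks:
        n += 1
    return n


def classify(weeks, this_week):
    streak = consecutive_weeks(weeks, this_week)
    return {
        "returning": is_returning(weeks, this_week),
        "continuously_active": streak >= 2,  # item 11: 2 or more consecutive weeks
        "loyal": streak >= 5,                # item 10: 5 or more consecutive weeks
    }


for user, weeks in active_weeks.items():
    print(user, consecutive_weeks(weeks, THIS_WEEK), classify(weeks, THIS_WEEK))
```

Counting the streak backwards from the current week means one helper serves items 9, 10, and 11; only the threshold changes.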

12) Recently churned users

Users who have not launched the app for n consecutive weeks, where 2 <= n <= 4 (and did not launch in week n+1 either).

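13) Retained users

New users from a given period who are still using the app after some time are retained users; their share of the original new users is the retention rate.

For example, 200 users were added in May. Of these, 100 launched the app in June, 80 in July, and 50 in August. The one-month retention rate of May's new users is therefore 50%, the two-month rate 40%, and the three-month rate 25%.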
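The May example can be reproduced with a small Python sketch. The cohort is modelled as a set of hypothetical user ids (0 to 199); each later month only counts users who belong to the cohort, which is why the extra ids in `june` do not affect the result.

```python
def retention_rates(cohort, active_by_period):
    """For each later period, the share of the cohort that launched the app in it."""
    cohort = set(cohort)
    return [len(cohort & set(active)) / len(cohort) for active in active_by_period]


# Reproduce the May example: 200 new users, identified here by 0..199 (hypothetical ids)
may_new = range(200)
june = list(range(100)) + ["other_1", "other_2"]  # users outside the cohort are ignored
july = range(80)
august = range(50)

print(retention_rates(may_new, [june, july, august]))  # [0.5, 0.4, 0.25]
```

Dividing by the original cohort size rather than by the previous month's survivors is what makes the rates 50%, 40%, and 25% instead of a chain of month-over-month ratios.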

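14) User freshness

The proportion of new versus existing users among those who launch the app each day, i.e. the number of new users divided by the number of active users.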
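A Python sketch of user freshness, assuming a hypothetical launch history of `(device_id, launch_date)` pairs. A device counts as new only on the date it was first seen, so a reinstall on a later date does not make it new again, which matches item 2.

```python
from datetime import date

# Hypothetical launch history: (device_id, launch_date)
launches = [
    ("d1", date(2022, 5, 19)),
    ("d1", date(2022, 5, 20)),
    ("d2", date(2022, 5, 20)),
    ("d3", date(2022, 5, 20)),
    ("d3", date(2022, 5, 20)),  # several launches on one day still count once
    ("d4", date(2022, 5, 18)),
]


def freshness(events, day):
    """New users on `day` divided by active users on `day`."""
    first_seen = {}
    for dev, d in events:
        if dev not in first_seen or d < first_seen[dev]:
            first_seen[dev] = d
    active = {dev for dev, d in events if d == day}
    new = {dev for dev in active if first_seen[dev] == day}
    return len(new) / len(active) if active else 0.0


print(freshness(launches, date(2022, 5, 20)))  # d2 and d3 are new among d1, d2, d3
```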

15) Session duration

The length of time the app is used per launch.

16) Daily usage duration

The total usage time within one day.

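17) Launch counting rule

On iOS, each time the app goes to the background counts as a separate launch. On Android, we define that two launches less than 30 seconds apart count as one launch. If a user leaves the app to read a text message or take a call and returns within 30 seconds, the two sessions are a continuation rather than independent, so they count as one usage, i.e. one launch. Most of the industry uses 30 seconds, but the interval can be customized.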
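The Android rule can be sketched as merging sessions separated by short gaps. This is a minimal Python illustration under stated assumptions: sessions are hypothetical `(foreground_ts, background_ts)` pairs in seconds, and a gap of exactly 30 seconds starts a new launch because the rule says "less than 30 seconds".

```python
def count_launches(sessions, gap_seconds=30):
    """Count launches under the Android rule.

    sessions: (foreground_ts, background_ts) pairs in seconds.
    A session that starts less than gap_seconds after the previous one ended
    is treated as a continuation of it, not as a new launch.
    """
    launches = 0
    last_end = None
    for start, end in sorted(sessions):
        if last_end is None or start - last_end >= gap_seconds:
            launches += 1
        last_end = end if last_end is None else max(last_end, end)
    return launches


# Hypothetical sessions: a 10 s interruption (a text message) and a 29 s one (a short call)
sessions = [(0, 100), (110, 200), (300, 400), (429, 500)]
print(count_launches(sessions))                 # 2
print(count_launches(sessions, gap_seconds=5))  # 4
```

Passing the interval as a parameter reflects the note that the 30-second threshold is an industry convention that can be customized.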

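II. System Functions

2.1 The nvl function

1) Basic syntax

NVL(expr1, expr2)

If expr1 is null, NVL returns the value of expr2; otherwise it returns the value of expr1.

The function turns a null value into a concrete value. The expressions can be numeric, string, or date types, but expr1 and expr2 should be of the same type.

2) Examples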

select nvl(1,0);
select nvl(null,"hello");
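For readers more at home in Python, the behavior of the two queries above can be mirrored with a one-line helper. The `nvl` function here is a hypothetical stand-in, with `None` playing the role of SQL NULL.

```python
def nvl(value, default):
    """Return default when value is None (standing in for SQL NULL), else value."""
    return default if value is None else value


print(nvl(1, 0))           # 1
print(nvl(None, "hello"))  # hello
print(nvl(0, 5))           # 0: zero and empty strings are not NULL
```

The last line is the point worth remembering in the DWS queries below: `nvl(cart_count,0)` only fills in rows where the left join found no match, it never overwrites a real zero.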

2.2 Date functions

1) date_format (format a date with a pattern)

select date_format('2022-05-24','yyyy-MM');

2) date_add (add or subtract days)

select date_add('2022-05-24',-1);
select date_add('2022-05-24',1);

3) next_day

(1) Get the next Monday after the given day

select next_day('2020-06-14','MO');

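Note: the second argument names a day of the week in English (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday). Hive accepts the full name or its two- or three-letter abbreviation, e.g. 'MO', 'Mon', or 'Monday'.

(2) Get the Monday of the current week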

select date_add(next_day('2020-06-14','MO'),-7);

4) last_day (last day of the month)

select last_day('2020-06-14');
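To check the expected results of these date queries without a Hive session, their semantics can be mirrored with Python's `datetime`. The helper names below are hypothetical, accept only `yyyy-MM-dd` strings, and are a sketch of the documented behavior rather than Hive's implementation; in particular, `next_day` returns a date strictly later than the input, so a Monday maps to the following Monday.

```python
import calendar
from datetime import date, timedelta

# Weekday codes accepted by next_day: first two letters of the English name
WEEKDAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def date_format_month(s):
    """date_format(s, 'yyyy-MM') for a 'yyyy-MM-dd' string."""
    return date.fromisoformat(s).strftime("%Y-%m")


def date_add(s, days):
    """date_add(s, days); negative days subtract."""
    return (date.fromisoformat(s) + timedelta(days=days)).isoformat()


def next_day(s, day_of_week):
    """First date strictly later than s that falls on day_of_week ('MO', 'Mon', 'Monday', ...)."""
    d = date.fromisoformat(s)
    target = WEEKDAYS[day_of_week[:2].upper()]
    delta = (target - d.weekday()) % 7 or 7  # 0 would mean "today", so jump a full week
    return (d + timedelta(days=delta)).isoformat()


def last_day(s):
    """Last day of the month containing s."""
    d = date.fromisoformat(s)
    return d.replace(day=calendar.monthrange(d.year, d.month)[1]).isoformat()


print(date_format_month("2022-05-24"))             # 2022-05
print(date_add("2022-05-24", -1))                  # 2022-05-23
print(next_day("2020-06-14", "MO"))                # 2020-06-15 (2020-06-14 is a Sunday)
print(date_add(next_day("2020-06-14", "MO"), -7))  # 2020-06-08, Monday of that week
print(last_day("2020-06-14"))                      # 2020-06-30
```

The "Monday of the current week" trick works because `next_day` always moves forward at least one day: going back 7 days from the next Monday lands on the Monday of the week that contains the input date, including when the input is itself a Monday.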

2.3 Complex data type definitions

1) map type

map<string,string>

2) array type

array<string>

3) struct type

struct<id:int,name:string,age:int>

4) Nested definition: an array of structs

array<struct<id:int,name:string,age:int>>

III. DWS Layer

3.1 Daily device behavior

1) Create the table

drop table if exists dws_uv_detail_daycount;
create external table dws_uv_detail_daycount
(
    `mid_id`      string COMMENT 'device id',
    `brand`       string COMMENT 'phone brand',
    `model`       string COMMENT 'phone model',
    `login_count` bigint COMMENT 'number of launches',
    `page_stats`  array<struct<page_id:string,page_count:bigint>> COMMENT 'page view statistics'
) COMMENT 'daily device behavior table'
partitioned by(dt string)
stored as parquet
location '/warehouse/gmall/dws/dws_uv_detail_daycount'
tblproperties ("parquet.compression"="lzo");

2) Load the data

with
tmp_start as
(
    select mid_id, brand, model, count(*) login_count
    from dwd_start_log
    where dt='2022-05-20'
    group by mid_id, brand, model
),
tmp_page as
(
    select
        mid_id, brand, model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count)) page_stats
    from
    (
        select mid_id, brand, model, page_id, count(*) page_count
        from dwd_page_log
        where dt='2022-05-20'
        group by mid_id, brand, model, page_id
    )tmp
    group by mid_id, brand, model
)
insert overwrite table dws_uv_detail_daycount partition(dt='2022-05-20')
select
    nvl(tmp_start.mid_id,tmp_page.mid_id),
    nvl(tmp_start.brand,tmp_page.brand),
    nvl(tmp_start.model,tmp_page.model),
    tmp_start.login_count,
    tmp_page.page_stats
from tmp_start
full outer join tmp_page
on tmp_start.mid_id=tmp_page.mid_id
and tmp_start.brand=tmp_page.brand
and tmp_start.model=tmp_page.model;
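The full outer join above keeps a device that appears in only one of the two logs, which is why the key columns are wrapped in `nvl`. A Python sketch of the same logic, using hypothetical in-memory rows in place of `dwd_start_log` and `dwd_page_log`: keys come from the union of both sides, and the side with no match stays `None` (NULL).

```python
from collections import Counter, defaultdict


def build_uv_daycount(start_log, page_log):
    """start_log: list of (mid_id, brand, model) keys, one per launch.
    page_log: list of ((mid_id, brand, model), page_id), one per page view."""
    login_count = Counter(start_log)   # tmp_start: count(*) per device key
    page_views = Counter(page_log)     # inner query of tmp_page: count(*) per key and page
    page_stats = defaultdict(list)     # tmp_page: one {page_id, page_count} struct per page
    for (key, page_id), n in page_views.items():
        page_stats[key].append({"page_id": page_id, "page_count": n})

    rows = []
    # Full outer join: keep a key found on either side; the missing side stays None (NULL)
    for key in sorted(set(login_count) | set(page_stats)):
        mid_id, brand, model = key
        rows.append({
            "mid_id": mid_id, "brand": brand, "model": model,
            "login_count": login_count.get(key),
            "page_stats": page_stats.get(key),
        })
    return rows


iphone = ("m1", "Apple", "iPhone 12")
start_log = [iphone, iphone, ("m2", "Xiaomi", "Mi 10")]
page_log = [(iphone, "home"), (iphone, "home"), (iphone, "good_detail"),
            (("m3", "Huawei", "P40"), "home")]
for row in build_uv_daycount(start_log, page_log):
    print(row)
```

A plain list stands in for `collect_set` here because, after grouping by page, each `page_id` already occurs once per device, so there are no duplicate structs to remove.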

3) Query the data

select * from dws_uv_detail_daycount where dt='2022-05-20' limit 2;

3.2 Daily member behavior

1) Create the table

create external table dws_user_action_daycount
(
    user_id            string        comment 'user id',
    login_count        bigint        comment 'number of logins',
    cart_count         bigint        comment 'number of add-to-cart actions',
    order_count        bigint        comment 'number of orders',
    order_amount       decimal(16,2) comment 'order amount',
    payment_count      bigint        comment 'number of payments',
    payment_amount     decimal(16,2) comment 'payment amount',
    order_detail_stats array<struct<sku_id:string,sku_num:bigint,order_count:bigint,order_amount:decimal(20,2)>> comment 'order detail statistics'
) COMMENT 'daily member behavior'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_action_daycount/'
tblproperties ("parquet.compression"="lzo");

2) Load the data

with
tmp_login as
(
    select user_id, count(*) login_count
    from dwd_start_log
    where dt='2022-05-20'
    and user_id is not null
    group by user_id
),
tmp_cart as
(
    select user_id, count(*) cart_count
    from dwd_action_log
    where dt='2022-05-20'
    and user_id is not null
    and action_id='cart_add'
    group by user_id
),
tmp_order as
(
    select user_id, count(*) order_count, sum(final_total_amount) order_amount
    from dwd_fact_order_info
    where dt='2022-05-20'
    group by user_id
),
tmp_payment as
(
    select user_id, count(*) payment_count, sum(payment_amount) payment_amount
    from dwd_fact_payment_info
    where dt='2022-05-20'
    group by user_id
),
tmp_order_detail as
(
    select
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'order_amount',order_amount)) order_stats
    from
    (
        select
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(final_amount_d) as decimal(20,2)) order_amount
        from dwd_fact_order_detail
        where dt='2022-05-20'
        group by user_id, sku_id
    )tmp
    group by user_id
)
insert overwrite table dws_user_action_daycount partition(dt='2022-05-20')
select
    tmp_login.user_id,
    login_count,
    nvl(cart_count,0),
    nvl(order_count,0),
    nvl(order_amount,0.0),
    nvl(payment_count,0),
    nvl(payment_amount,0.0),
    order_stats
from tmp_login
left join tmp_cart on tmp_login.user_id=tmp_cart.user_id
left join tmp_order on tmp_login.user_id=tmp_order.user_id
left join tmp_payment on tmp_login.user_id=tmp_payment.user_id
left join tmp_order_detail on tmp_login.user_id=tmp_order_detail.user_id;
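Here `tmp_login` drives a chain of left joins, so every user who logged in gets a row and any missing metric becomes 0 through `nvl`. A Python sketch of that shape, with hypothetical dictionaries standing in for the CTEs (the `order_stats` column is left out for brevity). As in the SQL, a user who placed an order but has no start-log record that day (`u9` below) is dropped.

```python
from decimal import Decimal

ZERO = Decimal("0.00")


def build_user_daycount(logins, carts, orders, payments):
    """Left join every metric onto the users who logged in, defaulting gaps to 0.

    logins:   {user_id: login_count}
    carts:    {user_id: cart_count}
    orders:   {user_id: (order_count, order_amount)}
    payments: {user_id: (payment_count, payment_amount)}
    """
    rows = []
    for user_id, login_count in sorted(logins.items()):
        order_count, order_amount = orders.get(user_id, (0, ZERO))        # nvl(..., 0)
        payment_count, payment_amount = payments.get(user_id, (0, ZERO))  # nvl(..., 0.0)
        rows.append({
            "user_id": user_id,
            "login_count": login_count,
            "cart_count": carts.get(user_id, 0),
            "order_count": order_count,
            "order_amount": order_amount,
            "payment_count": payment_count,
            "payment_amount": payment_amount,
        })
    return rows


rows = build_user_daycount(
    logins={"u1": 3, "u2": 1},
    carts={"u1": 2},
    orders={"u1": (1, Decimal("99.50")), "u9": (4, Decimal("10.00"))},  # u9 never logged in
    payments={"u1": (1, Decimal("99.50"))},
)
for row in rows:
    print(row)
```

`Decimal` is used for amounts to mirror the `decimal(16,2)` columns and avoid binary floating-point rounding in money sums.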

3) Query the data

select * from dws_user_action_daycount where dt='2022-05-20' limit 2;

3.3 Daily SKU behavior

1) Create the table

create external table dws_sku_action_daycount
(
    sku_id                 string        comment 'sku_id',
    order_count            bigint        comment 'times ordered',
    order_num              bigint        comment 'units ordered',
    order_amount           decimal(16,2) comment 'order amount',
    payment_count          bigint        comment 'times paid',
    payment_num            bigint        comment 'units paid',
    payment_amount         decimal(16,2) comment 'payment amount',
    refund_count           bigint        comment 'times refunded',
    refund_num             bigint        comment 'units refunded',
    refund_amount          decimal(16,2) comment 'refund amount',
    cart_count             bigint        comment 'times added to cart',
    favor_count            bigint        comment 'times favorited',
    appraise_good_count    bigint        comment 'positive reviews',
    appraise_mid_count     bigint        comment 'neutral reviews',
    appraise_bad_count     bigint        comment 'negative reviews',
    appraise_default_count bigint        comment 'default reviews'
) COMMENT 'daily SKU behavior'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_sku_action_daycount/'
tblproperties ("parquet.compression"="lzo");

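2) Load the data

Note: an order placed at 23:59 may be paid after midnight, so its payment date falls on the next day. The payment statistics therefore take, from the order details, the orders whose payment time is today and whose order date is either yesterday or today.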

with
tmp_order as
(
    select sku_id, count(*) order_count, sum(sku_num) order_num, sum(final_amount_d) order_amount
    from dwd_fact_order_detail
    where dt='2022-05-20'
    group by sku_id
),
tmp_payment as
(
    select sku_id, count(*) payment_count, sum(sku_num) payment_num, sum(final_amount_d) payment_amount
    from dwd_fact_order_detail
    where (dt='2022-05-20' or dt=date_add('2022-05-20',-1))
    and order_id in
    (
        select id
        from dwd_fact_order_info
        where (dt='2022-05-20' or dt=date_add('2022-05-20',-1))
        and date_format(payment_time,'yyyy-MM-dd')='2022-05-20'
    )
    group by sku_id
),
tmp_refund as
(
    select sku_id, count(*) refund_count, sum(refund_num) refund_num, sum(refund_amount) refund_amount
    from dwd_fact_order_refund_info
    where dt='2022-05-20'
    group by sku_id
),
tmp_cart as
(
    select item sku_id, count(*) cart_count
    from dwd_action_log
    where dt='2022-05-20'
    and user_id is not null
    and action_id='cart_add'
    group by item
),
tmp_favor as
(
    select item sku_id, count(*) favor_count
    from dwd_action_log
    where dt='2022-05-20'
    and user_id is not null
    and action_id='favor_add'
    group by item
),
tmp_appraise as
(
    select
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_fact_comment_info
    where dt='2022-05-20'
    group by sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt='2022-05-20')
select
    sku_id,
    sum(order_count), sum(order_num), sum(order_amount),
    sum(payment_count), sum(payment_num), sum(payment_amount),
    sum(refund_count), sum(refund_num), sum(refund_amount),
    sum(cart_count), sum(favor_count),
    sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count)
from
(
    select sku_id, order_count, order_num, order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           0 cart_count, 0 favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_order
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           payment_count, payment_num, payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           0 cart_count, 0 favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_payment
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           refund_count, refund_num, refund_amount,
           0 cart_count, 0 favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_refund
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           cart_count, 0 favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_cart
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           0 cart_count, favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_favor
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           0 cart_count, 0 favor_count,
           appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from tmp_appraise
)tmp
group by sku_id;
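The `union all` plus `group by` step is a pivot: each CTE contributes its own metrics and zeros for all the others, so summing per `sku_id` merges them into one row per SKU without a chain of full outer joins and `nvl` on the key. The pattern can be tried with Python's built-in `sqlite3` module as a stand-in for Hive; the three tables and their rows are hypothetical and trimmed to one metric each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tmp_order   (sku_id text, order_count int);
create table tmp_payment (sku_id text, payment_count int);
create table tmp_cart    (sku_id text, cart_count int);
insert into tmp_order   values ('s1', 3), ('s2', 1);
insert into tmp_payment values ('s1', 2);
insert into tmp_cart    values ('s2', 5), ('s3', 4);
""")

# Each branch fills its own column and pads the others with 0; the outer sum merges them
rows = conn.execute("""
select sku_id, sum(order_count), sum(payment_count), sum(cart_count)
from (
    select sku_id, order_count, 0 payment_count, 0 cart_count from tmp_order
    union all
    select sku_id, 0 order_count, payment_count, 0 cart_count from tmp_payment
    union all
    select sku_id, 0 order_count, 0 payment_count, cart_count from tmp_cart
) tmp
group by sku_id
order by sku_id
""").fetchall()
print(rows)  # [('s1', 3, 2, 0), ('s2', 1, 0, 5), ('s3', 0, 0, 4)]
conn.close()
```

A SKU that appears in only one source (`s3`, which was only added to carts) still gets a complete row, with zeros rather than NULLs in the other columns.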

3) Query the data

select * from dws_sku_action_daycount where dt='2022-05-20' limit 2;

3.4 Daily activity statistics

1) Create the table

create external table dws_activity_info_daycount(
    `id`             string        COMMENT 'id',
    `activity_name`  string        COMMENT 'activity name',
    `activity_type`  string        COMMENT 'activity type',
    `start_time`     string        COMMENT 'start time',
    `end_time`       string        COMMENT 'end time',
    `create_time`    string        COMMENT 'creation time',
    `display_count`  bigint        COMMENT 'number of impressions',
    `order_count`    bigint        COMMENT 'number of orders',
    `order_amount`   decimal(20,2) COMMENT 'order amount',
    `payment_count`  bigint        COMMENT 'number of payments',
    `payment_amount` decimal(20,2) COMMENT 'payment amount'
) COMMENT 'daily activity statistics'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_activity_info_daycount/'
tblproperties ("parquet.compression"="lzo");

2) Load the data

with
tmp_op as
(
    select
        activity_id,
        sum(if(date_format(create_time,'yyyy-MM-dd')='2022-05-20',1,0)) order_count,
        sum(if(date_format(create_time,'yyyy-MM-dd')='2022-05-20',final_total_amount,0)) order_amount,
        sum(if(date_format(payment_time,'yyyy-MM-dd')='2022-05-20',1,0)) payment_count,
        sum(if(date_format(payment_time,'yyyy-MM-dd')='2022-05-20',final_total_amount,0)) payment_amount
    from dwd_fact_order_info
    where (dt='2022-05-20' or dt=date_add('2022-05-20',-1))
    and activity_id is not null
    group by activity_id
),
tmp_display as
(
    select item activity_id, count(*) display_count
    from dwd_display_log
    where dt='2022-05-20'
    and item_type='activity_id'
    group by item
),
tmp_activity as
(
    select *
    from dwd_dim_activity_info
    where dt='2022-05-20'
)
insert overwrite table dws_activity_info_daycount partition(dt='2022-05-20')
select
    nvl(tmp_op.activity_id,tmp_display.activity_id),
    tmp_activity.activity_name,
    tmp_activity.activity_type,
    tmp_activity.start_time,
    tmp_activity.end_time,
    tmp_activity.create_time,
    tmp_display.display_count,
    tmp_op.order_count,
    tmp_op.order_amount,
    tmp_op.payment_count,
    tmp_op.payment_amount
from tmp_op
full outer join tmp_display on tmp_op.activity_id=tmp_display.activity_id
left join tmp_activity on nvl(tmp_op.activity_id,tmp_display.activity_id)=tmp_activity.id;
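`tmp_op` reads two partitions (today and yesterday) and uses conditional sums, so one pass yields both today's orders and today's payments, including orders placed just before midnight and paid after it. A Python sketch of the same conditional aggregation, assuming hypothetical order rows with `yyyy-MM-dd HH:mm:ss` timestamps (so the first 10 characters play the role of `date_format(..., 'yyyy-MM-dd')`):

```python
from decimal import Decimal

DAY = "2022-05-20"
# Hypothetical rows read from partitions dt=2022-05-19 and dt=2022-05-20:
# (activity_id, create_time, payment_time or None, final_total_amount)
orders = [
    ("a1", "2022-05-19 23:59:10", "2022-05-20 00:00:40", Decimal("100.00")),  # ordered yesterday, paid today
    ("a1", "2022-05-20 10:00:00", "2022-05-20 10:05:00", Decimal("50.00")),
    ("a1", "2022-05-20 21:00:00", None, Decimal("30.00")),                    # ordered today, unpaid
    ("a2", "2022-05-19 08:00:00", "2022-05-19 08:01:00", Decimal("70.00")),   # all of it happened yesterday
]


def activity_stats(rows, day):
    stats = {}
    for activity_id, create_time, payment_time, amount in rows:
        s = stats.setdefault(activity_id, {"order_count": 0, "order_amount": Decimal("0"),
                                           "payment_count": 0, "payment_amount": Decimal("0")})
        # sum(if(date_format(create_time,'yyyy-MM-dd')=day, ...))
        if create_time[:10] == day:
            s["order_count"] += 1
            s["order_amount"] += amount
        # A NULL payment_time never matches, just as if(...) falls through to 0 in Hive
        if payment_time is not None and payment_time[:10] == day:
            s["payment_count"] += 1
            s["payment_amount"] += amount
    return stats


for activity_id, s in activity_stats(orders, DAY).items():
    print(activity_id, s)
```

The 23:59 order counts toward today's payments but not today's orders; reading only today's partition would have missed its payment. Activity `a2` still produces an all-zero row, as it would from the `group by` over both partitions.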

3) Check the loaded results

select * from dws_activity_info_daycount where dt='2022-05-20' limit 2;

3.5 Daily region statistics

1) Create the table

create external table dws_area_stats_daycount(
    `id`             bigint        COMMENT 'id',
    `province_name`  string        COMMENT 'province name',
    `area_code`      string        COMMENT 'area code',
    `iso_code`       string        COMMENT 'ISO code',
    `region_id`      string        COMMENT 'region id',
    `region_name`    string        COMMENT 'region name',
    `login_count`    bigint        COMMENT 'number of active devices',
    `order_count`    bigint        COMMENT 'number of orders',
    `order_amount`   decimal(20,2) COMMENT 'order amount',
    `payment_count`  bigint        COMMENT 'number of payments',
    `payment_amount` decimal(20,2) COMMENT 'payment amount'
) COMMENT 'daily region statistics table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_area_stats_daycount/'
tblproperties ("parquet.compression"="lzo");

2) Load the data

with
tmp_login as
(
    select area_code, count(*) login_count
    from dwd_start_log
    where dt='2022-05-20'
    group by area_code
),
tmp_op as
(
    select
        province_id,
        sum(if(date_format(create_time,'yyyy-MM-dd')='2022-05-20',1,0)) order_count,
        sum(if(date_format(create_time,'yyyy-MM-dd')='2022-05-20',final_total_amount,0)) order_amount,
        sum(if(date_format(payment_time,'yyyy-MM-dd')='2022-05-20',1,0)) payment_count,
        sum(if(date_format(payment_time,'yyyy-MM-dd')='2022-05-20',final_total_amount,0)) payment_amount
    from dwd_fact_order_info
    where (dt='2022-05-20' or dt=date_add('2022-05-20',-1))
    group by province_id
)
insert overwrite table dws_area_stats_daycount partition(dt='2022-05-20')
select
    pro.id,
    pro.province_name,
    pro.area_code,
    pro.iso_code,
    pro.region_id,
    pro.region_name,
    nvl(tmp_login.login_count,0),
    nvl(tmp_op.order_count,0),
    nvl(tmp_op.order_amount,0.0),
    nvl(tmp_op.payment_count,0),
    nvl(tmp_op.payment_amount,0.0)
from dwd_dim_base_province pro
left join tmp_login on pro.area_code=tmp_login.area_code
left join tmp_op on pro.id=tmp_op.province_id;

3) Query the data

select * from dws_area_stats_daycount where dt='2022-05-20' limit 2;

IV. Data Loading Script

1) Create the script: vim dwd_to_dws.sh

Fill in the script with the following content:

#!/bin/bash

APP=default
hive=/training/hive/bin/hive

# If a date is passed as the first argument, use it; otherwise use the day before today
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
set mapreduce.job.queuename=default;
with
tmp_start as
(
    select mid_id, brand, model, count(*) login_count
    from ${APP}.dwd_start_log
    where dt='$do_date'
    group by mid_id, brand, model
),
tmp_page as
(
    select
        mid_id, brand, model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count)) page_stats
    from
    (
        select mid_id, brand, model, page_id, count(*) page_count
        from ${APP}.dwd_page_log
        where dt='$do_date'
        group by mid_id, brand, model, page_id
    )tmp
    group by mid_id, brand, model
)
insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')
select
    nvl(tmp_start.mid_id,tmp_page.mid_id),
    nvl(tmp_start.brand,tmp_page.brand),
    nvl(tmp_start.model,tmp_page.model),
    tmp_start.login_count,
    tmp_page.page_stats
from tmp_start
full outer join tmp_page
on tmp_start.mid_id=tmp_page.mid_id
and tmp_start.brand=tmp_page.brand
and tmp_start.model=tmp_page.model;

with
tmp_login as
(
    select user_id, count(*) login_count
    from ${APP}.dwd_start_log
    where dt='$do_date'
    and user_id is not null
    group by user_id
),
tmp_cart as
(
    select user_id, count(*) cart_count
    from ${APP}.dwd_action_log
    where dt='$do_date'
    and user_id is not null
    and action_id='cart_add'
    group by user_id
),
tmp_order as
(
    select user_id, count(*) order_count, sum(final_total_amount) order_amount
    from ${APP}.dwd_fact_order_info
    where dt='$do_date'
    group by user_id
),
tmp_payment as
(
    select user_id, count(*) payment_count, sum(payment_amount) payment_amount
    from ${APP}.dwd_fact_payment_info
    where dt='$do_date'
    group by user_id
),
tmp_order_detail as
(
    select
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'order_amount',order_amount)) order_stats
    from
    (
        select
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(final_amount_d) as decimal(20,2)) order_amount
        from ${APP}.dwd_fact_order_detail
        where dt='$do_date'
        group by user_id, sku_id
    )tmp
    group by user_id
)
insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
select
    tmp_login.user_id,
    login_count,
    nvl(cart_count,0),
    nvl(order_count,0),
    nvl(order_amount,0.0),
    nvl(payment_count,0),
    nvl(payment_amount,0.0),
    order_stats
from tmp_login
left outer join tmp_cart on tmp_login.user_id=tmp_cart.user_id
left outer join tmp_order on tmp_login.user_id=tmp_order.user_id
left outer join tmp_payment on tmp_login.user_id=tmp_payment.user_id
left outer join tmp_order_detail on tmp_login.user_id=tmp_order_detail.user_id;

with
tmp_order as
(
    select sku_id, count(*) order_count, sum(sku_num) order_num, sum(final_amount_d) order_amount
    from ${APP}.dwd_fact_order_detail
    where dt='$do_date'
    group by sku_id
),
tmp_payment as
(
    select sku_id, count(*) payment_count, sum(sku_num) payment_num, sum(final_amount_d) payment_amount
    from ${APP}.dwd_fact_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and order_id in
    (
        select id
        from ${APP}.dwd_fact_order_info
        where (dt='$do_date' or dt=date_add('$do_date',-1))
        and date_format(payment_time,'yyyy-MM-dd')='$do_date'
    )
    group by sku_id
),
tmp_refund as
(
    select sku_id, count(*) refund_count, sum(refund_num) refund_num, sum(refund_amount) refund_amount
    from ${APP}.dwd_fact_order_refund_info
    where dt='$do_date'
    group by sku_id
),
tmp_cart as
(
    select item sku_id, count(*) cart_count
    from ${APP}.dwd_action_log
    where dt='$do_date'
    and user_id is not null
    and action_id='cart_add'
    group by item
),
tmp_favor as
(
    select item sku_id, count(*) favor_count
    from ${APP}.dwd_action_log
    where dt='$do_date'
    and user_id is not null
    and action_id='favor_add'
    group by item
),
tmp_appraise as
(
    select
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_fact_comment_info
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
select
    sku_id,
    sum(order_count), sum(order_num), sum(order_amount),
    sum(payment_count), sum(payment_num), sum(payment_amount),
    sum(refund_count), sum(refund_num), sum(refund_amount),
    sum(cart_count), sum(favor_count),
    sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count)
from
(
    select sku_id, order_count, order_num, order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           0 cart_count, 0 favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_order
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           payment_count, payment_num, payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           0 cart_count, 0 favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_payment
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           refund_count, refund_num, refund_amount,
           0 cart_count, 0 favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_refund
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           cart_count, 0 favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_cart
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           0 cart_count, favor_count,
           0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_favor
    union all
    select sku_id, 0 order_count, 0 order_num, 0 order_amount,
           0 payment_count, 0 payment_num, 0 payment_amount,
           0 refund_count, 0 refund_num, 0 refund_amount,
           0 cart_count, 0 favor_count,
           appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from tmp_appraise
)tmp
group by sku_id;

with
tmp_login as
(
    select area_code, count(*) login_count
    from ${APP}.dwd_start_log
    where dt='$do_date'
    group by area_code
),
tmp_op as
(
    select
        province_id,
        sum(if(date_format(create_time,'yyyy-MM-dd')='$do_date',1,0)) order_count,
        sum(if(date_format(create_time,'yyyy-MM-dd')='$do_date',final_total_amount,0)) order_amount,
        sum(if(date_format(payment_time,'yyyy-MM-dd')='$do_date',1,0)) payment_count,
        sum(if(date_format(payment_time,'yyyy-MM-dd')='$do_date',final_total_amount,0)) payment_amount
    from ${APP}.dwd_fact_order_info
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    group by province_id
)
insert overwrite table ${APP}.dws_area_stats_daycount partition(dt='$do_date')
select
    pro.id,
    pro.province_name,
    pro.area_code,
    pro.iso_code,
    pro.region_id,
    pro.region_name,
    nvl(tmp_login.login_count,0),
    nvl(tmp_op.order_count,0),
    nvl(tmp_op.order_amount,0.0),
    nvl(tmp_op.payment_count,0),
    nvl(tmp_op.payment_amount,0.0)
from ${APP}.dwd_dim_base_province pro
left join tmp_login on pro.area_code=tmp_login.area_code
left join tmp_op on pro.id=tmp_op.province_id;

with
tmp_op as
(
    select
        activity_id,
        sum(if(date_format(create_time,'yyyy-MM-dd')='$do_date',1,0)) order_count,
        sum(if(date_format(create_time,'yyyy-MM-dd')='$do_date',final_total_amount,0)) order_amount,
        sum(if(date_format(payment_time,'yyyy-MM-dd')='$do_date',1,0)) payment_count,
        sum(if(date_format(payment_time,'yyyy-MM-dd')='$do_date',final_total_amount,0)) payment_amount
    from ${APP}.dwd_fact_order_info
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and activity_id is not null
    group by activity_id
),
tmp_display as
(
    select item activity_id, count(*) display_count
    from ${APP}.dwd_display_log
    where dt='$do_date'
    and item_type='activity_id'
    group by item
),
tmp_activity as
(
    select *
    from ${APP}.dwd_dim_activity_info
    where dt='$do_date'
)
insert overwrite table ${APP}.dws_activity_info_daycount partition(dt='$do_date')
select
    nvl(tmp_op.activity_id,tmp_display.activity_id),
    tmp_activity.activity_name,
    tmp_activity.activity_type,
    tmp_activity.start_time,
    tmp_activity.end_time,
    tmp_activity.create_time,
    tmp_display.display_count,
    tmp_op.order_count,
    tmp_op.order_amount,
    tmp_op.payment_count,
    tmp_op.payment_amount
from tmp_op
full outer join tmp_display on tmp_op.activity_id=tmp_display.activity_id
left join tmp_activity on nvl(tmp_op.activity_id,tmp_display.activity_id)=tmp_activity.id;
"

$hive -e "$sql"

2) Make the script executable: chmod 777 dwd_to_dws.sh

3) Run the script to load one day of data: dwd_to_dws.sh 2022-05-21

4) Check the loaded data

select * from dws_uv_detail_daycount where dt='2022-05-21' limit 2;
select * from dws_user_action_daycount where dt='2022-05-21' limit 2;
select * from dws_sku_action_daycount where dt='2022-05-21' limit 2;
select * from dws_activity_info_daycount where dt='2022-05-21' limit 2;
select * from dws_area_stats_daycount where dt='2022-05-21' limit 2;

I'm still learning, so if you spot any mistakes, corrections are very welcome. Thank you!
