大数据之数据仓库建设(三)
数据仓库理论一和二,主要讲流量域;
数据仓库理论三和四,主要讲业务域,即业务库里的数据。
一、sqoop导入数据处理
字典表,小杂表:全量导入
实体表(量级很大),事实表(每天都变化的业务表):增量导入
增量导入后的数据,存储在数仓的 ODS 层中,对于统计分析,不便利;需要滚动合并生成全量快照。
1、将数据全量导入
建表并全量导入
2、将每天的增量数据使用sqoop导入,放在ODS层
导入增量脚本:
########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc oms_order增量抽取任务 启动脚本 #
# #
########################################################export SQOOP_HOME=/opt/apps/sqoop-1.4.7/
export HIVE_HOME=/opt/apps/hive-3.1.2/# 加载上次import的最大id
if [ -f preid ]
thenSTART_ID=`cat preid`
elseSTART_ID=0
fires=`mysql -h 192.168.77.2 -uroot -p123456 <<EOF
use realtimedw;
select max(id) from oms_order_item;
EOF`CUR_MAX_ID=`echo ${res} | awk '{print $2}'`
echo ${CUR_MAX_ID} > preidDT_INCR=`date -d'-1 day' +%Y-%m-%d`if [ $1 && $2 ]
then
ID=$1
DT_EXTRACT=$2
fiecho "上次导入的最大ID为:${START_ID}"
echo "本次导入所能达到的最大ID为: ${CUR_MAX_ID}"${SQOOP_HOME}/bin/sqoop import \
--connect jdbc:mysql://192.168.77.2:3306/realtimedw \
--username root \
--password 123456 \
--table oms_order_item \
--target-dir "/incr_test/oms_order_item/${DT_EXTRACT}" \
--incremental append \
--check-column id \
--null-string '\\N' \
--null-non-string '\\N' \
--last-value "${START_ID}" \
--fields-terminated-by ',' \
--split-by id \
-m 2 \if [ $? -eq 0 ]
then echo "congratulations! sqoop数据导入成功! 邮件已发送至admin@51doit.com"echo "准备加载到hive表分区:dt=${DT_INCR}"${HIVE_HOME}/bin/hive -e "alter table ods17.oms_order_incr add partition(dt='${DT_INCR}')"if [ $? -eq 0 ]
then echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
elseecho "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fielseecho "节哀顺变! sqoop数据迁移任务失败! 邮件已发送至admin@51doit.com"
fi
注意:需要更新hive元数据分区日期,然后将hdfs数据load到hive或者直接写到hive表,然后使用sqoop的partition by属性
3、对每天的增量数据+前一天的全量数据,进行数据合并,放在DWD层一个新表里; 也可以重新写入旧表里,对之前的数据覆盖更新
建立一个新表,执行合并数据脚本:
########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc oms_order 数据合并任务 启动脚本 #
# #
########################################################export SQOOP_HOME=/opt/apps/sqoop-1.4.7/
export HIVE_HOME=/opt/apps/hive-3.1.2/DT_WHOLE=`date -d'-2 day' +%Y-%m-%d`
DT_INCR=`date -d'-1 day' +%Y-%m-%d`if [[ $1 && $2 ]]
then
DT_WHOLE=$1
DT_INCR=$2
fi${HIVE_HOME}/bin/hive -e "with tmp as (selectid ,member_id ,coupon_id ,order_sn ,create_time ,member_username ,total_amount ,pay_amount ,freight_amount ,promotion_amount ,integration_amount ,coupon_amount ,discount_amount ,pay_type ,source_type ,status ,order_type ,delivery_company ,delivery_sn ,auto_confirm_day ,integration ,growth ,promotion_info ,bill_type ,bill_header ,bill_content ,bill_receiver_phone ,bill_receiver_email ,receiver_name ,receiver_phone ,receiver_post_code ,receiver_province ,receiver_city ,receiver_region ,receiver_detail_address ,note ,confirm_status ,delete_status ,use_integration ,payment_time ,delivery_time ,receive_time ,comment_time ,modify_time
from ods17.oms_order_incr where dt='${DT_INCR}'union allselectid ,member_id ,coupon_id ,order_sn ,create_time ,member_username ,total_amount ,pay_amount ,freight_amount ,promotion_amount ,integration_amount ,coupon_amount ,discount_amount ,pay_type ,source_type ,status ,order_type ,delivery_company ,delivery_sn ,auto_confirm_day ,integration ,growth ,promotion_info ,bill_type ,bill_header ,bill_content ,bill_receiver_phone ,bill_receiver_email ,receiver_name ,receiver_phone ,receiver_post_code ,receiver_province ,receiver_city ,receiver_region ,receiver_detail_address ,note ,confirm_status ,delete_status ,use_integration ,payment_time ,delivery_time ,receive_time ,comment_time ,modify_time from dwd17.oms_order where dt='${DT_WHOLE}'
)insert into table dwd17.oms_order partition(dt='${DT_INCR}')
selectid ,member_id ,coupon_id ,order_sn ,create_time ,member_username ,total_amount ,pay_amount ,freight_amount ,promotion_amount ,integration_amount ,coupon_amount ,discount_amount ,pay_type ,source_type ,status ,order_type ,delivery_company ,delivery_sn ,auto_confirm_day ,integration ,growth ,promotion_info ,bill_type ,bill_header ,bill_content ,bill_receiver_phone ,bill_receiver_email ,receiver_name ,receiver_phone ,receiver_post_code ,receiver_province ,receiver_city ,receiver_region ,receiver_detail_address ,note ,confirm_status ,delete_status ,use_integration ,payment_time ,delivery_time ,receive_time ,comment_time ,modify_time
from
(selectid ,member_id ,coupon_id ,order_sn ,create_time ,member_username ,total_amount ,pay_amount ,freight_amount ,promotion_amount ,integration_amount ,coupon_amount ,discount_amount ,pay_type ,source_type ,status ,order_type ,delivery_company ,delivery_sn ,auto_confirm_day ,integration ,growth ,promotion_info ,bill_type ,bill_header ,bill_content ,bill_receiver_phone ,bill_receiver_email ,receiver_name ,receiver_phone ,receiver_post_code ,receiver_province ,receiver_city ,receiver_region ,receiver_detail_address ,note ,confirm_status ,delete_status ,use_integration ,payment_time ,delivery_time ,receive_time ,comment_time ,modify_time ,row_number() over(partition by id order by modify_time desc) as rn
from tmp
) o
where rn=1
"if [ $? -eq 0 ]
then echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
elseecho "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi
二、数据域大宽表加工
事实表关联各种纬度表(字典表、实体表、维表),行成宽表,方便后续各种多维度分析。
成交金额分析:
1、建立宽表并写入数据
#!/bin/bash# -- 成交金额分析
# -- DWS宽表(oms_order + oms_order_item)
# CREATE TABLE dws17.oms_order_detail (
# id bigint COMMENT '订单id',
# member_id bigint ,
# coupon_id bigint ,
# order_sn string COMMENT '订单编号',
# create_time BIGINT COMMENT '提交时间',
# member_username string COMMENT '用户帐号',
# total_amount decimal(10,2) COMMENT '订单总金额',
# pay_amount decimal(10,2) COMMENT '应付金额(实际支付金额)',
# freight_amount decimal(10,2) COMMENT '运费金额',
# promotion_amount decimal(10,2) COMMENT '促销优化金额(促销价、满减、阶梯价)',
# integration_amount decimal(10,2) COMMENT '积分抵扣金额',
# coupon_amount decimal(10,2) COMMENT '优惠券抵扣金额',
# discount_amount decimal(10,2) COMMENT '管理员后台调整订单使用的折扣金额',
# pay_type int COMMENT '支付方式:0->未支付;1->支付宝;2->微信',
# source_type int COMMENT '订单来源:0->PC订单;1->app订单',
# status int COMMENT '订单状态:0->待付款;1->待发货;2->已发货;3->已完成;4->已关闭;5->无效订单',
# order_type int COMMENT '订单类型:0->正常订单;1->秒杀订单',
# delivery_company string COMMENT '物流公司',
# delivery_sn string COMMENT '物流单号',
# auto_confirm_day int COMMENT '自动确认时间(天)',
# integration int COMMENT '可以获得的积分',
# growth int COMMENT '可以活动的成长值',
# promotion_info string COMMENT '活动信息',
# bill_type int COMMENT '发票类型:0->不开发票;1->电子发票;2->纸质发票',
# bill_header string COMMENT '发票抬头',
# bill_content string COMMENT '发票内容',
# bill_receiver_phone string COMMENT '收票人电话',
# bill_receiver_email string COMMENT '收票人邮箱',
# receiver_name string COMMENT '收货人姓名',
# receiver_phone string COMMENT '收货人电话',
# receiver_post_code string COMMENT '收货人邮编',
# receiver_province string COMMENT '省份/直辖市',
# receiver_city string COMMENT '城市',
# receiver_region string COMMENT '区',
# receiver_detail_address string COMMENT '详细地址',
# note string COMMENT '订单备注',
# confirm_status int COMMENT '确认收货状态:0->未确认;1->已确认',
# delete_status int COMMENT '删除状态:0->未删除;1->已删除',
# use_integration int COMMENT '下单时使用的积分',
# payment_time BIGINT COMMENT '支付时间',
# delivery_time BIGINT COMMENT '发货时间',
# receive_time BIGINT COMMENT '确认收货时间',
# comment_time BIGINT COMMENT '评价时间',
# modify_time BIGINT COMMENT '修改时间',
# item_product_id bigint ,
# item_product_pic string ,
# item_product_name string ,
# item_product_brand string ,
# item_product_sn string ,
# item_product_price decimal(10,2) COMMENT '销售价格',
# item_product_quantity int COMMENT '购买数量',
# item_product_sku_id bigint COMMENT '商品sku编号',
# item_product_sku_code string COMMENT '商品sku条码',
# item_product_category_id bigint COMMENT '商品分类id',
# item_sp1 string COMMENT '商品的销售属性',
# item_sp2 string ,
# item_sp3 string ,
# item_promotion_name string COMMENT '商品促销名称',
# item_promotion_amount decimal(10,2) COMMENT '商品促销分解金额',
# item_coupon_amount decimal(10,2) COMMENT '优惠券优惠分解金额',
# item_integration_amount decimal(10,2) COMMENT '积分优惠分解金额',
# item_real_amount decimal(10,2) COMMENT '该商品经过优惠后的分解金额',
# item_gift_integration int ,
# item_gift_growth int ,
# item_product_attr string COMMENT '商品销售属性:[{"key":"颜色","value":"颜色"},{"key":"容量","value":"4G"}]'
# )
# COMMENT '订单表'
# PARTITIONED BY (dt string)
# stored as PARQUET
# ;########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc oms_order增量抽取任务 启动脚本 #
# #
########################################################export HIVE_HOME=/opt/apps/hive-3.1.2/DT=`date -d'-1 day' +%Y-%m-%d`if [ $1 ]
then
DT=$1
fi-- 计算插入
${HIVE_HOME}/bin/hive -e "
INSERT INTO TABLE dws17.oms_order_detail partition(dt='${DT}')
SELECTo.id ,o.member_id ,o.coupon_id ,o.order_sn ,o.create_time ,o.member_username ,o.total_amount ,o.pay_amount ,o.freight_amount ,o.promotion_amount ,o.integration_amount ,o.coupon_amount ,o.discount_amount ,o.pay_type ,o.source_type ,o.status ,o.order_type ,o.delivery_company ,o.delivery_sn ,o.auto_confirm_day ,o.integration ,o.growth ,o.promotion_info ,o.bill_type ,o.bill_header ,o.bill_content ,o.bill_receiver_phone ,o.bill_receiver_email ,o.receiver_name ,o.receiver_phone ,o.receiver_post_code ,o.receiver_province ,o.receiver_city ,o.receiver_region ,o.receiver_detail_address ,o.note ,o.confirm_status ,o.delete_status ,o.use_integration ,o.payment_time ,o.delivery_time ,o.receive_time ,o.comment_time ,o.modify_time ,i.product_id ,i.product_pic ,i.product_name ,i.product_brand ,i.product_sn ,i.product_price ,i.product_quantity ,i.product_sku_id ,i.product_sku_code ,i.product_category_id ,i.sp1 ,i.sp2 ,i.sp3 ,i.promotion_name ,i.promotion_amount ,i.coupon_amount ,i.integration_amount ,i.real_amount ,i.gift_integration ,i.gift_growth ,i.product_attr
FROM dwd17.oms_order o
join dwd17.oms_order_item i
on o.dt='${DT}' and o.id=i.order_id
"if [ $? -eq 0 ]
then echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
elseecho "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi
2、建立分析纬度表,并写入数据
#!/bin/bash# -- 成交金额分析统计报表
# -- 源表1: DWS宽表(dws17.oms_order_detail)
# -- 源表2: 会员信息表
# CREATE TABLE dwd17.ums_member_detail (
# id bigint ,
# member_level_id bigint ,
# username string COMMENT '用户名',
# password string COMMENT '密码',
# nickname string COMMENT '昵称',
# phone string COMMENT '手机号码',
# status int COMMENT '帐号启用状态:0->禁用;1->启用',
# create_time bigint COMMENT '注册时间',
# icon string COMMENT '头像',
# gender int COMMENT '性别:0->未知;1->男;2->女',
# birthday date COMMENT '生日',
# city string COMMENT '所做城市',
# job string COMMENT '职业',
# personalized_signature string COMMENT '个性签名',
# source_type int COMMENT '用户来源',
# integration int COMMENT '积分',
# growth int COMMENT '成长值',
# luckey_count int COMMENT '剩余抽奖次数',
# history_integration int COMMENT '历史积分数量'
# )
# PARTITIONED BY (dt string)
# STORED as parquet
# ;# -- 目标表:ads17.oms_order_gmv_cube
# DROP TABLE ads17.oms_order_gmv_cube;
# CREATE TABLE ads17.oms_order_gmv_cube (
# order_total_amount decimal(10,2),
# order_pay_amount decimal(10,2),
# coupon_amount decimal(10,2) COMMENT '优惠券抵扣金额',
# promotion_amount decimal(10,2) COMMENT '促销优化金额(促销价、满减、阶梯价)',
# integration_amount decimal(10,2) COMMENT '积分抵扣金额',
# dim_day string,
# dim_category_id string,
# dim_brand_id string,
# dim_member_level_id string,
# dim_order_type string,
# dim_source_type string,
# dim_promotion_name string
# )
# COMMENT '订单表'
# PARTITIONED BY (dt string)
# stored as PARQUET
# ;########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc gmv统计分析报表计算任务启动脚本 #
# #
########################################################export HIVE_HOME=/opt/apps/hive-3.1.2/DT=`date -d'-1 day' +%Y-%m-%d`if [ $1 ]
then
DT=$1
fi# -- 计算插入
${HIVE_HOME}/bin/hive -e "
INSERT INTO TABLE ads17.oms_order_gmv_cube partition(dt='${DT}')
SELECTsum(item_product_price*item_product_quantity) as order_total_amount ,sum(item_real_amount) as order_pay_amount ,sum(item_coupon_amount) as coupon_amount ,sum(item_promotion_amount) as promotion_amount ,sum(item_integration_amount) as integration_amount ,'2020-10-07' as dim_day ,o.item_product_category_id as dim_category_id , o.item_product_brand as dim_brand_id , u.member_level_id as dim_member_level_id ,o.order_type as dim_order_type , o.source_type as dim_source_type , o.item_promotion_name as dim_promotion_name
FROM dws17.oms_order_detail o
JOIN dwd17.ums_member_detail u on o.dt='${DT}' and from_unixtime(o.create_time,'yyyy-MM-dd')='${DT}' and o.member_id=u.id
GROUP BY o.item_product_category_id,o.item_product_brand,u.member_level_id,o.order_type,o.source_type,o.item_promotion_name
with cube
"if [ $? -eq 0 ]
then echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
elseecho "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi
下单退单分析:
1、建立退单宽表并写入数据
#!/bin/bash# -- 目标表建表:
# CREATE TABLE dws17.oms_order_and_return(
# od_id bigint COMMENT '订单id',
# od_member_id bigint ,
# od_coupon_id bigint ,
# od_order_sn string COMMENT '订单编号',
# od_create_time BIGINT COMMENT '提交时间',
# od_member_username string COMMENT '用户帐号',
# od_total_amount string COMMENT '订单总金额',
# od_pay_amount string COMMENT '应付金额(实际支付金额)',
# od_freight_amount string COMMENT '运费金额',
# od_promotion_amount string COMMENT '促销优化金额(促销价、满减、阶梯价)',
# od_integration_amount string COMMENT '积分抵扣金额',
# od_coupon_amount string COMMENT '优惠券抵扣金额',
# od_discount_amount string COMMENT '管理员后台调整订单使用的折扣金额',
# od_pay_type int COMMENT '支付方式:0->未支付;1->支付宝;2->微信',
# od_source_type int COMMENT '订单来源:0->PC订单;1->app订单',
# od_status int COMMENT '订单状态:0->待付款;1->待发货;2->已发货;3->已完成;4->已关闭;5->无效订单',
# od_order_type int COMMENT '订单类型:0->正常订单;1->秒杀订单',
# od_delivery_company string COMMENT '物流公司',
# od_delivery_sn string COMMENT '物流单号',
# od_auto_confirm_day int COMMENT '自动确认时间(天)',
# od_integration int COMMENT '可以获得的积分',
# od_growth int COMMENT '可以活动的成长值',
# od_promotion_info string COMMENT '活动信息',
# od_bill_type int COMMENT '发票类型:0->不开发票;1->电子发票;2->纸质发票',
# od_bill_header string COMMENT '发票抬头',
# od_bill_content string COMMENT '发票内容',
# od_bill_receiver_phone string COMMENT '收票人电话',
# od_bill_receiver_email string COMMENT '收票人邮箱',
# od_receiver_name string COMMENT '收货人姓名',
# od_receiver_phone string COMMENT '收货人电话',
# od_receiver_post_code string COMMENT '收货人邮编',
# od_receiver_province string COMMENT '省份/直辖市',
# od_receiver_city string COMMENT '城市',
# od_receiver_region string COMMENT '区',
# od_receiver_detail_address string COMMENT '详细地址',
# od_note string COMMENT '订单备注',
# od_confirm_status int COMMENT '确认收货状态:0->未确认;1->已确认',
# od_delete_status int COMMENT '删除状态:0->未删除;1->已删除',
# od_use_integration int COMMENT '下单时使用的积分',
# od_payment_time BIGINT COMMENT '支付时间',
# od_delivery_time BIGINT COMMENT '发货时间',
# od_receive_time BIGINT COMMENT '确认收货时间',
# od_comment_time BIGINT COMMENT '评价时间',
# od_modify_time BIGINT COMMENT '修改时间' ,
# -- 退货表字段
# rt_id bigint ,
# rt_order_id bigint COMMENT '订单id',
# rt_company_address_id bigint COMMENT '收货地址表id',
# rt_product_id bigint COMMENT '退货商品id',
# rt_order_sn string COMMENT '订单编号',
# rt_create_time bigint COMMENT '申请时间',
# rt_member_username string COMMENT '会员用户名',
# rt_return_amount string COMMENT '退款金额',
# rt_return_name string COMMENT '退货人姓名',
# rt_return_phone string COMMENT '退货人电话',
# rt_status int COMMENT '申请状态:0->待处理;1->退货中;2->已完成;3->已拒绝',
# rt_handle_time bigint COMMENT '处理时间',
# rt_product_pic string COMMENT '商品图片',
# rt_product_name string COMMENT '商品名称',
# rt_product_brand string COMMENT '商品品牌',
# rt_product_attr string COMMENT '商品销售属性:颜色:红色;尺码:xl;',
# rt_product_count int COMMENT '退货数量',
# rt_product_price string COMMENT '商品单价',
# rt_product_real_price string COMMENT '商品实际支付单价',
# rt_reason string COMMENT '原因',
# rt_description string COMMENT '描述',
# rt_proof_pics string COMMENT '凭证图片,以逗号隔开',
# rt_handle_note string COMMENT '处理备注',
# rt_handle_man string COMMENT '处理人员',
# rt_receive_man string COMMENT '收货人',
# rt_receive_time bigint COMMENT '收货时间',
# rt_receive_note string COMMENT '收货备注'
# )
# PARTITIONED BY (dt STRING)
# STORED AS PARQUET
# ;########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc 下单、退单统计分析服务表启动脚本 #
# #
########################################################export HIVE_HOME=/opt/apps/hive-3.1.2/DT=`date -d'-1 day' +%Y-%m-%d`if [ $1 ]
then
DT=$1
fi# -- 计算插入
${HIVE_HOME}/bin/hive -e "
WITH od as (
SELECT * FROM dwd17.oms_order WHERE dt='${DT}' AND from_unixtime(create_time,'yyyy-MM-dd')='${DT}'
),
rt as (
SELECT * FROM dwd17.oms_order_return_apply WHERE dt='${DT}' AND from_unixtime(create_time,'yyyy-MM-dd')='${DT}'
)
INSERT INTO TABLE dws17.oms_order_and_return PARTITION (dt='${DT}')
SELECTod.id ,od.member_id ,od.coupon_id ,od.order_sn ,od.create_time ,od.member_username ,od.total_amount ,od.pay_amount ,od.freight_amount ,od.promotion_amount ,od.integration_amount ,od.coupon_amount ,od.discount_amount ,od.pay_type ,od.source_type ,od.status ,od.order_type ,od.delivery_company ,od.delivery_sn ,od.auto_confirm_day ,od.integration ,od.growth ,od.promotion_info ,od.bill_type ,od.bill_header ,od.bill_content ,od.bill_receiver_phone ,od.bill_receiver_email ,od.receiver_name ,od.receiver_phone ,od.receiver_post_code ,od.receiver_province ,od.receiver_city ,od.receiver_region ,od.receiver_detail_address ,od.note ,od.confirm_status ,od.delete_status ,od.use_integration ,od.payment_time ,od.delivery_time ,od.receive_time ,od.comment_time ,od.modify_time ,rt.id ,rt.order_id ,rt.company_address_id ,rt.product_id ,rt.order_sn ,rt.create_time ,rt.member_username ,rt.return_amount ,rt.return_name ,rt.return_phone ,rt.status ,rt.handle_time ,rt.product_pic ,rt.product_name ,rt.product_brand ,rt.product_attr ,rt.product_count ,rt.product_price ,rt.product_real_price ,rt.reason ,rt.description ,rt.proof_pics ,rt.handle_note ,rt.handle_man ,rt.receive_man ,rt.receive_time ,rt.receive_note
FROM od
FULL JOINrt
ONod.id=rt.order_id
"if [ $? -eq 0 ]
then
echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
else
echo "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi
2、建立退货分析纬度表,并写入数据
#!/bin/bash# -- 下单退单人数单数统计日报表
#
# -- 源表:dws.oms_order_and_return
# -- 目标:ads.oms_order_odcnt_oduser_cube
#
# -- 建表:
# CREATE TABLE ads17.oms_order_odcnt_oduser_cube(
# order_cnt int,
# order_users int,
# order_cancel_cnt int,
# order_cancel_users int,
# order_return_cnt int,
# order_return_users int,
# order_return_p_cnt int,
# dim_dt string,
# dim_member_level_id string,
# dim_order_type string,
# dim_source_type string
# )
# PARTITIONED BY (dt string)
# STORED AS PARQUET
# ;# o11,u1,${DT},1 , null
# o12,u1,${DT},1 , o12,p01,20.8,2,u1_name
# o12,u1,${DT},1 , o12,p02,10.0,1,u1_name
# o13,u2,${DT},5 , null
# o14,u2,${DT},5 , null
# null , o02,p06,10.50,1,u3_name
# null , o02,p04,10.50,1,u3_name########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc 下单、退单统计分析服务表启动脚本 #
# #
########################################################export HIVE_HOME=/opt/apps/hive-3.1.2/DT=`date -d'-1 day' +%Y-%m-%d`if [ $1 ]
then
DT=$1
fi# -- 计算插入
${HIVE_HOME}/bin/hive -e "
WITH od_rt AS (
SELECT * FROM dws17.oms_order_and_return WHERE dt='${DT}'
)SELECTCOUNT(DISTINCT od_id) AS order_cn,COUNT(DISTINCT od_member_id) AS order_users,COUNT(DISTINCT if(od_status=5,od_id,null)) AS order_cancel_cnt,COUNT(DISTINCT if(od_status=5,od_member_id,null)) AS order_cancel_users,COUNT(DISTINCT rt_order_id) AS order_return_cnt,COUNT(DISTINCT rt_member_username) order_return_users ,SUM(rt_product_count) AS order_return_p_cnt,'${DT}' AS dim_dt ,member_level_id AS dim_member_level_id,od_order_type AS dim_order_type ,od_source_type AS dim_source_type
FROM
(SELECTod_rt.*,u.member_level_idFROM od_rt JOIN dwd17.ums_member_detail u ON od_rt.od_member_id=u.idUNION ALLSELECTod_rt.*,u.member_level_idFROM od_rt JOIN dwd17.ums_member_detail u ON od_rt.rt_member_username=u.username
) o
GROUP BY member_level_id,od_order_type,od_source_type
WITH cube"if [ $? -eq 0 ]
then echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
elseecho "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi
三、消费画像标签(DWS层)
给用户打上一些消费相关(下单、退货、金额、客单价)的统计数据标签。
1、建表
drop table if exists ads17.userprofile_consume_tag;
create table ads17.userprofile_consume_tag(
user_id bigint ,--用户
first_order_time string ,--首单日期
last_order_time string ,--末单日期
first_order_ago bigint ,--首单距今时间
last_order_ago bigint ,--末单距今时间
month1_order_cnt bigint ,--近30天下单次数
month1_order_amt double ,--近30天购买金额(总金额)
month2_order_cnt bigint ,--近60天购买次数
month2_order_amt double ,--近60天购买金额
month3_order_cnt bigint ,--近90天购买次数
month3_order_amt double ,--近90天购买金额
max_order_amt double ,--最大订单金额
min_order_amt double ,--最小订单金额
total_order_cnt bigint ,--累计消费次数(不含退拒)
total_order_amt double ,--累计消费金额(不含退拒)
total_coupon_amt double ,--累计使用代金券金额
user_avg_order_amt double ,--平均订单金额(含退拒)
month3_user_avg_amt double ,--近90天平均订单金额(含退拒)
common_address string ,--常用收货地址
common_paytype string ,--常用支付方式
month1_cart_cnt_30 bigint ,--最近30天加购次数
month1_cart_goods_cnt_30 bigint ,--最近30天加购商品件数
month1_cart_cancel_cnt bigint ,--最近30天取消商品件数
dw_date string -- 计算日期
) partitioned by (dt string)
stored as parquet
;
2、计算指标,写入目标表
-- 计算
-- 订单表
-- o11,u1,${DT},1
-- o12,u1,${DT},1 o12,退款金额
-- o13,u2,${DT},5
-- o14,u2,${DT},5 -- 退货表
-- o12,p01,20.8,2,u1_name
-- o12,p02,20.8,2,u1_name
-- o02,p06,10,1,u3_name
-- o02,p04,10,1,u3_namewith tmp1 as (SELECTa.member_id AS member_id ,--用户min(from_unixtime(a.create_time,'yyyy-MM-dd')) AS first_order_time ,--首单日期max(from_unixtime(a.create_time,'yyyy-MM-dd')) AS last_order_time ,--末单日期-- AS first_order_ago ,--首单距今时间 -- AS last_order_ago ,--末单距今时间 count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=30,1,null)) AS month1_order_cnt ,--近30天下单次数sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=30,total_amount,0)) AS month1_order_amt ,--近30天购买金额(总金额)count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=60,1,null)) AS month2_order_cnt ,--近60天购买次数sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=60,total_amount,0)) AS month2_order_amt ,--近60天购买金额count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,1,null)) AS month3_order_cnt ,--近90天购买次数sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,total_amount,0)) AS month3_order_amt ,--近90天购买金额max(total_amount) AS max_order_amt ,--最大订单金额min(total_amount) AS min_order_amt ,--最小订单金额count(if(b.order_id is null,1,null)) AS total_order_cnt ,--累计消费次数(不含退拒)sum(if(b.order_id is null,total_amount,0)) AS total_order_amt ,--累计消费金额(不含退拒)sum(coupon_amount) AS total_coupon_amt ,--累计使用代金券金额round(avg(total_amount),2) AS user_avg_order_amt ,--平均订单金额(含退拒)round(avg(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,total_amount,null)),2) AS month3_user_avg_amt --近90天平均订单金额(含退拒)FROM ( SELECT * FROM DWD17.oms_order od WHERE dt='2020-10-07' ) aLEFT JOIN ( SELECT order_id FROM dwd17.oms_order_return_apply rt WHERE dt='2020-10-07' GROUP BY order_id) bON a.id = b.order_idGROUP BY a.member_id
) ,-- common_address string ,--常用收货地址
tmp2 as (SELECTmember_id,addr as common_addressFROM (SELECTmember_id,addr,row_number() over(partition by member_id order by order_cnt desc) as rnFROM (SELECTmember_id,concat_ws(',',receiver_province,receiver_city,receiver_region,receiver_detail_address) as addr,count(1) as order_cntFROM DWD17.oms_order od WHERE dt='2020-10-07'GROUP BY member_id,concat_ws(',',receiver_province,receiver_city,receiver_region,receiver_detail_address)) o1) o2WHERE rn=1
) ,-- common_paytype string ,--常用支付方式
tmp3 as (
SELECTmember_id,pay_type as common_paytype
FROM (SELECTmember_id,pay_type,row_number() over(partition by member_id order by order_cnt desc) as rnFROM (SELECTmember_id,pay_type,count(1) as order_cntFROM DWD17.oms_order od WHERE dt='2020-10-07'GROUP BY member_id,pay_type) o1
) o2
WHERE rn=1
) ,-- 购物车相关指标标签
-- month1_cart_cnt_30 bigint ,--最近30天加购次数
-- month1_cart_goods_cnt_30 bigint ,--最近30天加购商品件数
-- month1_cart_cancel_cnt bigint ,--最近30天取消商品件数
tmp4 as (
SELECTmember_id,COUNT(DISTINCT if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30,create_date,null)) as month1_cart_cnt_30,SUM(if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30,quantity,0)) as month1_cart_goods_cnt_30,SUM(if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30 and delete_status=1,quantity,0)) as month1_cart_cancel_cnt
FROM dwd17.oms_cart_item WHERE dt='2020-10-07'
GROUP BY member_id
),tmp5 as (
SELECT member_id FROM dwd17.oms_order od WHERE dt='2020-10-07'
UNION
SELECT member_id FROM dwd17.oms_cart_item WHERE dt='2020-10-07'
)INSERT INTO TABLE ads17.userprofile_consume_tag PARTITION (dt='2020-10-07')
SELECTtmp5.member_id AS user_id ,--用户tmp1.first_order_time ,--首单日期tmp1.last_order_time ,--末单日期datediff('2020-10-07',tmp1.first_order_time) AS first_order_ago ,--首单距今时间datediff('2020-10-07',tmp1.last_order_time) AS last_order_ago ,--末单距今时间tmp1.month1_order_cnt ,--近30天下单次数tmp1.month1_order_amt ,--近30天购买金额(总金额)tmp1.month2_order_cnt ,--近60天购买次数tmp1.month2_order_amt ,--近60天购买金额tmp1.month3_order_cnt ,--近90天购买次数tmp1.month3_order_amt ,--近90天购买金额tmp1.max_order_amt ,--最大订单金额tmp1.min_order_amt ,--最小订单金额tmp1.total_order_cnt ,--累计消费次数(不含退拒)tmp1.total_order_amt ,--累计消费金额(不含退拒)tmp1.total_coupon_amt ,--累计使用代金券金额tmp1.user_avg_order_amt ,--平均订单金额(含退拒)tmp1.month3_user_avg_amt ,--近90天平均订单金额(含退拒)tmp2.common_address ,--常用收货地址tmp3.common_paytype ,--常用支付方式tmp4.month1_cart_cnt_30 ,--最近30天加购次数tmp4.month1_cart_goods_cnt_30 ,--最近30天加购商品件数tmp4.month1_cart_cancel_cnt ,--最近30天取消商品件数'2020-10-07' AS dw_date -- 计算日期
FROM tmp5
LEFT JOIN tmp1 ON tmp5.member_id=tmp1.member_id
LEFT JOIN tmp2 ON tmp5.member_id=tmp2.member_id
LEFT JOIN tmp3 ON tmp5.member_id=tmp3.member_id
LEFT JOIN tmp4 ON tmp5.member_id=tmp4.member_id
;
四、退货拒货画像标签
退拒商品统计标签表
1、建表
drop table if exists ads17.userprofile_reject_tag;
create table ads17.userprofile_reject_tag(
user_id bigint ,-- 用户
p_sales_cnt bigint ,-- 不含退拒商品购买数量
p_sales_amt double ,-- 不含退拒商品购买的商品总价
p_sales_cut_amt double ,-- 不含退拒实付金额(扣促销减免)
h_sales_cnt bigint ,-- 含退拒购买数量
h_sales_amt double ,-- 含退拒购买金额
h_sales_cut_amt double ,-- 含退拒实付金额(扣促销减免)
return_cnt bigint ,-- 退货商品数量
return_amt double ,-- 退货商品金额
dw_date bigint
) partitioned by (dt string)
stored as parquet
;
2、计算指标,并写入数据
-- 计算
-- 将 订单商品项详情表 关联 退货申请记录表,得到如下形式schema
-- 用户,订单,商品,商品价格,购买数量,实付金额, 退货订单Id,退货商品,退货件数,退货的金额
-- zhas,od01,p1, 20, 100 , 1800 , od01 p1 , 80 , 1400
-- zhas,od01,p2, 10, 20 , 200 , null null null nullSELECTa.member_id as user_id ,sum(if(b.order_id is null,a.product_quantity,a.product_quantity-b.product_count)) as p_sales_cnt ,sum(if(b.order_id is null,a.product_quantity*a.product_price,(a.product_quantity-b.product_count)*a.product_price)) as p_sales_amt ,sum(if(b.order_id is null,a.real_amount,a.real_amount-b.return_amount)) as p_sales_cut_amt ,sum(a.product_quantity) as h_sales_cnt ,sum(a.product_quantity*a.product_price) as h_sales_amt ,sum(a.real_amount) as h_sales_cut_amt ,sum(b.product_count) as return_cnt ,sum(b.return_amount) as return_amt ,'2020-10-07' as dw_date
FROM (SELECT oi.*,od.member_id FROM dwd17.oms_order_item oiJOIN dwd17.oms_order odON od.dt='2020-10-07' and oi.dt='2020-10-07' and od.id=oi.order_id) a
LEFT JOIN (SELECT * FROM dwd17.oms_order_return_apply where dt='2020-10-07') b
ON a.order_id=b.order_id and a.product_id=b.product_id
GROUP BY a.member_id
五、拉链表
数据合并后,已经看不到前一天的数据,所以需要拉链表,记住每天的数据状态
1、订单表
create table test.order_incr_test(
oid string,
uid string,
amt string,
sts string
)
partitioned by (dt string)
row format delimited fields terminated by ','
;
2、10月1日数据
o1,u1,20,status1
o2,u1,30,status1
o3,u2,40,status4
o4,u3,28,status3
10月2日增量数据
o1,u1,20,status2
o2,u1,30,status3
o5,u4,26,status1
3、创建拉链表
create table test.order_zip_test(
oid string,
uid string,
amt string,
sts string,
start_dt string,
end_dt string
)
partitioned by (dt string)
row format delimited fields terminated by ','
;
4、计算逻辑
-- 计算逻辑
-- 1.将T-1日的拉链表 LEFT JOIN T日的增量,对T-1拉链中的数据有效期做更新
-- 2.UNION ALL 上T日增量(老数据的新状态,新数据)with zip as (select * from test.order_zip_test where dt='2020-10-01'
),
newdata as (select *from test.order_incr_test where dt='2020-10-02'
)INSERT INTO TABLE test.order_zip_test PARTITION(dt='2020-10-02')
SELECTzip.oid,zip.uid,zip.amt,zip.sts,zip.start_dt,if(newdata.oid is not null and end_dt='9999-12-31',zip.dt,zip.end_dt) as end_dt
FROM zip left join newdata on zip.oid=newdata.oidUNION ALLSELECToid,uid,amt,sts,dt as start_dt,'9999-12-31' as end_dt
FROM newdata
大数据之数据仓库建设(三)相关推荐
- 大数据之数据仓库建设(二)
一.DWS 层开发 它的建模思想,就是为最终需求计算来提供支持服务,所以建模相对灵活. 常见建模方法: 1.维度集成(建宽表): 事实表中,将各种维度 id,和维度表关联后换成各种维度值,有可能将多个 ...
- 三位院士压轴,大数据产业生态建设与发展高峰会成功举办
数领万物,共创未来.12月25日下午,由中国国际大数据产业博览会组委会主办的"永不落幕的数博会"2020系列活动--"大数据产业生态创新发展高峰会"在北京举行, ...
- oracle 数据立方_大数据之数据仓库分层
大数据之数据仓库分层 1. 什么是数据分层? 2. 数据分层的好处 一种通用的数据分层设计 3. 举例 4. 各层会用到的计算引擎和存储系统 5. 分层实现 6.数据分层的一些概念说明 7.大数据相关 ...
- 高校大数据专业竞赛建设方案
第一章 建设背景 1.1 政策分析 2017年1月 工业和信息化部正式发布了<大数据产业发展规划(2016-2020年)>,明确了"十三五"时期大数据产业的发展思路 ...
- 大数据治理平台建设方案(文末附PDF下载)
这份材料我给满分!分享一份非常好的大数据治理平台解决方案材料,这份PPT将理论与实践相结合,值得仔细阅读,建议收藏. 文档目录主要包含了以下几点: 数据治理概述 某行数据现状及问题 数据治理阶段目标 ...
- 魅族大数据可视化平台建设之路
本文是根据魅族科技大数据平台架构师赵天烁3月31日在msup携手魅族主办的第十二期魅族技术开放日<魅族大数据可视化平台建设之路>演讲中的分享内容整理而成. 内容简介:本文主要从现状& ...
- 数据仓库物理分层_大数据之数据仓库分层
大数据之数据仓库分层 什么是数据分层? 数据分层的好处 一种通用的数据分层设计 举例 各层会用到的计算引擎和存储系统 分层实现 数据分层的一些概念说明 7.大数据相关基础概念 1. 什么是数据分层? ...
- 有赞大数据平台安全建设实践
一.概述 在大数据平台建设初期,安全也许并不是被重点关注的一环.大数据平台的定位主要是服务数据开发人员,提高数据开发效率,提供便捷的开发流程,有效支持数仓建设.大数据平台的用户都是公司内部人员.数据本 ...
- 大数据BI平台建设需注意什么问题
大数据是新资源.新技术和新理念的混合体.从资源的角度看,大数据是一种新的资源,反映了一种新的资源观.因此,处理好大数据的问题,就拥有了提前了解市场的能力.但是,大数据分析技术也面临着一些问题,大数据B ...
最新文章
- 从“优化”、“向社会输送人才”到“毕业”,互联网的高情商裁员
- Python Web实时消息后台服务器推送技术---GoEasy
- 后台开发经典书籍--深入理解计算机系统
- 进程间通信——DLL共享节
- FIXML and FpML - Background, Comparison, Integration Interoperability Opportunities
- python iloc函数_python选取特定列 pandas iloc,loc,icol的使用详解(列切片及行切片)
- 正则去除html行内样式,Android-富文本处理-html字符串去掉内部样式,统一添加body、style,统一支持换行等...
- 【cocos2d-x从c++到js】08:JS脚本语言的优势与一些问题
- Java Web学习笔记06:利用JDBC访问数据库
- 关于html的一切(updating...)
- dell mobile connect 兑换码_剑与远征万圣节兑换码是什么?剑与远征2020万圣节兑换码使用解析...
- 论文翻译:MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing
- 游戏反编译工具dnSpy
- TP5 微信分享朋友圈接口显示自定义图片和标题
- mysql chunk_【MySQL参数】-innodb_buffer_pool_chunk_size
- 概率论基础知识(三) 参数估计
- Bootstrap4使用教程
- MiniGPT4,开源了
- html5工业相机,AVT工业相机
- Java高级开发面试题整理
热门文章
- 南京理工大学matlab怎么弄,基于MATLAB/SimDriveline 的某型军用车辆
起步过程仿真研究...
- 美国标准的网络安全体系架构
- 【前端实例代码】使用 HTML CSS 和 JavaScript 实现具有彩色发光霓虹灯效果的数字时钟|前端开发 网页制作 基础入门教程
- net npf 服务名无效_win10系统打开wireshark提示npF驱动没有运行的处理方法
- 星际战甲与计算机版本不兼容,win10星际战甲游戏出现无法更新的三种解决方法...
- mybatis 实体嵌套查询
- 还想朝九晚五?不可能!
- 【黑马Java笔记+踩坑】MyBatisPlus基础
- 关于设定校园二手租赁系统的计划、功能及建议
- 《从程序员到项目经理》读后感-合群