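Data warehouse theory parts 1 and 2 mainly cover the traffic domain;
Data warehouse theory parts 3 and 4 mainly cover the business domain, i.e. the data held in the business databases.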

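I. Importing data with Sqoop

Dictionary tables and small miscellaneous tables: full import
Entity tables (very large) and fact tables (business tables that change every day): incremental import

Incrementally imported data is stored in the ODS layer of the data warehouse, where it is inconvenient for statistical analysis; it has to be merged on a rolling basis to produce a full snapshot.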

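1. Import the data in full

Create the table and run a full import

2. Import each day's incremental data with Sqoop and store it in the ODS layer

Incremental import script: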

########################################################
#                                                      #
#  @author hunter@doitedu                              #
#  @date   ${DT_INCR}                                  #
#  @desc   oms_order incremental extract launch script #
#                                                      #
########################################################

export SQOOP_HOME=/opt/apps/sqoop-1.4.7/
export HIVE_HOME=/opt/apps/hive-3.1.2/

# Load the max id reached by the previous import
if [ -f preid ]
then
  START_ID=`cat preid`
else
  START_ID=0
fi

# Query the current max id in the source table
res=$(mysql -h 192.168.77.2 -uroot -p123456 <<EOF
use realtimedw;
select max(id) from oms_order_item;
EOF
)
CUR_MAX_ID=`echo ${res} | awk '{print $2}'`

DT_INCR=`date -d'-1 day' +%Y-%m-%d`

# Optional overrides: $1 = start id, $2 = extraction date
if [[ -n "$1" && -n "$2" ]]
then
  START_ID=$1
  DT_INCR=$2
fi

echo "Max id of the previous import: ${START_ID}"
echo "Max id this import can reach: ${CUR_MAX_ID}"

${SQOOP_HOME}/bin/sqoop import \
--connect jdbc:mysql://192.168.77.2:3306/realtimedw \
--username root \
--password 123456 \
--table oms_order_item \
--target-dir "/incr_test/oms_order_item/${DT_INCR}"  \
--incremental append \
--check-column id \
--null-string '\\N'  \
--null-non-string '\\N' \
--last-value "${START_ID}"  \
--fields-terminated-by ','  \
--split-by id  \
-m 2

if [ $? -eq 0 ]
then
  # Record the new max id only after the import has succeeded
  echo ${CUR_MAX_ID} > preid
  echo "congratulations! sqoop data import succeeded! Email sent to admin@51doit.com"
  echo "Preparing to load into hive table partition: dt=${DT_INCR}"
  ${HIVE_HOME}/bin/hive -e "alter table ods17.oms_order_incr add partition(dt='${DT_INCR}')"
  if [ $? -eq 0 ]
  then
    echo "congratulations! hive table partition loaded successfully! Email sent to admin@51doit.com"
  else
    echo "My condolences! Table partition load failed! Email sent to admin@51doit.com"
  fi
else
  echo "My condolences! sqoop data migration job failed! Email sent to admin@51doit.com"
fi

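Note: the date partition has to be registered in the Hive metastore. After that, either load the HDFS data into the Hive table, or have Sqoop write directly into the Hive table using its Hive partition options (--hive-partition-key / --hive-partition-value).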
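
The bookkeeping around the preid file in the incremental import script can be looked at in isolation. The following is a minimal sketch, not the author's implementation: read_watermark, save_watermark and fake_import are hypothetical names, and fake_import merely stands in for the real Sqoop call. It shows the watermark starting at 0 on the first run and moving forward only when the import step reports success.

```shell
#!/bin/bash
# Sketch only: read_watermark, save_watermark and fake_import are
# hypothetical names, not part of Sqoop or of the script above.

# Print the last imported id, or 0 on the first run (no state file yet).
read_watermark() {
  local state_file="$1"
  if [ -f "${state_file}" ]; then
    cat "${state_file}"
  else
    echo 0
  fi
}

# Persist the new high-water mark.
save_watermark() {
  local state_file="$1" new_id="$2"
  echo "${new_id}" > "${state_file}"
}

# Stand-in for the real import: succeeds only if there are new rows.
fake_import() {
  local start_id="$1" max_id="$2"
  [ "${max_id}" -gt "${start_id}" ]
}

work_dir="$(mktemp -d)"
state="${work_dir}/preid"

start_id="$(read_watermark "${state}")"        # first run: 0
if fake_import "${start_id}" 120; then
  save_watermark "${state}" 120                # advance only on success
fi
after_first_run="$(read_watermark "${state}")"

if fake_import "${after_first_run}" 120; then  # nothing new, so this fails
  save_watermark "${state}" 999
fi
after_second_run="$(read_watermark "${state}")"

echo "started at ${start_id}, watermark is now ${after_second_run}"
rm -rf "${work_dir}"
```

Writing the watermark after the import rather than before it means a failed run can simply be retried from the same starting id.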

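3. Merge each day's incremental data with the previous day's full data and store the result in a new table in the DWD layer; alternatively, write it back into the existing table, overwriting the earlier data

Create a new table and run the merge script: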

########################################################
#                                                      #
#  @author hunter@doitedu                              #
#  @date   ${DT_INCR}                                  #
#  @desc   oms_order data merge job launch script      #
#                                                      #
########################################################

export SQOOP_HOME=/opt/apps/sqoop-1.4.7/
export HIVE_HOME=/opt/apps/hive-3.1.2/

# DT_WHOLE: date of the previous full snapshot; DT_INCR: date of the increment
DT_WHOLE=`date -d'-2 day' +%Y-%m-%d`
DT_INCR=`date -d'-1 day' +%Y-%m-%d`

if [[ $1 && $2 ]]
then
  DT_WHOLE=$1
  DT_INCR=$2
fi

# Union the increment with the previous snapshot, then keep the latest row per id
${HIVE_HOME}/bin/hive -e "
with tmp as (
select
  id, member_id, coupon_id, order_sn, create_time, member_username,
  total_amount, pay_amount, freight_amount, promotion_amount,
  integration_amount, coupon_amount, discount_amount, pay_type,
  source_type, status, order_type, delivery_company, delivery_sn,
  auto_confirm_day, integration, growth, promotion_info, bill_type,
  bill_header, bill_content, bill_receiver_phone, bill_receiver_email,
  receiver_name, receiver_phone, receiver_post_code, receiver_province,
  receiver_city, receiver_region, receiver_detail_address, note,
  confirm_status, delete_status, use_integration, payment_time,
  delivery_time, receive_time, comment_time, modify_time
from ods17.oms_order_incr where dt='${DT_INCR}'
union all
select
  id, member_id, coupon_id, order_sn, create_time, member_username,
  total_amount, pay_amount, freight_amount, promotion_amount,
  integration_amount, coupon_amount, discount_amount, pay_type,
  source_type, status, order_type, delivery_company, delivery_sn,
  auto_confirm_day, integration, growth, promotion_info, bill_type,
  bill_header, bill_content, bill_receiver_phone, bill_receiver_email,
  receiver_name, receiver_phone, receiver_post_code, receiver_province,
  receiver_city, receiver_region, receiver_detail_address, note,
  confirm_status, delete_status, use_integration, payment_time,
  delivery_time, receive_time, comment_time, modify_time
from dwd17.oms_order where dt='${DT_WHOLE}'
)
insert into table dwd17.oms_order partition(dt='${DT_INCR}')
select
  id, member_id, coupon_id, order_sn, create_time, member_username,
  total_amount, pay_amount, freight_amount, promotion_amount,
  integration_amount, coupon_amount, discount_amount, pay_type,
  source_type, status, order_type, delivery_company, delivery_sn,
  auto_confirm_day, integration, growth, promotion_info, bill_type,
  bill_header, bill_content, bill_receiver_phone, bill_receiver_email,
  receiver_name, receiver_phone, receiver_post_code, receiver_province,
  receiver_city, receiver_region, receiver_detail_address, note,
  confirm_status, delete_status, use_integration, payment_time,
  delivery_time, receive_time, comment_time, modify_time
from
(
select
  id, member_id, coupon_id, order_sn, create_time, member_username,
  total_amount, pay_amount, freight_amount, promotion_amount,
  integration_amount, coupon_amount, discount_amount, pay_type,
  source_type, status, order_type, delivery_company, delivery_sn,
  auto_confirm_day, integration, growth, promotion_info, bill_type,
  bill_header, bill_content, bill_receiver_phone, bill_receiver_email,
  receiver_name, receiver_phone, receiver_post_code, receiver_province,
  receiver_city, receiver_region, receiver_detail_address, note,
  confirm_status, delete_status, use_integration, payment_time,
  delivery_time, receive_time, comment_time, modify_time,
  row_number() over(partition by id order by modify_time desc)  as rn
from tmp
) o
where rn=1
"

if [ $? -eq 0 ]
then
  echo "congratulations! hive table partition loaded successfully! Email sent to admin@51doit.com"
else
  echo "My condolences! Table partition load failed! Email sent to admin@51doit.com"
fi
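
The row_number() over(partition by id order by modify_time desc) ... rn=1 step in the merge keeps, for every id, only the row with the latest modify_time. The same idea can be tried on a few text rows. This is a small sketch under stated assumptions: the three-column rows (id,status,modify_time) and their values are made up for illustration and stand in for the 44-column order records.

```shell
#!/bin/bash
# Sketch only: three-column CSV rows (id,status,modify_time) with made-up
# values stand in for the real order records.

# Previous full snapshot.
snapshot='1,0,100
2,1,100
3,0,100'

# Daily increment: order 2 was updated, order 4 is new.
increment='2,3,200
4,0,200'

# Union both inputs, sort by id ascending and modify_time descending,
# then keep the first row seen for each id (the counterpart of rn=1).
merged="$(printf '%s\n%s\n' "${snapshot}" "${increment}" \
  | sort -t',' -k1,1n -k3,3nr \
  | awk -F',' '!seen[$1]++')"

echo "${merged}"
```

Order 2 appears in both inputs and survives only in its newer version, while orders 1 and 3 are carried over unchanged and order 4 is added.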

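II. Building wide tables for the data domain

Fact tables are joined with the various dimension tables (dictionary tables, entity tables, dimension tables) to form wide tables, which makes the subsequent multi-dimensional analyses easier.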
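
As a rough illustration of what "joining a fact table with a dimension table into a wide table" means, the sketch below enriches a few order rows with a member attribute. Everything in it is assumed for the example: the two tiny CSV files, their columns and the level names are invented and do not come from the warehouse tables.

```shell
#!/bin/bash
# Sketch only: two tiny CSV files with invented rows stand in for the
# fact table and the dimension table.
work_dir="$(mktemp -d)"

# Dimension rows: member_id,member_level
printf '%s\n' 'u1,gold' 'u2,silver' > "${work_dir}/members.csv"
# Fact rows: order_id,member_id,pay_amount
printf '%s\n' 'o11,u1,20.80' 'o12,u2,10.00' 'o13,u1,5.50' > "${work_dir}/orders.csv"

# First file: load the dimension into a lookup array keyed by member_id.
# Second file: append the looked-up attribute to every matching fact row.
wide="$(awk -F',' -v OFS=',' '
  NR == FNR { level[$1] = $2; next }
  ($2 in level) { print $0, level[$2] }
' "${work_dir}/members.csv" "${work_dir}/orders.csv")"

echo "${wide}"
rm -rf "${work_dir}"
```

Each output row carries the order columns plus the member level, so later aggregations can group by that attribute without joining again.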

Transaction amount analysis:

1. Create the wide table and write data into it

#!/bin/bash

# -- Transaction amount analysis
# -- DWS wide table (oms_order + oms_order_item)
# CREATE TABLE dws17.oms_order_detail (
#   id bigint COMMENT 'order id',
#   member_id bigint ,
#   coupon_id bigint ,
#   order_sn string  COMMENT 'order number',
#   create_time BIGINT  COMMENT 'submission time',
#   member_username string  COMMENT 'user account',
#   total_amount decimal(10,2)  COMMENT 'total order amount',
#   pay_amount decimal(10,2)  COMMENT 'amount payable (actual amount paid)',
#   freight_amount decimal(10,2)  COMMENT 'freight amount',
#   promotion_amount decimal(10,2)  COMMENT 'promotion discount amount (promotional price, spend-and-save, tiered price)',
#   integration_amount decimal(10,2)  COMMENT 'amount offset by points',
#   coupon_amount decimal(10,2)  COMMENT 'amount offset by coupon',
#   discount_amount decimal(10,2)  COMMENT 'discount applied when an admin adjusts the order in the back office',
#   pay_type int  COMMENT 'payment method: 0->unpaid; 1->Alipay; 2->WeChat',
#   source_type int  COMMENT 'order source: 0->PC order; 1->app order',
#   status int  COMMENT 'order status: 0->awaiting payment; 1->awaiting shipment; 2->shipped; 3->completed; 4->closed; 5->invalid order',
#   order_type int  COMMENT 'order type: 0->normal order; 1->flash-sale order',
#   delivery_company string  COMMENT 'logistics company',
#   delivery_sn string  COMMENT 'tracking number',
#   auto_confirm_day int  COMMENT 'auto-confirm period (days)',
#   integration int  COMMENT 'points that can be earned',
#   growth int  COMMENT 'growth value that can be earned',
#   promotion_info string  COMMENT 'promotion information',
#   bill_type int  COMMENT 'invoice type: 0->no invoice; 1->electronic invoice; 2->paper invoice',
#   bill_header string  COMMENT 'invoice title',
#   bill_content string  COMMENT 'invoice content',
#   bill_receiver_phone string  COMMENT 'invoice recipient phone',
#   bill_receiver_email string  COMMENT 'invoice recipient email',
#   receiver_name string  COMMENT 'consignee name',
#   receiver_phone string  COMMENT 'consignee phone',
#   receiver_post_code string  COMMENT 'consignee postal code',
#   receiver_province string  COMMENT 'province / municipality',
#   receiver_city string  COMMENT 'city',
#   receiver_region string  COMMENT 'district',
#   receiver_detail_address string  COMMENT 'detailed address',
#   note string  COMMENT 'order note',
#   confirm_status int  COMMENT 'receipt confirmation status: 0->unconfirmed; 1->confirmed',
#   delete_status int   COMMENT 'deletion status: 0->not deleted; 1->deleted',
#   use_integration int  COMMENT 'points used when placing the order',
#   payment_time BIGINT  COMMENT 'payment time',
#   delivery_time BIGINT  COMMENT 'shipping time',
#   receive_time BIGINT  COMMENT 'receipt confirmation time',
#   comment_time BIGINT  COMMENT 'review time',
#   modify_time BIGINT  COMMENT 'modification time',
#   item_product_id bigint ,
#   item_product_pic string ,
#   item_product_name string ,
#   item_product_brand string ,
#   item_product_sn string ,
#   item_product_price decimal(10,2)  COMMENT 'sale price',
#   item_product_quantity int  COMMENT 'quantity purchased',
#   item_product_sku_id bigint  COMMENT 'product SKU id',
#   item_product_sku_code string  COMMENT 'product SKU barcode',
#   item_product_category_id bigint  COMMENT 'product category id',
#   item_sp1 string  COMMENT 'sales attributes of the product',
#   item_sp2 string ,
#   item_sp3 string ,
#   item_promotion_name string  COMMENT 'product promotion name',
#   item_promotion_amount decimal(10,2)  COMMENT 'promotion discount apportioned to this item',
#   item_coupon_amount decimal(10,2)  COMMENT 'coupon discount apportioned to this item',
#   item_integration_amount decimal(10,2)  COMMENT 'points discount apportioned to this item',
#   item_real_amount decimal(10,2)  COMMENT 'apportioned amount of this item after discounts',
#   item_gift_integration int ,
#   item_gift_growth int ,
#   item_product_attr string  COMMENT 'product sales attributes: [{"key":"color","value":"color"},{"key":"capacity","value":"4G"}]'
# )
# COMMENT 'Order table'
# PARTITIONED BY (dt string)
# stored as PARQUET
# ;

########################################################
#                                                      #
#  @author hunter@doitedu                              #
#  @date   ${DT_INCR}                                  #
#  @desc   oms_order incremental extract launch script #
#                                                      #
########################################################

export HIVE_HOME=/opt/apps/hive-3.1.2/

DT=`date -d'-1 day' +%Y-%m-%d`

if [ $1 ]
then
  DT=$1
fi

# -- Compute and insert
${HIVE_HOME}/bin/hive  -e "
INSERT INTO TABLE dws17.oms_order_detail partition(dt='${DT}')
SELECT
  o.id, o.member_id, o.coupon_id, o.order_sn, o.create_time,
  o.member_username, o.total_amount, o.pay_amount, o.freight_amount,
  o.promotion_amount, o.integration_amount, o.coupon_amount,
  o.discount_amount, o.pay_type, o.source_type, o.status, o.order_type,
  o.delivery_company, o.delivery_sn, o.auto_confirm_day, o.integration,
  o.growth, o.promotion_info, o.bill_type, o.bill_header, o.bill_content,
  o.bill_receiver_phone, o.bill_receiver_email, o.receiver_name,
  o.receiver_phone, o.receiver_post_code, o.receiver_province,
  o.receiver_city, o.receiver_region, o.receiver_detail_address, o.note,
  o.confirm_status, o.delete_status, o.use_integration, o.payment_time,
  o.delivery_time, o.receive_time, o.comment_time, o.modify_time,
  i.product_id, i.product_pic, i.product_name, i.product_brand,
  i.product_sn, i.product_price, i.product_quantity, i.product_sku_id,
  i.product_sku_code, i.product_category_id, i.sp1, i.sp2, i.sp3,
  i.promotion_name, i.promotion_amount, i.coupon_amount,
  i.integration_amount, i.real_amount, i.gift_integration, i.gift_growth,
  i.product_attr
FROM dwd17.oms_order o
join dwd17.oms_order_item i
on o.dt='${DT}' and o.id=i.order_id
"

if [ $? -eq 0 ]
then
  echo "congratulations! hive table partition loaded successfully! Email sent to admin@51doit.com"
else
  echo "My condolences! Table partition load failed! Email sent to admin@51doit.com"
fi

2. Create the analysis dimension table and write data into it

#!/bin/bash

# -- Transaction amount analysis report
# -- Source table 1: DWS wide table (dws17.oms_order_detail)
# -- Source table 2: member information table
# CREATE TABLE dwd17.ums_member_detail (
#   id bigint  ,
#   member_level_id bigint ,
#   username string  COMMENT 'user name',
#   password string  COMMENT 'password',
#   nickname string  COMMENT 'nickname',
#   phone string  COMMENT 'mobile number',
#   status int  COMMENT 'account status: 0->disabled; 1->enabled',
#   create_time bigint  COMMENT 'registration time',
#   icon string  COMMENT 'avatar',
#   gender int  COMMENT 'gender: 0->unknown; 1->male; 2->female',
#   birthday date  COMMENT 'birthday',
#   city string  COMMENT 'city of residence',
#   job string  COMMENT 'occupation',
#   personalized_signature string  COMMENT 'personal signature',
#   source_type int  COMMENT 'user source',
#   integration int  COMMENT 'points',
#   growth int  COMMENT 'growth value',
#   luckey_count int  COMMENT 'remaining lucky-draw attempts',
#   history_integration int  COMMENT 'historical points total'
# )
# PARTITIONED BY (dt string)
# STORED as parquet
# ;

# -- Target table: ads17.oms_order_gmv_cube
# DROP TABLE ads17.oms_order_gmv_cube;
# CREATE TABLE ads17.oms_order_gmv_cube (
#   order_total_amount    decimal(10,2),
#   order_pay_amount      decimal(10,2),
#   coupon_amount decimal(10,2)  COMMENT 'amount offset by coupon',
#   promotion_amount decimal(10,2)  COMMENT 'promotion discount amount (promotional price, spend-and-save, tiered price)',
#   integration_amount decimal(10,2)  COMMENT 'amount offset by points',
#   dim_day     string,
#   dim_category_id     string,
#   dim_brand_id     string,
#   dim_member_level_id     string,
#   dim_order_type     string,
#   dim_source_type     string,
#   dim_promotion_name     string
# )
# COMMENT 'Order table'
# PARTITIONED BY (dt string)
# stored as PARQUET
# ;

########################################################
#                                                      #
#  @author hunter@doitedu                              #
#  @date   ${DT_INCR}                                  #
#  @desc   GMV analysis report job launch script       #
#                                                      #
########################################################

export HIVE_HOME=/opt/apps/hive-3.1.2/

DT=`date -d'-1 day' +%Y-%m-%d`

if [ $1 ]
then
  DT=$1
fi

# -- Compute and insert
${HIVE_HOME}/bin/hive  -e "
INSERT INTO TABLE ads17.oms_order_gmv_cube partition(dt='${DT}')
SELECT
  sum(item_product_price*item_product_quantity) as order_total_amount,
  sum(item_real_amount)                         as order_pay_amount,
  sum(item_coupon_amount)                       as coupon_amount,
  sum(item_promotion_amount)                    as promotion_amount,
  sum(item_integration_amount)                  as integration_amount,
  '${DT}'                                       as dim_day,
  o.item_product_category_id                    as dim_category_id,
  o.item_product_brand                          as dim_brand_id,
  u.member_level_id                             as dim_member_level_id,
  o.order_type                                  as dim_order_type,
  o.source_type                                 as dim_source_type,
  o.item_promotion_name                         as dim_promotion_name
FROM dws17.oms_order_detail o
JOIN dwd17.ums_member_detail u on o.dt='${DT}' and from_unixtime(o.create_time,'yyyy-MM-dd')='${DT}' and  o.member_id=u.id
GROUP BY  o.item_product_category_id,o.item_product_brand,u.member_level_id,o.order_type,o.source_type,o.item_promotion_name
with cube
"

if [ $? -eq 0 ]
then
  echo "congratulations! hive table partition loaded successfully! Email sent to admin@51doit.com"
else
  echo "My condolences! Table partition load failed! Email sent to admin@51doit.com"
fi
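
GROUP BY ... with cube aggregates over every subset of the listed dimensions, so n dimensions produce 2^n grouping sets, and in the result a rolled-up dimension shows up as NULL. The sketch below only enumerates those subsets to make the size of the result tangible; list_grouping_sets is a hypothetical helper and the three dimension names are picked from the report purely as an example.

```shell
#!/bin/bash
# Sketch only: list_grouping_sets is a hypothetical helper that prints the
# grouping sets a WITH CUBE over the given dimensions expands to.
list_grouping_sets() {
  local n total mask cols i dim
  n=$#
  total=$(( 1 << n ))          # 2^n subsets
  mask=0
  while [ "${mask}" -lt "${total}" ]; do
    cols=""
    i=0
    for dim in "$@"; do
      # Bit i of the mask decides whether dimension i is kept or rolled up.
      if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
        cols="${cols:+${cols},}${dim}"
      fi
      i=$(( i + 1 ))
    done
    echo "(${cols})"           # "()" is the grand total
    mask=$(( mask + 1 ))
  done
}

grouping_sets="$(list_grouping_sets category_id brand_id member_level_id)"
set_count="$(printf '%s\n' "${grouping_sets}" | wc -l | tr -d ' ')"

echo "${grouping_sets}"
echo "grouping sets for 3 dimensions: ${set_count}"
echo "grouping sets for the 6 dimensions of the report: $(( 1 << 6 ))"
```

With the six dimensions used in the report this comes to 64 grouping sets per day, which is why the cube table is computed once and then queried by filtering on the dim_ columns.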

Order and return analysis:

1. Create the order/return wide table and write data into it

#!/bin/bash

# -- Target table DDL:
# CREATE TABLE dws17.oms_order_and_return(
# od_id bigint COMMENT 'order id',
# od_member_id bigint ,
# od_coupon_id bigint ,
# od_order_sn string  COMMENT 'order number',
# od_create_time BIGINT  COMMENT 'submission time',
# od_member_username string  COMMENT 'user account',
# od_total_amount string  COMMENT 'total order amount',
# od_pay_amount string  COMMENT 'amount payable (actual amount paid)',
# od_freight_amount string  COMMENT 'freight amount',
# od_promotion_amount string  COMMENT 'promotion discount amount (promotional price, spend-and-save, tiered price)',
# od_integration_amount string  COMMENT 'amount offset by points',
# od_coupon_amount string  COMMENT 'amount offset by coupon',
# od_discount_amount string  COMMENT 'discount applied when an admin adjusts the order in the back office',
# od_pay_type int  COMMENT 'payment method: 0->unpaid; 1->Alipay; 2->WeChat',
# od_source_type int  COMMENT 'order source: 0->PC order; 1->app order',
# od_status int  COMMENT 'order status: 0->awaiting payment; 1->awaiting shipment; 2->shipped; 3->completed; 4->closed; 5->invalid order',
# od_order_type int  COMMENT 'order type: 0->normal order; 1->flash-sale order',
# od_delivery_company string  COMMENT 'logistics company',
# od_delivery_sn string  COMMENT 'tracking number',
# od_auto_confirm_day int  COMMENT 'auto-confirm period (days)',
# od_integration int  COMMENT 'points that can be earned',
# od_growth int  COMMENT 'growth value that can be earned',
# od_promotion_info string  COMMENT 'promotion information',
# od_bill_type int  COMMENT 'invoice type: 0->no invoice; 1->electronic invoice; 2->paper invoice',
# od_bill_header string  COMMENT 'invoice title',
# od_bill_content string  COMMENT 'invoice content',
# od_bill_receiver_phone string  COMMENT 'invoice recipient phone',
# od_bill_receiver_email string  COMMENT 'invoice recipient email',
# od_receiver_name string  COMMENT 'consignee name',
# od_receiver_phone string  COMMENT 'consignee phone',
# od_receiver_post_code string  COMMENT 'consignee postal code',
# od_receiver_province string  COMMENT 'province / municipality',
# od_receiver_city string  COMMENT 'city',
# od_receiver_region string  COMMENT 'district',
# od_receiver_detail_address string  COMMENT 'detailed address',
# od_note string  COMMENT 'order note',
# od_confirm_status int  COMMENT 'receipt confirmation status: 0->unconfirmed; 1->confirmed',
# od_delete_status int   COMMENT 'deletion status: 0->not deleted; 1->deleted',
# od_use_integration int  COMMENT 'points used when placing the order',
# od_payment_time BIGINT  COMMENT 'payment time',
# od_delivery_time BIGINT  COMMENT 'shipping time',
# od_receive_time BIGINT  COMMENT 'receipt confirmation time',
# od_comment_time BIGINT  COMMENT 'review time',
# od_modify_time BIGINT  COMMENT 'modification time' ,
# -- Return table fields
# rt_id bigint ,
# rt_order_id bigint COMMENT 'order id',
# rt_company_address_id bigint COMMENT 'id in the receiving address table',
# rt_product_id bigint COMMENT 'returned product id',
# rt_order_sn string COMMENT 'order number',
# rt_create_time bigint COMMENT 'application time',
# rt_member_username string COMMENT 'member user name',
# rt_return_amount string COMMENT 'refund amount',
# rt_return_name string COMMENT 'name of the person returning',
# rt_return_phone string COMMENT 'phone of the person returning',
# rt_status int COMMENT 'application status: 0->pending; 1->return in progress; 2->completed; 3->rejected',
# rt_handle_time bigint COMMENT 'handling time',
# rt_product_pic string COMMENT 'product picture',
# rt_product_name string COMMENT 'product name',
# rt_product_brand string COMMENT 'product brand',
# rt_product_attr string COMMENT 'product sales attributes: color: red; size: xl;',
# rt_product_count int COMMENT 'quantity returned',
# rt_product_price string COMMENT 'product unit price',
# rt_product_real_price string COMMENT 'unit price actually paid',
# rt_reason string COMMENT 'reason',
# rt_description string COMMENT 'description',
# rt_proof_pics string COMMENT 'proof pictures, comma-separated',
# rt_handle_note string COMMENT 'handling note',
# rt_handle_man string COMMENT 'handler',
# rt_receive_man string COMMENT 'receiver',
# rt_receive_time bigint COMMENT 'receipt time',
# rt_receive_note string COMMENT 'receipt note'
# )
# PARTITIONED BY (dt STRING)
# STORED AS PARQUET
# ;

########################################################
#                                                      #
#  @author hunter@doitedu                              #
#  @date   ${DT_INCR}                                  #
#  @desc   order/return analysis table launch script   #
#                                                      #
########################################################

export HIVE_HOME=/opt/apps/hive-3.1.2/

DT=`date -d'-1 day' +%Y-%m-%d`

if [ $1 ]
then
  DT=$1
fi

# -- Compute and insert
${HIVE_HOME}/bin/hive  -e "
WITH od as (
SELECT * FROM  dwd17.oms_order WHERE dt='${DT}' AND from_unixtime(create_time,'yyyy-MM-dd')='${DT}'
),
rt as (
SELECT * FROM  dwd17.oms_order_return_apply WHERE dt='${DT}' AND from_unixtime(create_time,'yyyy-MM-dd')='${DT}'
)
INSERT INTO TABLE dws17.oms_order_and_return PARTITION (dt='${DT}')
SELECT
  od.id, od.member_id, od.coupon_id, od.order_sn, od.create_time,
  od.member_username, od.total_amount, od.pay_amount, od.freight_amount,
  od.promotion_amount, od.integration_amount, od.coupon_amount,
  od.discount_amount, od.pay_type, od.source_type, od.status,
  od.order_type, od.delivery_company, od.delivery_sn, od.auto_confirm_day,
  od.integration, od.growth, od.promotion_info, od.bill_type,
  od.bill_header, od.bill_content, od.bill_receiver_phone,
  od.bill_receiver_email, od.receiver_name, od.receiver_phone,
  od.receiver_post_code, od.receiver_province, od.receiver_city,
  od.receiver_region, od.receiver_detail_address, od.note,
  od.confirm_status, od.delete_status, od.use_integration,
  od.payment_time, od.delivery_time, od.receive_time, od.comment_time,
  od.modify_time,
  rt.id, rt.order_id, rt.company_address_id, rt.product_id, rt.order_sn,
  rt.create_time, rt.member_username, rt.return_amount, rt.return_name,
  rt.return_phone, rt.status, rt.handle_time, rt.product_pic,
  rt.product_name, rt.product_brand, rt.product_attr, rt.product_count,
  rt.product_price, rt.product_real_price, rt.reason, rt.description,
  rt.proof_pics, rt.handle_note, rt.handle_man, rt.receive_man,
  rt.receive_time, rt.receive_note
FROM od
FULL JOIN rt
ON od.id=rt.order_id
"

if [ $? -eq 0 ]
then
  echo "congratulations! hive table partition loaded successfully! Email sent to admin@51doit.com"
else
  echo "My condolences! Table partition load failed! Email sent to admin@51doit.com"
fi

2. Create the return analysis dimension table and write data into it

#!/bin/bash

# -- Daily report of order/return counts by users and by orders
#
# -- Source table: dws.oms_order_and_return
# -- Target: ads.oms_order_odcnt_oduser_cube
#
# -- Table DDL:
# CREATE TABLE ads17.oms_order_odcnt_oduser_cube(
#    order_cnt     int,
#    order_users   int,
#    order_cancel_cnt     int,
#    order_cancel_users   int,
#    order_return_cnt     int,
#    order_return_users   int,
#    order_return_p_cnt   int,
#    dim_dt               string,
#    dim_member_level_id  string,
#    dim_order_type       string,
#    dim_source_type      string
# )
# PARTITIONED BY (dt string)
# STORED AS PARQUET
# ;

# Sample rows of the full-joined source table (order side , return side):
# o11,u1,${DT},1  ,  null
# o12,u1,${DT},1  ,  o12,p01,20.8,2,u1_name
# o12,u1,${DT},1  ,  o12,p02,10.0,1,u1_name
# o13,u2,${DT},5  ,  null
# o14,u2,${DT},5  ,  null
#          null   ,  o02,p06,10.50,1,u3_name
#          null   ,  o02,p04,10.50,1,u3_name

########################################################
#                                                      #
#  @author hunter@doitedu                              #
#  @date   ${DT_INCR}                                  #
#  @desc   order/return analysis table launch script   #
#                                                      #
########################################################

export HIVE_HOME=/opt/apps/hive-3.1.2/

DT=`date -d'-1 day' +%Y-%m-%d`

if [ $1 ]
then
  DT=$1
fi

# -- Compute and insert
${HIVE_HOME}/bin/hive  -e "
WITH od_rt AS (
SELECT  * FROM dws17.oms_order_and_return WHERE dt='${DT}'
)
INSERT INTO TABLE ads17.oms_order_odcnt_oduser_cube PARTITION (dt='${DT}')
SELECT
  COUNT(DISTINCT od_id)                                AS order_cnt,
  COUNT(DISTINCT od_member_id)                         AS order_users,
  COUNT(DISTINCT if(od_status=5,od_id,null))           AS order_cancel_cnt,
  COUNT(DISTINCT if(od_status=5,od_member_id,null))    AS order_cancel_users,
  COUNT(DISTINCT rt_order_id)                          AS order_return_cnt,
  COUNT(DISTINCT rt_member_username)                   AS order_return_users,
  SUM(rt_product_count)                                AS order_return_p_cnt,
  '${DT}'                                              AS dim_dt,
  member_level_id                                      AS dim_member_level_id,
  od_order_type                                        AS dim_order_type,
  od_source_type                                       AS dim_source_type
FROM
(
  SELECT od_rt.*,u.member_level_id
  FROM od_rt JOIN dwd17.ums_member_detail u ON od_rt.od_member_id=u.id
  UNION ALL
  SELECT od_rt.*,u.member_level_id
  FROM od_rt JOIN dwd17.ums_member_detail u ON od_rt.rt_member_username=u.username
) o
GROUP BY member_level_id,od_order_type,od_source_type
WITH cube
"

if [ $? -eq 0 ]
then
  echo "congratulations! hive table partition loaded successfully! Email sent to admin@51doit.com"
else
  echo "My condolences! Table partition load failed! Email sent to admin@51doit.com"
fi
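
The report relies on COUNT(DISTINCT if(condition, key, null)): rows that fail the condition contribute NULL, which COUNT ignores, so several conditional distinct counts can be computed in one pass. A minimal sketch of that idiom on a few made-up rows (order_id,member_id,status) follows; the data is invented, and the duplicated o12 row imitates an order that appears once per returned product after the full join.

```shell
#!/bin/bash
# Sketch only: made-up rows of the form order_id,member_id,status.
rows='o11,u1,1
o12,u1,5
o12,u1,5
o13,u2,5
o14,u2,1'

# Count distinct orders and users overall, and again only for status 5.
# Referencing an array element creates it, which marks the key as seen.
stats="$(printf '%s\n' "${rows}" | awk -F',' '
  {
    if (!($1 in orders)) { orders[$1]; order_cnt++ }
    if (!($2 in users))  { users[$2];  order_users++ }
  }
  $3 == 5 {
    if (!($1 in c_orders)) { c_orders[$1]; cancel_cnt++ }
    if (!($2 in c_users))  { c_users[$2];  cancel_users++ }
  }
  END { printf "%d %d %d %d\n", order_cnt, order_users, cancel_cnt, cancel_users }
')"

echo "order_cnt order_users order_cancel_cnt order_cancel_users"
echo "${stats}"
```

The duplicated o12 row is counted once in every column, which is the property that makes distinct counts safe on the one-row-per-returned-product wide table.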

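III. Consumption profile tags (DWS layer)

Tag each user with consumption-related statistics (orders, returns, amounts, average order value).

1. Create the table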

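drop table if exists ads17.userprofile_consume_tag;
create table ads17.userprofile_consume_tag(
user_id                        bigint     ,-- user
first_order_time               string     ,-- first order date
last_order_time                string     ,-- last order date
first_order_ago                bigint     ,-- days since the first order
last_order_ago                 bigint     ,-- days since the last order
month1_order_cnt               bigint     ,-- orders placed in the last 30 days
month1_order_amt               double     ,-- purchase amount in the last 30 days (total amount)
month2_order_cnt               bigint     ,-- purchases in the last 60 days
month2_order_amt               double     ,-- purchase amount in the last 60 days
month3_order_cnt               bigint     ,-- purchases in the last 90 days
month3_order_amt               double     ,-- purchase amount in the last 90 days
max_order_amt                  double     ,-- largest order amount
min_order_amt                  double     ,-- smallest order amount
total_order_cnt                bigint     ,-- cumulative purchase count (excluding returns and rejections)
total_order_amt                double     ,-- cumulative purchase amount (excluding returns and rejections)
total_coupon_amt               double     ,-- cumulative coupon amount used
user_avg_order_amt             double     ,-- average order amount (including returns and rejections)
month3_user_avg_amt            double     ,-- average order amount in the last 90 days (including returns and rejections)
common_address                 string     ,-- most frequently used shipping address
common_paytype                 string     ,-- most frequently used payment method
month1_cart_cnt_30             bigint     ,-- add-to-cart actions in the last 30 days
month1_cart_goods_cnt_30       bigint     ,-- items added to the cart in the last 30 days
month1_cart_cancel_cnt         bigint     ,-- items removed from the cart in the last 30 days
dw_date                        string        -- calculation date
) partitioned by (dt string)
stored as parquet
;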
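
The month1_, month2_ and month3_ columns in the table above are rolling windows: an order counts towards a window when the number of days between the calculation date and the order date is at most 30, 60 or 90. The sketch below reproduces that test on four invented order dates. days_between is a hypothetical helper that mirrors the argument order of Hive's datediff(end, start), and it assumes GNU date (the -d option), the same tool the launch scripts in this document already depend on.

```shell
#!/bin/bash
# Sketch only: days_between is a hypothetical helper and relies on GNU date.

# Whole days from the second argument to the first, like datediff(end, start).
days_between() {
  local end_s start_s
  end_s="$(date -u -d "$1" +%s)"
  start_s="$(date -u -d "$2" +%s)"
  echo $(( (end_s - start_s) / 86400 ))
}

calc_day='2020-10-07'
# Invented order dates for one user.
order_days='2020-10-01 2020-08-20 2020-07-15 2020-01-02'

month1_order_cnt=0
month2_order_cnt=0
month3_order_cnt=0
for order_day in ${order_days}; do
  age="$(days_between "${calc_day}" "${order_day}")"
  if [ "${age}" -le 30 ]; then month1_order_cnt=$(( month1_order_cnt + 1 )); fi
  if [ "${age}" -le 60 ]; then month2_order_cnt=$(( month2_order_cnt + 1 )); fi
  if [ "${age}" -le 90 ]; then month3_order_cnt=$(( month3_order_cnt + 1 )); fi
done

echo "30d=${month1_order_cnt} 60d=${month2_order_cnt} 90d=${month3_order_cnt}"
```

The windows are nested, so an order that falls inside the 30-day window is also counted in the 60-day and 90-day columns.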

2. Compute the metrics and write them to the target table

-- 计算
-- 订单表
-- o11,u1,${DT},1
-- o12,u1,${DT},1  o12,退款金额
-- o13,u2,${DT},5
-- o14,u2,${DT},5 -- 退货表
-- o12,p01,20.8,2,u1_name
-- o12,p02,20.8,2,u1_name
-- o02,p06,10,1,u3_name
-- o02,p04,10,1,u3_namewith tmp1 as (SELECTa.member_id                                                                                                AS  member_id           ,--用户min(from_unixtime(a.create_time,'yyyy-MM-dd'))                                                             AS  first_order_time    ,--首单日期max(from_unixtime(a.create_time,'yyyy-MM-dd'))                                                             AS  last_order_time     ,--末单日期-- AS  first_order_ago     ,--首单距今时间                                                                 -- AS  last_order_ago      ,--末单距今时间                                                                 count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=30,1,null))                     AS  month1_order_cnt    ,--近30天下单次数sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=30,total_amount,0))               AS  month1_order_amt    ,--近30天购买金额(总金额)count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=60,1,null))                     AS  month2_order_cnt    ,--近60天购买次数sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=60,total_amount,0))               AS  month2_order_amt    ,--近60天购买金额count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,1,null))                     AS  month3_order_cnt    ,--近90天购买次数sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,total_amount,0))               AS  month3_order_amt    ,--近90天购买金额max(total_amount)                                                                                          AS  max_order_amt       ,--最大订单金额min(total_amount)                                                                                          AS  min_order_amt       ,--最小订单金额count(if(b.order_id is null,1,null))                                                                       AS  total_order_cnt     ,--累计消费次数(不含退拒)sum(if(b.order_id is null,total_amount,0))                  
                                               AS  total_order_amt     ,--累计消费金额(不含退拒)sum(coupon_amount)                                                                                         AS  total_coupon_amt    ,--累计使用代金券金额round(avg(total_amount),2)                                                                                 AS  user_avg_order_amt  ,--平均订单金额(含退拒)round(avg(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,total_amount,null)),2)  AS  month3_user_avg_amt  --近90天平均订单金额(含退拒)FROM ( SELECT * FROM DWD17.oms_order od WHERE dt='2020-10-07' ) aLEFT JOIN ( SELECT order_id FROM dwd17.oms_order_return_apply rt WHERE dt='2020-10-07'  GROUP BY order_id) bON a.id = b.order_idGROUP BY a.member_id
) ,-- common_address                 string     ,--常用收货地址
tmp2 as (SELECTmember_id,addr as common_addressFROM (SELECTmember_id,addr,row_number() over(partition by member_id order by order_cnt desc) as rnFROM (SELECTmember_id,concat_ws(',',receiver_province,receiver_city,receiver_region,receiver_detail_address) as addr,count(1) as order_cntFROM DWD17.oms_order od WHERE dt='2020-10-07'GROUP BY member_id,concat_ws(',',receiver_province,receiver_city,receiver_region,receiver_detail_address)) o1) o2WHERE rn=1
) ,
-- common_paytype                 string     ,-- most frequently used payment method
tmp3 as (
SELECT
  member_id,
  pay_type as common_paytype
FROM (
  SELECT
    member_id,
    pay_type,
    row_number() over(partition by member_id order by order_cnt desc) as rn
  FROM (
    SELECT
      member_id,
      pay_type,
      count(1) as order_cnt
    FROM DWD17.oms_order od WHERE dt='2020-10-07'
    GROUP BY member_id,pay_type
  ) o1
) o2
WHERE rn=1
) ,
-- Shopping-cart metric tags
-- month1_cart_cnt_30             bigint     ,-- number of add-to-cart actions in the last 30 days
-- month1_cart_goods_cnt_30       bigint     ,-- number of items added to the cart in the last 30 days
-- month1_cart_cancel_cnt         bigint     ,-- number of items removed from the cart in the last 30 days
tmp4 as (
SELECT
  member_id,
  COUNT(DISTINCT if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30,create_date,null))      as    month1_cart_cnt_30,
  SUM(if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30,quantity,0))                       as    month1_cart_goods_cnt_30,
  SUM(if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30 and delete_status=1,quantity,0))   as    month1_cart_cancel_cnt
FROM dwd17.oms_cart_item WHERE dt='2020-10-07'
GROUP BY member_id
),
-- every member that appears in either the order table or the cart table
tmp5 as (
SELECT  member_id  FROM dwd17.oms_order od WHERE dt='2020-10-07'
UNION
SELECT  member_id  FROM dwd17.oms_cart_item WHERE dt='2020-10-07'
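)
INSERT INTO TABLE ads17.userprofile_consume_tag PARTITION (dt='2020-10-07')
SELECT
  tmp5.member_id   AS  user_id                                      ,-- user
  tmp1.first_order_time                                             ,-- date of first order
  tmp1.last_order_time                                              ,-- date of last order
  datediff('2020-10-07',tmp1.first_order_time) AS  first_order_ago  ,-- days since first order
  datediff('2020-10-07',tmp1.last_order_time)  AS  last_order_ago   ,-- days since last order
  tmp1.month1_order_cnt              ,-- orders placed in the last 30 days
  tmp1.month1_order_amt              ,-- purchase amount in the last 30 days (total amount)
  tmp1.month2_order_cnt              ,-- orders placed in the last 60 days
  tmp1.month2_order_amt              ,-- purchase amount in the last 60 days
  tmp1.month3_order_cnt              ,-- orders placed in the last 90 days
  tmp1.month3_order_amt              ,-- purchase amount in the last 90 days
  tmp1.max_order_amt                 ,-- largest order amount
  tmp1.min_order_amt                 ,-- smallest order amount
  tmp1.total_order_cnt               ,-- cumulative order count (excluding returns/rejections)
  tmp1.total_order_amt               ,-- cumulative order amount (excluding returns/rejections)
  tmp1.total_coupon_amt              ,-- cumulative coupon amount used
  tmp1.user_avg_order_amt            ,-- average order amount (including returns/rejections)
  tmp1.month3_user_avg_amt           ,-- average order amount in the last 90 days (including returns/rejections)
  tmp2.common_address                ,-- most frequently used shipping address
  tmp3.common_paytype                ,-- most frequently used payment method
  tmp4.month1_cart_cnt_30            ,-- number of add-to-cart actions in the last 30 days
  tmp4.month1_cart_goods_cnt_30      ,-- number of items added to the cart in the last 30 days
  tmp4.month1_cart_cancel_cnt        ,-- number of items removed from the cart in the last 30 days
  '2020-10-07'  AS    dw_date         -- calculation date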
FROM  tmp5
LEFT JOIN tmp1 ON tmp5.member_id=tmp1.member_id
LEFT JOIN tmp2 ON tmp5.member_id=tmp2.member_id
LEFT JOIN tmp3 ON tmp5.member_id=tmp3.member_id
LEFT JOIN tmp4 ON tmp5.member_id=tmp4.member_id
;
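The query relies on two patterns: conditional aggregation over rolling day windows (tmp1) and picking the most frequent value per member (tmp2 and tmp3). The following is a minimal Python sketch of both, not the production job. The sample orders and the function name consume_tags are made up for illustration; only the tag names and the calculation date come from the SQL.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sample orders: (member_id, order_date, total_amount, pay_type)
ORDERS = [
    ("u1", date(2020, 10, 1), 100.0, "alipay"),
    ("u1", date(2020, 8, 20), 50.0, "wechat"),
    ("u1", date(2020, 7, 1), 30.0, "alipay"),
    ("u2", date(2020, 9, 30), 80.0, "wechat"),
]


def consume_tags(orders, calc_date):
    """Compute a subset of the consumption tags for every member."""
    per_member = defaultdict(list)
    for member_id, order_date, amount, pay_type in orders:
        per_member[member_id].append((order_date, amount, pay_type))

    tags = {}
    for member_id, rows in per_member.items():

        def amounts_within(days, rows=rows):
            # Same test as datediff(calc_date, order_date) <= days in the SQL
            return [amt for d, amt, _ in rows if (calc_date - d).days <= days]

        # Counterpart of count(1) ... row_number() ... rn = 1 in tmp3
        pay_counts = Counter(pay for _, _, pay in rows)

        tags[member_id] = {
            "first_order_time": min(d for d, _, _ in rows).isoformat(),
            "last_order_time": max(d for d, _, _ in rows).isoformat(),
            "month1_order_cnt": len(amounts_within(30)),
            "month1_order_amt": sum(amounts_within(30)),
            "month2_order_cnt": len(amounts_within(60)),
            "month2_order_amt": sum(amounts_within(60)),
            "month3_order_cnt": len(amounts_within(90)),
            "month3_order_amt": sum(amounts_within(90)),
            "common_paytype": pay_counts.most_common(1)[0][0],
        }
    return tags


if __name__ == "__main__":
    for member, member_tags in consume_tags(ORDERS, date(2020, 10, 7)).items():
        print(member, member_tags)
```

As in the SQL, a tie between two equally frequent payment methods is broken arbitrarily; add a secondary sort key if the result must be deterministic.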

IV. Return and Rejection Profile Tags

Tag table for returned and rejected items

1. Create the table

drop table if exists ads17.userprofile_reject_tag;
create table ads17.userprofile_reject_tag(
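user_id                           bigint      ,-- user
p_sales_cnt                       bigint      ,-- quantity purchased, excluding returns/rejections
p_sales_amt                       double      ,-- total list price of purchased items, excluding returns/rejections
p_sales_cut_amt                   double      ,-- amount actually paid (after promotional discounts), excluding returns/rejections
h_sales_cnt                       bigint      ,-- quantity purchased, including returns/rejections
h_sales_amt                       double      ,-- purchase amount, including returns/rejections
h_sales_cut_amt                   double      ,-- amount actually paid (after promotional discounts), including returns/rejections
return_cnt                        bigint      ,-- quantity of returned items
return_amt                        double      ,-- amount of returned items
dw_date                           string       -- calculation date; string rather than bigint, because the load query writes '2020-10-07'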
) partitioned by (dt string)
stored as parquet
;

2. Compute the metrics and write the data

-- Calculation
-- Join the order item detail table with the return application table to get rows with the following schema
-- user, order, product, unit price, quantity purchased, amount paid,   return order id, returned product, returned quantity, returned amount
-- zhas,od01,p1,  20,       100    ,  1800   ,    od01         p1   ,  80    , 1400
-- zhas,od01,p2,  10,       20    ,   200   ,     null         null    null     null
INSERT INTO TABLE ads17.userprofile_reject_tag PARTITION (dt='2020-10-07')
SELECT
  a.member_id                                                                                                         as user_id                ,
  sum(if(b.order_id is null,a.product_quantity,a.product_quantity-b.product_count))                                   as p_sales_cnt            ,
  sum(if(b.order_id is null,a.product_quantity*a.product_price,(a.product_quantity-b.product_count)*a.product_price)) as p_sales_amt            ,
  sum(if(b.order_id is null,a.real_amount,a.real_amount-b.return_amount))                                             as p_sales_cut_amt        ,
  sum(a.product_quantity)                                                                                             as h_sales_cnt            ,
  sum(a.product_quantity*a.product_price)                                                                             as h_sales_amt            ,
  sum(a.real_amount)                                                                                                  as h_sales_cut_amt        ,
  sum(b.product_count)                                                                                                as return_cnt             ,
  sum(b.return_amount)                                                                                                as return_amt             ,
  '2020-10-07'                                                                                                        as dw_date
FROM (
  SELECT oi.*,od.member_id
  FROM  dwd17.oms_order_item oi
  JOIN  dwd17.oms_order  od
  ON od.dt='2020-10-07' and oi.dt='2020-10-07' and od.id=oi.order_id
) a
LEFT JOIN (SELECT * FROM  dwd17.oms_order_return_apply where dt='2020-10-07') b
ON a.order_id=b.order_id and a.product_id=b.product_id
GROUP BY a.member_id
;
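To check the arithmetic, the two sample rows from the SQL comments can be run through the same formulas by hand. The Python sketch below does that; the function name reject_tags and the tuple layout are illustrative assumptions, and the rows are exactly the ones in the comment (user zhas, order od01, products p1 and p2).

```python
from collections import defaultdict

# Joined rows in the shape described in the SQL comments:
# (user, order, product, unit_price, quantity, real_amount,
#  returned_count, returned_amount)
ROWS = [
    ("zhas", "od01", "p1", 20, 100, 1800, 80, 1400),
    ("zhas", "od01", "p2", 10, 20, 200, None, None),  # no return application
]

METRICS = (
    "p_sales_cnt", "p_sales_amt", "p_sales_cut_amt",
    "h_sales_cnt", "h_sales_amt", "h_sales_cut_amt",
    "return_cnt", "return_amt",
)


def reject_tags(rows):
    """Aggregate per-user sales metrics with and without returned items."""
    tags = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for user, _order, _product, price, qty, real_amt, ret_cnt, ret_amt in rows:
        # A missing return row contributes nothing. Note one difference:
        # Hive's sum() yields NULL for a user with no returns at all,
        # while this sketch yields 0.
        ret_cnt = ret_cnt or 0
        ret_amt = ret_amt or 0
        t = tags[user]
        t["p_sales_cnt"] += qty - ret_cnt
        t["p_sales_amt"] += (qty - ret_cnt) * price
        t["p_sales_cut_amt"] += real_amt - ret_amt
        t["h_sales_cnt"] += qty
        t["h_sales_amt"] += qty * price
        t["h_sales_cut_amt"] += real_amt
        t["return_cnt"] += ret_cnt
        t["return_amt"] += ret_amt
    return dict(tags)


if __name__ == "__main__":
    print(reject_tags(ROWS)["zhas"])
```

For zhas, 120 items were bought in total and 80 were returned, so the sales count excluding returns is 40.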

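V. Zipper Table

Once the daily increment has been merged into the full snapshot, the previous day's version of a record is no longer visible. A zipper table solves this by keeping every version of a record together with its validity period (start_dt to end_dt), so the state of the data on any given day can be recovered.

1. Order table (daily increments)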

create table test.order_incr_test(
oid   string,
uid   string,
amt   string,
sts   string
)
partitioned by (dt string)
row format delimited fields terminated by ','
;

2. Data for October 1

o1,u1,20,status1
o2,u1,30,status1
o3,u2,40,status4
o4,u3,28,status3

Incremental data for October 2

o1,u1,20,status2
o2,u1,30,status3
o5,u4,26,status1

3. Create the zipper table

create table test.order_zip_test(
oid   string,
uid   string,
amt   string,
sts   string,
start_dt  string,
end_dt    string
)
partitioned by (dt string)
row format delimited fields terminated by ','
;

4. Calculation logic

-- Calculation logic
-- 1. LEFT JOIN the zipper table of day T-1 with the increment of day T, and close the validity period of every T-1 row that changed
-- 2. UNION ALL the increment of day T (new states of existing records, plus brand-new records)
with zip as (
  select * from test.order_zip_test where dt='2020-10-01'
),
newdata as (
  select * from test.order_incr_test where dt='2020-10-02'
)
INSERT INTO TABLE test.order_zip_test PARTITION(dt='2020-10-02')
SELECT
  zip.oid,
  zip.uid,
  zip.amt,
  zip.sts,
  zip.start_dt,
  -- only the currently open version of a changed record is closed, with the T-1 date
  if(newdata.oid is not null and zip.end_dt='9999-12-31',zip.dt,zip.end_dt) as end_dt
FROM zip left join newdata on zip.oid=newdata.oid
UNION ALL
SELECT
  oid,
  uid,
  amt,
  sts,
  dt as start_dt,
  '9999-12-31' as end_dt
FROM newdata
;
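The merge can be traced on the sample data with a small Python simulation. This is a sketch under two assumptions that the text does not state: the zipper partition for 2020-10-01 was initialized with all four October 1 rows open (start_dt 2020-10-01, end_dt 9999-12-31), and each increment contains at most one row per oid. The helper names merge_zipper, snapshot and parse are illustrative.

```python
OPEN_END = "9999-12-31"  # sentinel end date of the currently valid version


def merge_zipper(zip_rows, incr_rows, prev_dt, cur_dt):
    """Apply the increment of cur_dt to the zipper rows of prev_dt."""
    changed = {r["oid"] for r in incr_rows}
    merged = []
    for r in zip_rows:
        row = dict(r)
        # Step 1: close the open version of every record that changed
        if row["oid"] in changed and row["end_dt"] == OPEN_END:
            row["end_dt"] = prev_dt
        merged.append(row)
    # Step 2: append the increment as new open versions
    for r in incr_rows:
        merged.append({**r, "start_dt": cur_dt, "end_dt": OPEN_END})
    return merged


def snapshot(zip_rows, day):
    """Rows valid on a day; ISO date strings compare correctly as text."""
    return [r for r in zip_rows if r["start_dt"] <= day <= r["end_dt"]]


def parse(lines):
    keys = ("oid", "uid", "amt", "sts")
    return [dict(zip(keys, line.split(","))) for line in lines]


day1 = parse(["o1,u1,20,status1", "o2,u1,30,status1",
              "o3,u2,40,status4", "o4,u3,28,status3"])
day2 = parse(["o1,u1,20,status2", "o2,u1,30,status3", "o5,u4,26,status1"])

# Assumed initial load: every October 1 row starts as an open version
zip_day1 = [{**r, "start_dt": "2020-10-01", "end_dt": OPEN_END} for r in day1]
zip_day2 = merge_zipper(zip_day1, day2, "2020-10-01", "2020-10-02")

if __name__ == "__main__":
    for row in zip_day2:
        print(",".join(row.values()))
```

Filtering on start_dt <= day <= end_dt, as snapshot does, is how the state of any past day is read back from the zipper table.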
