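1 Overview

This article builds on the business data from "Local Data Warehouse Project (1): Detailed Local Warehouse Setup Process" and sets up the business data warehouse locally.
The business data is generated by the simulation SQL scripts; run them in order to generate it.

The SQL scripts are available here:

Link: https://pan.baidu.com/s/1AhLIuTNIyJ_GBD7M0b2RoA
Extraction code: 1lm8

The generated data is as follows:

2 Importing Business Data into the Warehouse

The overall warehouse architecture follows the earlier article "Local Data Warehouse Project (1): Detailed Local Warehouse Setup Process", which already covered the end-to-end data collection and analysis flow. The business warehouse here uses Sqoop to import data from MySQL into HDFS.

2.1 Installing Sqoop

2.1.1 Extract and rename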

tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop-1.4.6

2.1.2 Configure the SQOOP_HOME environment variable

SQOOP_HOME=/root/soft/sqoop-1.4.6
PATH=$PATH:$JAVA_HOME/bin:$SHELL_HOME:$FLUME_HOME/bin:$HIVE_HOME/bin:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin

2.1.3 Configure sqoop-env.sh

mv sqoop-env-template.sh sqoop-env.sh
export HADOOP_COMMON_HOME=/root/soft/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/root/soft/hadoop-2.7.2
export HIVE_HOME=/root/soft/hive
export ZOOKEEPER_HOME=/root/soft/zookeeper-3.4.10
export ZOOCFGDIR=/root/soft/zookeeper-3.4.10

2.1.4 Copy the MySQL JDBC driver into Sqoop's lib directory

2.1.5 Test the connection

bin/sqoop list-databases --connect jdbc:mysql://192.168.2.100:3306/ --username root --password 123456

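If the command lists the databases on the MySQL server, Sqoop is installed correctly.

2.2 Importing data into HDFS with Sqoop

The following Sqoop script imports the data into HDFS and can be scheduled to run automatically.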

#!/bin/bash
# Arguments: $1 = table name or "all", $2 = date (for example 2023-01-04)
db_date=$2
echo $db_date
db_name=gmall

# import_data <table name> <query>: runs one Sqoop import into /origin_data/$db_name/db/<table name>/$db_date
import_data() {
/root/soft/sqoop-1.4.6/bin/sqoop import \
--connect jdbc:mysql://192.168.2.100:3306/$db_name \
--username root \
--password 123456 \
--target-dir /origin_data/$db_name/db/$1/$db_date \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "$2"' and $CONDITIONS;' \
--null-string '\\N' \
--null-non-string '\\N'
}

import_sku_info(){
import_data "sku_info" "select
id, spu_id, price, sku_name, sku_desc, weight, tm_id,
category3_id, create_time
from sku_info where 1=1"
}

import_user_info(){
import_data "user_info" "select
id, name, birthday, gender, email, user_level,
create_time
from user_info where 1=1"
}

import_base_category1(){
import_data "base_category1" "select
id, name from base_category1 where 1=1"
}

import_base_category2(){
import_data "base_category2" "select
id, name, category1_id from base_category2 where 1=1"
}

import_base_category3(){
import_data "base_category3" "select id, name, category2_id from base_category3 where 1=1"
}

import_order_detail(){
import_data "order_detail" "select od.id, order_id, user_id, sku_id, sku_name, order_price, sku_num, o.create_time
from order_info o, order_detail od
where o.id=od.order_id
and DATE_FORMAT(o.create_time,'%Y-%m-%d')='$db_date'"
}

import_payment_info(){
import_data "payment_info" "select id, out_trade_no, order_id, user_id, alipay_trade_no, total_amount, subject, payment_type, payment_time
from payment_info
where DATE_FORMAT(payment_time,'%Y-%m-%d')='$db_date'"
}

import_order_info(){
import_data "order_info" "select id, total_amount, order_status, user_id, payment_way, out_trade_no, create_time, operate_time
from order_info
where (DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date' or DATE_FORMAT(operate_time,'%Y-%m-%d')='$db_date')"
}

case $1 in
"base_category1")
  import_base_category1
;;
"base_category2")
  import_base_category2
;;
"base_category3")
  import_base_category3
;;
"order_info")
  import_order_info
;;
"order_detail")
  import_order_detail
;;
"sku_info")
  import_sku_info
;;
"user_info")
  import_user_info
;;
"payment_info")
  import_payment_info
;;
"all")
  import_base_category1
  import_base_category2
  import_base_category3
  import_order_info
  import_order_detail
  import_sku_info
  import_user_info
  import_payment_info
;;
esac

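Notes:
① By default, Sqoop writes MySQL NULL values as the string 'null' on import.
② Hive uses \N to represent NULL.
③ To have MySQL NULLs converted to a value of your choice on import, use --null-string and --null-non-string. This is why the import script above passes '\\N' for both options.

--null-string: the replacement written to Hive when a MySQL string-typed column is NULL.
--null-string a: if the MySQL column is a string type (varchar, char) and its value is NULL, a is written on import into Hive.
--null-non-string: the replacement written to Hive when a MySQL non-string column is NULL.
--null-non-string b: if the MySQL column is not a string type (varchar, char) and its value is NULL, b is written on import into Hive.

④ On export, to have a given value written back as MySQL NULL, use --input-null-string and --input-null-non-string.

--input-null-string a: when exporting from Hive to MySQL, a string-typed column whose value is a is written to MySQL as NULL.
--input-null-non-string b: when exporting from Hive to MySQL, a non-string column whose value is b is written to MySQL as NULL.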
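
A quick way to confirm that NULLs really arrived as \N is to count such fields in an imported file. The sketch below is only an illustration: `count_null_fields` is a made-up helper name, and the two sample rows are invented test data, not output from the gmall database.

```shell
#!/bin/bash
# count_null_fields is a hypothetical helper name.
# Reads tab-separated rows on stdin and prints how many fields are exactly \N.
count_null_fields() {
    awk -F'\t' '{ for (i = 1; i <= NF; i++) if ($i == "\\N") n++ } END { print n + 0 }'
}

# Two invented sample rows in the tab-delimited layout the import script writes
printf '1\tzhangsan\t\\N\n2\t\\N\t\\N\n' | count_null_fields
```

On a real cluster the input could come from `hadoop fs -cat /origin_data/gmall/db/user_info/2023-01-04/*` piped into the function, assuming the import has already been run for that date.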

Run the script to import the data.
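
To import several days in one go, the script can be called once per date. The sketch below only prints the commands so they can be reviewed first. `sqoop_import.sh` is an assumed file name for the import script (the article does not name it), and `import_commands` is a hypothetical helper.

```shell
#!/bin/bash
# import_commands <script> <date>...
# Prints one "import all tables" command per date; nothing is executed here.
import_commands() {
    script=$1
    shift
    for d in "$@"; do
        echo "$script all $d"
    done
}

# The two dates used throughout this article
import_commands ./sqoop_import.sh 2023-01-04 2023-01-05
```

Piping the printed lines to `bash` would run the imports one after another.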

3 ODS Layer

3.1 Creating the ODS tables

3.1.1 Order table

drop table if exists ods_order_info;
create external table ods_order_info (
`id` string COMMENT 'order id',
`total_amount` decimal(10,2) COMMENT 'order amount',
`order_status` string COMMENT 'order status',
`user_id` string COMMENT 'user id',
`payment_way` string COMMENT 'payment method',
`out_trade_no` string COMMENT 'payment transaction number',
`create_time` string COMMENT 'creation time',
`operate_time` string COMMENT 'operation time'
) COMMENT 'order table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_order_info/';

3.1.2 Order detail table

drop table if exists ods_order_detail;
create external table ods_order_detail(
`id` string COMMENT 'order detail id',
`order_id` string COMMENT 'order id',
`user_id` string COMMENT 'user id',
`sku_id` string COMMENT 'product id',
`sku_name` string COMMENT 'product name',
`order_price` string COMMENT 'unit price',
`sku_num` string COMMENT 'quantity',
`create_time` string COMMENT 'creation time'
) COMMENT 'order detail table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_order_detail/';

3.1.3 Product (SKU) table

drop table if exists ods_sku_info;
create external table ods_sku_info(
`id` string COMMENT 'skuId',
`spu_id` string COMMENT 'spuid',
`price` decimal(10,2) COMMENT 'price',
`sku_name` string COMMENT 'product name',
`sku_desc` string COMMENT 'product description',
`weight` string COMMENT 'weight',
`tm_id` string COMMENT 'brand id',
`category3_id` string COMMENT 'category id',
`create_time` string COMMENT 'creation time'
) COMMENT 'product table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_sku_info/';

3.1.4 User table

drop table if exists ods_user_info;
create external table ods_user_info(
`id` string COMMENT 'user id',
`name` string COMMENT 'name',
`birthday` string COMMENT 'birthday',
`gender` string COMMENT 'gender',
`email` string COMMENT 'email',
`user_level` string COMMENT 'user level',
`create_time` string COMMENT 'creation time'
) COMMENT 'user information'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_user_info/';

3.1.5 Level-1 product category table

drop table if exists ods_base_category1;
create external table ods_base_category1(
`id` string COMMENT 'id',
`name` string COMMENT 'name'
) COMMENT 'level-1 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_base_category1/';

3.1.6 Level-2 product category table

drop table if exists ods_base_category2;
create external table ods_base_category2(
`id` string COMMENT 'id',
`name` string COMMENT 'name',
`category1_id` string COMMENT 'level-1 category id'
) COMMENT 'level-2 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_base_category2/';

3.1.7 Level-3 product category table

drop table if exists ods_base_category3;
create external table ods_base_category3(
`id` string COMMENT 'id',
`name` string COMMENT 'name',
`category2_id` string COMMENT 'level-2 category id'
) COMMENT 'level-3 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_base_category3/';

3.1.8 Payment record table

drop table if exists ods_payment_info;
create external table ods_payment_info(
`id` bigint COMMENT 'id',
`out_trade_no` string COMMENT 'external trade number',
`order_id` string COMMENT 'order id',
`user_id` string COMMENT 'user id',
`alipay_trade_no` string COMMENT 'Alipay transaction number',
`total_amount` decimal(16,2) COMMENT 'payment amount',
`subject` string COMMENT 'transaction subject',
`payment_type` string COMMENT 'payment type',
`payment_time` string COMMENT 'payment time'
) COMMENT 'payment record table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_payment_info/';

3.2 Loading data

load data inpath '/origin_data/gmall/db/order_info/2023-01-04' OVERWRITE into table gmall.ods_order_info partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/order_info/2023-01-05' OVERWRITE into table gmall.ods_order_info partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/order_detail/2023-01-04' OVERWRITE into table gmall.ods_order_detail partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/order_detail/2023-01-05' OVERWRITE into table gmall.ods_order_detail partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/sku_info/2023-01-04' OVERWRITE into table gmall.ods_sku_info partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/sku_info/2023-01-05' OVERWRITE into table gmall.ods_sku_info partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/user_info/2023-01-04' OVERWRITE into table gmall.ods_user_info partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/user_info/2023-01-05' OVERWRITE into table gmall.ods_user_info partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/payment_info/2023-01-04' OVERWRITE into table gmall.ods_payment_info partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/payment_info/2023-01-05' OVERWRITE into table gmall.ods_payment_info partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/base_category1/2023-01-04' OVERWRITE into table gmall.ods_base_category1 partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/base_category1/2023-01-05' OVERWRITE into table gmall.ods_base_category1 partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/base_category2/2023-01-04' OVERWRITE into table gmall.ods_base_category2 partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/base_category2/2023-01-05' OVERWRITE into table gmall.ods_base_category2 partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/base_category3/2023-01-04' OVERWRITE into table gmall.ods_base_category3 partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/base_category3/2023-01-05' OVERWRITE into table gmall.ods_base_category3 partition(dt='2023-01-05');

These statements can be wrapped in a script that takes the date as a parameter and runs on a daily schedule.
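
A minimal sketch of such a script is given here. It only generates the `load data` statements for one date; `gen_ods_load_sql` is a hypothetical helper name, while the database name, HDFS paths and table names are taken from the statements in this section.

```shell
#!/bin/bash
# gen_ods_load_sql <date>
# Prints one "load data" statement per ODS table for the given date.
gen_ods_load_sql() {
    do_date=$1
    for t in order_info order_detail sku_info user_info payment_info \
             base_category1 base_category2 base_category3; do
        echo "load data inpath '/origin_data/gmall/db/$t/$do_date' OVERWRITE into table gmall.ods_$t partition(dt='$do_date');"
    done
}

# Print the statements for one day so they can be checked before running them
gen_ods_load_sql 2023-01-04
```

To execute the statements, the output could be passed to Hive, for example `/root/soft/hive/bin/hive -e "$(gen_ods_load_sql 2023-01-04)"`, assuming Hive is installed at the path used by the DWD script in this article.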

4 DWD Layer

4.1 Creating the DWD detail tables

4.1.1 Order table

drop table if exists dwd_order_info;
create external table dwd_order_info (
`id` string COMMENT 'order id',
`total_amount` decimal(10,2) COMMENT 'order amount',
`order_status` string COMMENT 'order status (1 2 3 4 5)',
`user_id` string COMMENT 'user id',
`payment_way` string COMMENT 'payment method',
`out_trade_no` string COMMENT 'payment transaction number',
`create_time` string COMMENT 'creation time',
`operate_time` string COMMENT 'operation time'
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy");

4.1.2 Order detail table

drop table if exists dwd_order_detail;
create external table dwd_order_detail(
`id` string COMMENT 'order detail id',
`order_id` string COMMENT 'order id',
`user_id` string COMMENT 'user id',
`sku_id` string COMMENT 'product id',
`sku_name` string COMMENT 'product name',
`order_price` string COMMENT 'unit price',
`sku_num` string COMMENT 'quantity',
`create_time` string COMMENT 'creation time'
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_order_detail/'
tblproperties ("parquet.compression"="snappy");

4.1.3 User table

drop table if exists dwd_user_info;
create external table dwd_user_info(
`id` string COMMENT 'user id',
`name` string COMMENT 'name',
`birthday` string COMMENT 'birthday',
`gender` string COMMENT 'gender',
`email` string COMMENT 'email',
`user_level` string COMMENT 'user level',
`create_time` string COMMENT 'creation time'
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_user_info/'
tblproperties ("parquet.compression"="snappy");

4.1.4 Payment record table

drop table if exists dwd_payment_info;
create external table dwd_payment_info(
`id` bigint COMMENT 'id',
`out_trade_no` string COMMENT 'external trade number',
`order_id` string COMMENT 'order id',
`user_id` string COMMENT 'user id',
`alipay_trade_no` string COMMENT 'Alipay transaction number',
`total_amount` decimal(16,2) COMMENT 'payment amount',
`subject` string COMMENT 'transaction subject',
`payment_type` string COMMENT 'payment type',
`payment_time` string COMMENT 'payment time'
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_payment_info/'
tblproperties ("parquet.compression"="snappy");

4.1.5 Product (SKU) table with category dimensions

drop table if exists dwd_sku_info;
create external table dwd_sku_info(
`id` string COMMENT 'skuId',
`spu_id` string COMMENT 'spuid',
`price` decimal(10,2) COMMENT 'price',
`sku_name` string COMMENT 'product name',
`sku_desc` string COMMENT 'product description',
`weight` string COMMENT 'weight',
`tm_id` string COMMENT 'brand id',
`category3_id` string COMMENT 'level-3 category id',
`category2_id` string COMMENT 'level-2 category id',
`category1_id` string COMMENT 'level-1 category id',
`category3_name` string COMMENT 'level-3 category name',
`category2_name` string COMMENT 'level-2 category name',
`category1_name` string COMMENT 'level-1 category name',
`create_time` string COMMENT 'creation time'
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_sku_info/'
tblproperties ("parquet.compression"="snappy");

4.2 Loading data

#!/bin/bash
# Define variables here so they are easy to change
APP=gmall
hive=/root/soft/hive/bin/hive

# Use the date passed as the first argument; if none is given, use yesterday
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dwd_order_info partition(dt)
select * from "$APP".ods_order_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_order_detail partition(dt)
select * from "$APP".ods_order_detail
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_user_info partition(dt)
select * from "$APP".ods_user_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_payment_info partition(dt)
select * from "$APP".ods_payment_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_sku_info partition(dt)
select
sku.id,
sku.spu_id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.tm_id,
sku.category3_id,
c2.id category2_id,
c1.id category1_id,
c3.name category3_name,
c2.name category2_name,
c1.name category1_name,
sku.create_time,
sku.dt
from "$APP".ods_sku_info sku
join "$APP".ods_base_category3 c3 on sku.category3_id=c3.id
join "$APP".ods_base_category2 c2 on c3.category2_id=c2.id
join "$APP".ods_base_category1 c1 on c2.category1_id=c1.id
where sku.dt='$do_date' and c2.dt='$do_date'
and c3.dt='$do_date' and c1.dt='$do_date'
and sku.id is not null;
"
$hive -e "$sql"
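
Because the date argument ends up inside the SQL text, it may be worth checking its shape before building the statements. The following is a sketch of one possible check, not part of the original script; `resolve_do_date` is a hypothetical helper name, and the fallback uses the same GNU `date -d "-1 day" +%F` call as the script.

```shell
#!/bin/bash
# resolve_do_date [date]
# Prints the argument when it looks like YYYY-MM-DD, yesterday's date when the
# argument is empty, and returns 1 for anything else.
resolve_do_date() {
    case "$1" in
        "")
            date -d "-1 day" +%F   # requires GNU date
            ;;
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
            echo "$1"
            ;;
        *)
            echo "invalid date: $1" >&2
            return 1
            ;;
    esac
}

resolve_do_date 2023-01-04
```

In the script, the `if`/`else` block could then be replaced by `do_date=$(resolve_do_date "$1") || exit 1`. The pattern only checks the shape of the value, so an impossible date such as 2023-13-40 would still pass.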

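5 DWS Layer

5.1 User behavior wide table

Goal: aggregate each user's behavior for a single day into one wide, multi-column table, so that it can later be joined with user dimension data and analyzed from different angles.

5.1.1 Creating the user behavior wide table

drop table if exists dws_user_action;
create external table dws_user_action
(
user_id string comment 'user id',
order_count bigint comment 'number of orders',
order_amount decimal(16,2) comment 'order amount',
payment_count bigint comment 'number of payments',
payment_amount decimal(16,2) comment 'payment amount',
comment_count bigint comment 'number of comments'
) COMMENT 'daily user behavior wide table'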
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dws/dws_user_action/';

5.1.2 Loading data

with
tmp_order as
(
select
user_id,
count(*) order_count,
sum(oi.total_amount) order_amount
from dwd_order_info oi
where date_format(oi.create_time,'yyyy-MM-dd')='2023-01-04'
group by user_id
),
tmp_payment as
(
select
user_id,
sum(pi.total_amount) payment_amount,
count(*) payment_count
from dwd_payment_info pi
where date_format(pi.payment_time,'yyyy-MM-dd')='2023-01-04'
group by user_id
),
tmp_comment as
(
select
user_id,
count(*) comment_count
from dwd_comment_log c
where date_format(c.dt,'yyyy-MM-dd')='2023-01-04'
group by user_id
)
insert overwrite table dws_user_action partition(dt='2023-01-04')
select
user_actions.user_id,
sum(user_actions.order_count),
sum(user_actions.order_amount),
sum(user_actions.payment_count),
sum(user_actions.payment_amount),
sum(user_actions.comment_count)
from
(
select user_id, order_count, order_amount, 0 payment_count, 0 payment_amount, 0 comment_count from tmp_order
union all
select user_id, 0, 0, payment_count, payment_amount, 0 from tmp_payment
union all
select user_id, 0, 0, 0, 0, comment_count from tmp_comment
) user_actions
group by user_id;

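6 Requirements

6.1 Requirement 1

Compute GMV (gross merchandise volume), the total transaction amount over a given period (for example a day, a week or a month).
Create the table:

drop table if exists ads_gmv_sum_day;
create external table ads_gmv_sum_day(
`dt` string COMMENT 'statistics date',
`gmv_count` bigint COMMENT 'number of GMV orders for the day',
`gmv_amount` decimal(16,2) COMMENT 'total GMV order amount for the day',
`gmv_payment` decimal(16,2) COMMENT 'payment amount for the day'
) COMMENT 'GMV'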
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_gmv_sum_day/';

Insert the data:

INSERT INTO TABLE ads_gmv_sum_day
SELECT
'2023-01-04' dt,
sum(order_count) gmv_count,
sum(order_amount) gmv_amount,
sum(payment_amount) gmv_payment
FROM
dws_user_action
WHERE dt='2023-01-04'
GROUP BY dt;

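6.2 Requirement 2

Conversion rates: user freshness and funnel analysis

6.2.1 ADS layer: ratio of new users to daily active users (user freshness)

Create the table:

drop table if exists ads_user_convert_day;
create external table ads_user_convert_day(
`dt` string COMMENT 'statistics date',
`uv_m_count` bigint COMMENT 'active devices for the day',
`new_m_count` bigint COMMENT 'new devices for the day',
`new_m_ratio` decimal(10,2) COMMENT 'ratio of new devices to daily active devices'
) COMMENT 'conversion rate'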
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_user_convert_day/';

Load the data:

insert into table ads_user_convert_day
select
'2023-01-04' dt,
sum(uc.dc) sum_dc,
sum(uc.nmc) sum_nmc,
cast(sum(uc.nmc)/sum(uc.dc)*100 as decimal(10,2)) new_m_ratio
from
(
select day_count dc, 0 nmc from ads_uv_count where dt='2023-01-04'
union all
select 0 dc, new_mid_count nmc from ads_new_mid_count where create_date='2023-01-04'
) uc;

6.2.2 ADS layer: user behavior funnel analysis

Create the table:

drop table if exists ads_user_action_convert_day;
create external table ads_user_action_convert_day(
`dt` string COMMENT 'statistics date',
`total_visitor_m_count` bigint COMMENT 'total number of visitors',
`order_u_count` bigint COMMENT 'number of users who placed an order',
`visitor2order_convert_ratio` decimal(10,2) COMMENT 'visit-to-order conversion rate',
`payment_u_count` bigint COMMENT 'number of users who paid',
`order2payment_convert_ratio` decimal(10,2) COMMENT 'order-to-payment conversion rate'
) COMMENT 'user behavior funnel analysis'
row format delimited  fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_user_action_convert_day/';

Insert the data:

insert into table ads_user_action_convert_day
select
'2023-01-04',
uv.day_count,
ua.order_count,
cast(ua.order_count/uv.day_count as decimal(10,2)) visitor2order_convert_ratio,
ua.payment_count,
cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
from
(
select
dt,
sum(if(order_count>0,1,0)) order_count,
sum(if(payment_count>0,1,0)) payment_count
from dws_user_action
where dt='2023-01-04'
group by dt
) ua join ads_uv_count uv on uv.dt=ua.dt;

6.3 Requirement 3

Brand repurchase rate
Requirement: per month, count the users who bought a brand's products two or more times.

6.3.1 DWS layer table

drop table if exists dws_sale_detail_daycount;
create external table dws_sale_detail_daycount
(
user_id string comment 'user id',
sku_id string comment 'product id',
user_gender string comment 'user gender',
user_age string comment 'user age',
user_level string comment 'user level',
order_price decimal(10,2) comment 'product price',
sku_name string comment 'product name',
sku_tm_id string comment 'brand id',
sku_category3_id string comment 'level-3 category id',
sku_category2_id string comment 'level-2 category id',
sku_category1_id string comment 'level-1 category id',
sku_category3_name string comment 'level-3 category name',
sku_category2_name string comment 'level-2 category name',
sku_category1_name string comment 'level-1 category name',
spu_id string comment 'product spu',
sku_num int comment 'quantity purchased',
order_count string comment 'number of orders for the day',
order_amount string comment 'order amount for the day'
) COMMENT 'user purchase detail table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dws/dws_user_sale_detail_daycount/'
tblproperties ("parquet.compression"="snappy");

Load the data:

with
tmp_detail as
(
select
user_id,
sku_id,
sum(sku_num) sku_num,
count(*) order_count,
sum(od.order_price*sku_num) order_amount
from dwd_order_detail od
where od.dt='2023-01-05'
group by user_id, sku_id
)
insert overwrite table dws_sale_detail_daycount partition(dt='2023-01-05')
select
tmp_detail.user_id,
tmp_detail.sku_id,
u.gender,
months_between('2023-01-05', u.birthday)/12 age,
u.user_level,
price,
sku_name,
tm_id,
category3_id,
category2_id,
category1_id,
category3_name,
category2_name,
category1_name,
spu_id,
tmp_detail.sku_num,
tmp_detail.order_count,
tmp_detail.order_amount
from tmp_detail
left join dwd_user_info u on tmp_detail.user_id =u.id and u.dt='2023-01-05'
left join dwd_sku_info s on tmp_detail.sku_id =s.id and s.dt='2023-01-05';

6.3.2 ADS layer

Create the table:

drop table if exists ads_sale_tm_category1_stat_mn;
create external table ads_sale_tm_category1_stat_mn
(
tm_id string comment 'brand id',
category1_id string comment 'level-1 category id',
category1_name string comment 'level-1 category name',
buycount bigint comment 'number of buyers',
buy_twice_last bigint comment 'number of users with two or more purchases',
buy_twice_last_ratio decimal(10,2) comment 'single repurchase rate',
buy_3times_last bigint comment 'number of users with three or more purchases',
buy_3times_last_ratio decimal(10,2) comment 'multiple repurchase rate',
stat_mn string comment 'statistics month',
stat_date string comment 'statistics date'
) COMMENT 'repurchase rate statistics'
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_sale_tm_category1_stat_mn/';

Insert the data:

insert into table ads_sale_tm_category1_stat_mn
select
mn.sku_tm_id,
mn.sku_category1_id,
mn.sku_category1_name,
sum(if(mn.order_count>=1,1,0)) buycount,
sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
sum(if(mn.order_count>=2,1,0))/sum(if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
sum(if(mn.order_count>=3,1,0)) buy3timeLast,
sum(if(mn.order_count>=3,1,0))/sum(if(mn.order_count>=1,1,0)) buy3timeLastRatio,
date_format('2023-01-04','yyyy-MM') stat_mn,
'2023-01-04' stat_date
from
(
select
user_id,
sd.sku_tm_id,
sd.sku_category1_id,
sd.sku_category1_name,
sum(order_count) order_count
from dws_sale_detail_daycount sd
where date_format(dt,'yyyy-MM')=date_format('2023-01-04','yyyy-MM')
group by user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
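
The arithmetic behind the repurchase columns can be checked on a handful of rows. The sketch below mirrors the `sum(if(order_count>=N,1,0))` expressions for a single brand and category group. `repurchase_stats` is a hypothetical helper name and the four input rows are invented sample data, not figures from the gmall dataset.

```shell
#!/bin/bash
# repurchase_stats: reads "user_id order_count" rows (one user per line) on stdin
# and prints: buyers, users with >=2 orders, their ratio, users with >=3 orders, their ratio.
repurchase_stats() {
    awk '{
             if ($2 >= 1) buy++
             if ($2 >= 2) twice++
             if ($2 >= 3) thrice++
         }
         END {
             if (buy == 0) { print "no buyers"; exit }
             printf "%d %d %.2f %d %.2f\n", buy, twice, twice / buy, thrice, thrice / buy
         }'
}

# Four invented users of one brand: two bought once, one twice, one three times
printf 'u1 1\nu2 2\nu3 3\nu4 1\n' | repurchase_stats
```

For this sample, four users bought at least once, two at least twice and one at least three times, so the two ratios are 2/4 and 1/4.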

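6.4 Requirement 4

Top ten products by repurchase rate for each user level
Create the table:

drop table if exists ads_ul_rep_ratio;
create table ads_ul_rep_ratio(
user_level string comment 'user level',
sku_id string comment 'product id',
buy_count bigint comment 'total number of buyers',
buy_twice_count bigint comment 'number of users with two or more purchases',
buy_twice_rate decimal(10,2) comment 'second-purchase repurchase rate',
rank string comment 'rank',
state_date string comment 'statistics date'
) COMMENT 'repurchase rate statistics'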
row format delimited  fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_ul_rep_ratio/';

Insert the data:

with
tmp_count as
(
-- number of orders per user per product within each user level
select
user_level,
user_id,
sku_id,
sum(order_count) order_count
from dws_sale_detail_daycount
where dt<='2023-01-04'
group by user_level, user_id, sku_id
)
insert overwrite table ads_ul_rep_ratio
select *
from
(
select
user_level,
sku_id,
sum(if(order_count >=1, 1, 0)) buy_count,
sum(if(order_count >=2, 1, 0)) buy_twice_count,
sum(if(order_count >=2, 1, 0)) / sum(if(order_count >=1, 1, 0)) * 100 buy_twice_rate,
row_number() over(partition by user_level order by sum(if(order_count >=2, 1, 0)) / sum(if(order_count >=1, 1, 0)) desc) rn,
'2023-01-04'
from tmp_count
group by user_level, sku_id
) t1
where rn<=10;

The next part covers data visualization and job scheduling for the local warehouse project; see "Local Data Warehouse Project (3): Data Visualization and Job Scheduling".
