千亿级数仓项目第03天讲义

课程目标

  • 能够使用kettle生成日期维度数据

  • 能够完成订单时间维度分析

  • 能够完成订单地域、分类维度业务开发

  • 能够完成用户订单行为业务开发

指标开发-时间维度分析业务开发

需求分析

  • 统计2019年期间每个季度的销售订单笔数、订单总额
  • 统计2019年期间每个月的销售订单笔数、订单总额
  • 统计2019年期间每周一到周日的销售订单笔数、订单总额
  • 统计2019年期间国家法定节假日、休息日、工作日的订单笔数、订单总额

生成日期维度数据

经过需求分析我们发现指标统计需要的日期类型比较多如果在事实表的时间字段上做文章,比较复杂,甚至有些根本无法获取到比如节假日、休息日、工作日信息。所以我们可以考虑提前准备一张日期维度表,准备好指标统计的日期类型。

利用我们前面学习的kettle工具生成日期维度数据,需要借助于kettle的js脚本组件。

kettle实现思路:

  • 日期使用 Canlendar、SimpleDateFormat,编写Javascript脚本实现
  • 国家法定节假日实现,需要往http://timor.tech/api/holiday/info/2018-03-02年月日,发送请求,获取结果,以下为该REST API接口简介
1、接口地址:http://timor.tech/api/holiday/info/数字日期,支持https协议。
2、返回数据:正常工作日对应结果为 0,,休息日对应结果为 1, 法定节假日对应结果为 2
3、节假日数据说明:本接口包含2017年起的中国法定节假日数据,数据来源国务院发布的公告,每年更新1次,确保数据最新
4、示例:
http://timor.tech/api/holiday/info/2018-03-02
返回数据:
{"code":0,"type":{"type":0,"name":"周五","week":5},"holiday":null}{"code": 0,              // 0服务正常。-1服务出错"type": {"type": enum(0, 1, 2), // 节假日类型,分别表示 工作日、周末、节日。"name": "周六",         // 节假日类型中文名,可能值为 周一 至 周日、假期的名字、某某调休。"week": enum(1 - 7)    // 一周中的第几天。值为 1 - 7,分别表示 周一 至 周日。},"holiday": {"holiday": false,     // true表示是节假日,false表示是调休"name": "国庆前调休",  // 节假日的中文名。如果是调休,则是调休的中文名,例如'国庆前调休'"wage": 1,            // 薪资倍数,1表示是1倍工资"after": false,       // 只在调休下有该字段。true表示放完假后调休,false表示先调休再放假"target": '国庆节'     // 只在调休下有该字段。表示调休的节假日}
}
类型 示例值 中文名
date_key string 20000101 代理键
date_value string 2000-01-01 年-月-日
day_in_year string 1 当年的第几天
day_in_month string 1 当月的第几天
is_first_day_in_month string y 是否月的第一天
is_last_day_in_month string n 是否月的最后一天
weekday string 星期一 星期
week_in_month string 1 月的第几个星期
is_first_day_in_week string y、n 是否周一
is_dayoff string y、n 是否休息日
is_workday string y、n 是否工作日
is_holiday string y、n 是否国家法定节假日
date_type string workday、weekend、holiday
工作日、周末、法定节假日
日期类型
工作日:workday
国家法定节假日:holiday

休息日:weekend

month_number string 1、2、…、12 月份
year string 2000 年份
quarter_name string Q1 季度名称
quarter_number string 1 季度
year_quarter string 2000-Q1 年-季度
year_month_number string 2000-01 年-月份

参考数据:

dim_date.date_value dim_date.day_in_month dim_date.is_first_day_in_month dim_date.is_last_day_in_month dim_date.weekday dim_date.week_in_month dim_date.is_first_day_in_week dim_date.is_dayoff dim_date.is_workday dim_date.is_holiday dim_date.date_type dim_date.month_number dim_date.year dim_date.year_month_number dim_date.quarter_name dim_date.quarter_number dim_date.year_quarter
2019-1-1 1 y n 2 1 n n n y holiday 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-3 3 n n 4 1 n n y n workday 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-4 4 n n 5 1 n n y n workday 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-5 5 n n 6 1 n y n n weekend 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-2 2 n n 3 1 n n y n workday 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-6 6 n n 7 2 n y n n weekend 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-7 7 n n 1 2 y n y n workday 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-8 8 n n 2 2 n n y n workday 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-9 9 n n 3 2 n n y n workday 1 2019 Jan-19 Q1 1 2019-Q1
2019-1-10 10 n n 4 2 n n y n workday 1 2019 Jan-19 Q1 1 2019-Q1

Hive创建日期维度表

参考SQL语句:

DROP TABLE IF EXISTS `itcast_dw`.`dim_date`;
CREATE TABLE `itcast_dw`.`dim_date`(`date_key` string,`date_value` string,`day_in_year` string,`day_in_month` string,`is_first_day_in_month` string,`is_last_day_in_month` string,`weekday` string,`week_in_month` string,`is_first_day_in_week` string,`is_dayoff` string,`is_workday` string,`is_holiday` string,`date_type` string,`month_number` string,                        `year` string,`year_month_number` string,`quarter_name` string,`quarter_number` string,`year_quarter` string
)
stored as parquet TBLPROPERTIES('parquet.compression'='SNAPPY');

使用Kettle构建以下组件结构图

  • 生成记录
  • 增加序列
  • Javascript代码
  • Http Client组件(5个并行度)
  • JSON Input
  • Switch/case
  • 字段映射为常量
    • 工作日
    • 法定节假日
    • 节假日调休补班
    • 休息日
  • 字段选择
  • Parquet Output
kettle生成自定义日期维度数据组件图

组件配置

1、生成记录组件

生成记录组件-配置限制为 365
配置增加序列组件

3、Javascript组件,生成基本时间维度数据

 //初始日期var initStr = "2019-01-01";//转为日期对象var dateFormat = new java.text.SimpleDateFormat("yyyy-MM-dd");var initDate = dateFormat.parse(initStr);//获取日历对象var calendar = java.util.Calendar.getInstance();//设置日历的时间就是我们的初始时间calendar.setTime(initDate);//使用日历加减天,变换日期calendar.add(java.util.Calendar.DAY_OF_MONTH, setup);//获取增加一天之后的日期var newDate = calendar.getTime();var date_value = dateFormat.format(newDate);dateFormat.applyPattern("yyyyMMdd");var date_key = dateFormat.format(newDate);//一年当中第几天?var day_in_year = calendar.get(java.util.Calendar.DAY_OF_YEAR)+"";//一月当中第几天var day_in_month = calendar.get(java.util.Calendar.DAY_OF_MONTH)+"";//是否是月的第一天var is_first_day_in_month = "n";if (day_in_month.equals("1")) {is_first_day_in_month = "y";}//是否是月的最后一天var is_last_day_in_month = "n";calendar.add(java.util.Calendar.DAY_OF_MONTH, 1);var nextDay = calendar.get(java.util.Calendar.DAY_OF_MONTH) + "";if (nextDay.equals("1")) {is_last_day_in_month = "y";}//注意增加之后的日历对象需要再减回去calendar.add(java.util.Calendar.DAY_OF_MONTH, -1);//星期几var weekday= (calendar.get(java.util.Calendar.DAY_OF_WEEK)-1)+"";//星期日:1,星期六是7if(weekday.equals("0")){weekday="7";}//获取月中第几周,先减一保证时间正确(我们的规则)calendar.add(java.util.Calendar.DAY_OF_MONTH, -1);var week_in_month = calendar.get(java.util.Calendar.WEEK_OF_MONTH)+"";//再增加回去calendar.add(java.util.Calendar.DAY_OF_MONTH, 1);//是否是周一var is_first_day_in_week = "n";if (weekday.equals("1")) {is_first_day_in_week = "y";}var is_dayoff = "n";var is_workday = "n";var is_holiday = "n";var date_type = "workday";//定义查询的urlvar holidayUrl = "http://timor.tech/api/holiday/info/" + date_value;//月份dateFormat.applyPattern("MM");var month_number = dateFormat.format(newDate);dateFormat.applyPattern("yyyy");var year = dateFormat.format(newDate);var year_month_number = year + "-" + month_number;//季度var quarter_name = "";var quarter_number = "";var year_quarter = "";//按照我们的规则,否则默认是国外规则,星期日属于下个周switch (calendar.get(java.util.Calendar.MONTH) + 1) {case 1:case 2:case 3:quarter_name = "Q1";quarter_number = "1";year_quarter = year + "-" + quarter_name;break;case 4:case 5:case 6:quarter_name = "Q2";quarter_number = "2";year_quarter = year + "-" + quarter_name;break;case 7:case 8:case 9:quarter_name = "Q3";quarter_number = "3";year_quarter = year + "-" + quarter_name;break;case 10:case 11:case 12:quarter_name = "Q4";quarter_number = "4";year_quarter = year + "-" + quarter_name;break;}

4、配置HttpClient组件

配置HttpClient组件

5、配置JSON Input组件

配置JSON Input组件

6、配置 switch/case 组件

配置 switch/case 组件

7、配置工作日、法定节假日、休息日等组件

配置工作日、法定节假日、休息日等组件

8、配置字段选择组件

配置字段选择组件

9、配置parquet输出

配置parquet输出

验证数据

select * from itcast_dw.dim_date limit 10;

创建ads层数据表

drop table if exists itcast_ads.ads_order_date;
create table itcast_ads.ads_order_date(date_type string,   -- 时间维度类型date_val string,   -- 时间维度值order_count bigint, -- 订单笔数order_money double  -- 订单交易额
)
partitioned by (dt string)
row format delimited fields terminated by '\001' stored as TEXTFILE;

统计半年期间每个季度的销售订单笔数、订单总额

时间维度类型:

  • 1 → 季度
  • 2 → 月度
  • 3 → 星期
  • 4 → 日期类型(节假日、休息日、工作日)

统计2019年期间每个季度的销售订单笔数、订单总额

insert overwrite table itcast_ads.ads_order_date partition(dt='20190909')
select'1' as date_type,                                      -- 时间维度类型,1表示季度t1.year_quarter as date_val,                           -- 季度名称count(orderid),                            -- 订单总数sum(totalMoney)                         -- 订单总额
from itcast_dw.dim_date t1left join(select * from itcast_dw.fact_orders where dt >='20190101' and dw_end_date='9999-12-31') t2on t1.date_value = date_format(createtime, 'yyyy-MM-dd')
group by t1.year_quarter
order by t1.year_quarter;-- 测试
select * from itcast_ads.ads_order_date where dt='20190909' and date_type = '1';

统计2019年期间每个月的销售订单笔数、订单总额

insert into table itcast_ads.ads_order_date partition(dt='20190909')
select'2' as date_type, --2表示月类型t1.year_month_number as date_val,count(orderid),                  -- 订单总数sum(totalMoney)                         -- 订单总额
fromitcast_dw.dim_date t1left join(select * from itcast_dw.fact_orders where dt >='20190101' and dw_end_date='9999-12-31') t2on t1.date_value = date_format(createtime, 'yyyy-MM-dd')group by t1.year_month_number
order by t1.year_month_number;-- 测试
select * from itcast_ads.ads_order_date where dt='20190909' and date_type = '2';

统计2019年期间每周一到周日的销售订单笔数、订单总额

insert into table itcast_ads.ads_order_date partition(dt='20190909')
select'3' as date_type,t1.weekday as date_val,count(orderid),                  -- 订单总数sum(totalMoney)                         -- 订单总额
from itcast_dw.dim_date t1left join(select * from itcast_dw.fact_orders where dt >='20190101' and dw_end_date='9999-12-31') t2on t1.date_value = date_format(createtime, 'yyyy-MM-dd')
group byt1.weekday
order by t1.weekday;-- 测试
select * from itcast_ads.ads_order_date where dt='20190909' and date_type = '3';

统计2019年期间国家法定节假日、休息日、工作日的订单笔数、订单总额

insert into table itcast_ads.ads_order_date partition(dt='20190909')
select'4' as date_type,t1.date_type as date_val,count(orderid) as order_count,                  -- 订单总数sum(totalMoney) as order_money                        -- 订单总额
from itcast_dw.dim_date t1left join(select * from itcast_dw.fact_orders where dt >='20190101' and dw_end_date='9999-12-31') t2on t1.date_value = date_format(createtime, 'yyyy-MM-dd')
group byt1.date_type
order by t1.date_type;-- 测试
select * from itcast_ads.ads_order_date where dt='20190909' and date_type = '4';

指标开发-订单分析地域、分类维度分析业务开发

需求分析

集团总公司分为很多的分公司(销售事业部)

分公司
唐山市分公司
邯郸市分公司
邢台市分公司
保定市分公司
承德市分公司
沧州市分公司
廊坊市分公司
衡水市分公司

为了公司的经营需要,公司需要定期检查各个分公司的经营指标完成情况,运营部门提出数据分析需求:

  • 交易金额
  • 交易笔数
  • 微信交易笔数
  • 支付宝交易笔数

通过以上指标数据,公司的运营人员能够了解到各个分公司的经营情况,在出现经营问题时,能够及时发现,并根据问题,去寻找问题原因。

大区 城市 一级分类 二级分类 微信交易笔数 支付宝交易笔数 交易笔数 交易金额
0 9999 0 0 1001341 1238581 2239922 62348998

维度如下:

  • 商品分类
  • 行政区域

要求:最终可以根据不同大区、不同城市、不能分类级别查询交易数据,也就是要求支持不同维度的组合查询。

创建 dw 层数据表

该层主要创建维度表与事实表。为了便于识别,维度表增加 dim_ 前缀。事实表增加 fact_ 前缀:

fact_order_goods_wide

-- 1. 创建订单明细事实表
DROP TABLE IF EXISTS itcast_dw.fact_order_goods_wide;
create table if not exists itcast_dw.fact_order_goods_wide(order_id string,            -- 订单idgoods_cat_3d_id string,     -- 商品三级分类idshop_id string,             -- 店铺idpayment_id string,          -- 订单支付方式goods_num bigint,           -- 商品数量pay_money double,           -- 订单明细金额paytime string              -- 订单时间
)
partitioned by (dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');

dim_goods_cat

-- 表创建
DROP TABLE IF EXISTS itcast_dw.dim_goods_cat;
create table if not exists itcast_dw.dim_goods_cat(cat_3d_id string,                   -- 三级商品分类idcat_3d_name string,                 -- 三级商品分类名称cat_2d_id string,                    -- 二级商品分类Idcat_2d_name string,                  -- 二级商品分类名称cat_1t_id string,                   -- 一级商品分类idcat_1t_name string                  -- 一级商品分类名称
)
partitioned by (dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');

dim_shops

-- 创建dw层商铺维度表
DROP TABLE IF EXISTS itcast_dw.dim_shops;
-- 1. 店铺维度表
create table if not exists itcast_dw.dim_shops(shop_id string,             -- 店铺id,shop_name string,           -- 店铺名称city_id string,             -- 店铺所属城市组织机构idcity_name string,           -- 店铺所属城市组织机构名称region_id string,         -- 店铺所属省份组织机构idregion_name string        -- 店铺所属省份组织机构名称
)
partitioned by (dt string)
STORED AS PARQUET  TBLPROPERTIES('parquet.compression'='SNAPPY');

dim_payment

-- 1. 创建支付方式维度表
drop table if exists itcast_dw.dim_payment;
create table if not exists itcast_dw.dim_payment(payment_id string,          -- 支付方式idpayment_name string         -- 支付方式名称
)
partitioned by (dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');

创建 ads 层数据表

-- 创建ads(数据集市层)订单分析表
DROP TABLE IF EXISTS itcast_ads.ads_trade_order;
create table if not exists itcast_ads.ads_trade_order(area_type string,               -- 区域范围:区域类型(全国、大区、城市)region_name string,             -- 区域名称city_name string,               -- 城市名称category_type string,           --分类类型(一级、二级)category_1st_name string,       -- 一级分类名称category_2d_name string,        -- 二级分类名称payment_name string,            -- 支付方式(所有、微信、支付宝、...)total_count bigint,             -- 订单数量total_goods_num bigint,         -- 商品数量total_money double              -- 支付金额
)
partitioned by (dt string)
row format delimited fields terminated by '\001' stored as TEXTFILE;

ods层数据至dw层

fact_orders

直接查询之前准备好的拉链表:

select * from itcast_dw.fact_orders where  dt ='20190909' and dw_end_date = '9999-12-31' limit 5;

dim_goods

直接查询之前准备好的拉链表:

-- 测试
select * from itcast_dw.dim_goods where dw_end_date = '9999-12-31' limit 5;

dw层数据维度拉宽

数据维度表拉宽

区域店铺维度数据拉宽

店铺表(itcast_shops)与组织机构表(itcast_org)进行关联

列名称 数据类型 数据来源
shop_id string itcast_shops
shop_name string itcast_shops
city_id string itcast_org
city_name string itcast_org
region_id string itcast_org
region_name string itcast_org

参考代码:

-- 2. 加载订单组织机构维度表数据
insert overwrite table itcast_dw.dim_shops partition(dt='20190909')
selectt1.shopid as shop_id,           -- 店铺idt1.shopname as shop_name,       -- 店铺名称t2.orgid as city_id,            -- 城市组织机构idt2.orgname as city_name,        -- 城市组织机构名称t3.orgid as region_id,          -- 区域组织机构idt3.orgname as region_name       -- 区域组织机构名称
from(select shopid, shopname, areaid from itcast_ods.itcast_shops where dt='20190909') t1                         -- 店铺数据left join(select orgid, parentid, orgname, orglevel from itcast_ods.itcast_org where orglevel=2 and dt='20190909') t2    -- 城市组织机构数据on t1.areaid = t2.orgidleft join(select orgid, parentid, orgname, orglevel from itcast_ods.itcast_org where orglevel=1 and dt='20190909') t3    -- 省份组织机构数据on t2.parentid = t3.orgid;-- 测试数据select * from itcast_dw.dim_org limit 15;
商品分类维度数据拉宽

商品表维度数据拉宽

参考代码:

列名称 数据类型 数据来源
cat_3d_id string itcast_goods_cats
cat_3d_name string itcast_goods_cats
cat_2d_id string itcast_goods_cats
cat_2d_name string itcast_goods_cats
cat_1t_id string itcast_goods_cats
cat_1t_name string itcast_goods_cats
insert overwrite table itcast_dw.dim_goods_cat partition(dt='20190909')
select t3.catid as cat_3d_id,          -- 三级分类idt3.catname as cat_3d_name,      -- 三级分类名称t2.catid as cat_2d_id,          -- 二级分类idt2.catname as cat_2d_name,      -- 二级分类名称t1.catid as cat_1t_id,          -- 一级分类idt1.catname as cat_1t_name       -- 一级分类名称
from (select catid, catname, parentid from itcast_ods.itcast_goods_cats where cat_level=3 and dt='20190909') t3 -- 商品三级分类数据left join(select catid, catname, parentid from itcast_ods.itcast_goods_cats where cat_level=2 and dt='20190909') t2 -- 商品二级分类数据on t3.parentid = t2.catidleft join(select catid, catname, parentid from itcast_ods.itcast_goods_cats where cat_level=1 and dt='20190909') t1 -- 商品一级分类数据on t2.parentid = t1.catid;-- 测试数据
select * from itcast_dw.dim_goods_cat limit 5;
支付方式维度表拉宽
列名称 数据类型 数据来源
payment_id string itcast_payments
payment_name string itcast_payments
-- 2. 加载支付方式维度数据
-- 需要额外添加一行数据 0 -> 其他
insert overwrite table itcast_dw.dim_payment partition(dt='20190909')
select t1.id as payment_id,        -- 支付方式idt1.payName as payment_name  -- 支付方式名称
from (select id, payName from itcast_ods.itcast_payments where dt='20190909') t1;-- 测试查询支付方式维度数据
select * from itcast_dw.dim_payment limit 5;
事实表拉宽
列名称 数据类型 数据来源
order_id string itcast_orders
goods_cat_3d_id string dim_goods
shop_id string dim_goods
payment_id string itcast_orders
goods_num bigint itcast_order_goods
pay_money double itcast_orders
paytime string itcast_orders
-- 2. 拉宽订单明细事实表
insert overwrite table itcast_dw.fact_order_goods_wide partition(dt='20190909')
selectt1.orderid as order_id,t3.goodscatid as goods_cat_3d_id,t3.shopid as shop_id,t1.paytype as payment_id,t2.goodsnum as goods_num,t2.goodsnum*t2.payprice  as pay_money,t1.paytime as paytime
from(select orderid, paytype, paytime from itcast_dw.fact_orders where dt='20190909') t1                     -- 订单表数据left join(select orderid, goodsid, goodsnum, payprice from itcast_ods.itcast_order_goods where dt='20190909') t2     -- 订单明细数on t1.orderid = t2.orderidleft join(select goodsid, shopid, goodscatid from itcast_dw.dim_goods where dw_end_date = '9999-12-31') t3           -- 商品数量on t2.goodsid = t3.goodsid;-- 测试数据
select * from itcast_dw.fact_order_goods_wide limit 5;

指标计算

-- 指标统计
-- 创建 ads 层结果表
create table if not exists itcast_ads.ads_trade_order(area_type string,               -- 区域范围:区域类型(全国、大区、城市)region_name string,             -- 区域名称city_name string,               -- 城市名称category_type string,           --分类类型(一级、二级)category_1st_name string,       -- 一级分类名称category_2d_name string,        -- 二级分类名称payment_name string,            -- 支付方式(所有、微信、支付宝、...)total_count bigint,             -- 订单数量total_goods_num bigint,         -- 商品数量total_money double              -- 支付金额
)
partitioned by (dt string)
row format delimited fields terminated by '\001' stored as TEXTFILE;

指标计算汇总计算

维度数据:

  • 区域维度数据
  • 商品分类维度数据。

根据需求,需要依据这两个维度进行汇总。最终在业务系统中展示如下:

数据展示示例

公司高管拥有查询所有数据的权限,省份不选择时,看到的为全国范围内的数据,如果指定省份(本次开发的系统为大区)时,看到的为指定省份数据,如果指定某个城市,则看到的是指定城市的数据,同理商品分类维度类似推算。

由于区域维度为分级维度,分为全国、大区、城市三级。通常情况下,我们会计算出城市维度,然后对城市维度数据进行汇总计算得出大区维度和全国维度数据,实际上,由于一个订单,由于会有多个商品,这些商品属于不同的商店,这些商店又属于不同的区域,故不能简单累加计算。

ogid orderid shopid cityid regionid goodsid dt
120242 532 100050 北京 华北 100111 20190905
120243 532 100058 石家庄 华北 100176 20190905
120244 532 100062 上海 华东 100105 20190905

根据此订单信息分析,orderid为532的订单,实际上仅为一个订单,该订单共计3件商品,分别属于不同的商店,这些商店分属北京、石家庄、上海城市。

在区域维度计算时。精确到城市维度数据计算应该如下:

大区 城市 订单量
华北 北京 1
华北 石家庄 1
华东 上海 1

精确到大区维度数据计算应该如下:

大区 订单量
华北 1
华东 1

精确到全国维度数据计算时结果数据应该如下:

区域 订单量
全国 1

通过分析,发现如果使用已经计算过的城市维度的数据汇总至大区时,华北大区的订单量会被计算为2。故不能简单累加。每个区域级别的数据需要单独计算,同理在商品分类维度中,也有同样的情况。

全国、无商品分类维度的交易信息

由于需求中需要列出支付宝支付笔数,微信交易笔数与总交易笔数,根据数据信息,支付共计分为支付宝交易、微信交易、现金交易、其他交易和未知支付类型(支付方式,0:未知;1:支付宝,2:微信;3、现金;4、其他)

故每个维度的数据计算时,除按照支付类型进行分组获取各种支付类型支付的数据外,还要获取不分任何类型的数据。

需求:

– 1、全国、无商品分类不分支付类型维度分析

– 2、 全国、无商品分类分支付类型维度分析

注意事项:

第一个INSERT导入数据到数据集市,使用overwrite,第二个则应该使用into

计算如下:

-- 1、全国、无商品分类不分支付类型维度分析
insert overwrite table itcast_ads.ads_trade_order partition(dt='20190909')
select'全国' as area_type,'' as region_name,'' as city_name,'' as category_type,'' as category_1st_name,'' as category_2d_name,'所有' as payment_name,count(distinct t1.order_id) as total_count,--订单笔数需要去重sum(t1.goods_num) as total_goods_num,--总的商品数量sum(t1.pay_money ) as total_money
from(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t1;-- 2、测试获取数据
select * from itcast_ads.ads_trade_order t where t.area_type = '全国';
-- 全国、无商品分类分支付类型维度分析
-- 注意:要使用 insert into,否则数据会被覆写
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'全国' as area_type,'' as region_name,'' as city_name,'' as category_type,'' as category_1st_name,'' as category_2d_name,t1.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from
(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2
left join(select * from itcast_dw.dim_payment where dt='20190909') t1   on t1.payment_id = t2.payment_id
group by t1.payment_name;-- 测试查询数据select * from itcast_ads.ads_trade_order t1 where t1.payment_name != '所有' limit 5;

全国、一级商品分类维度交易信息

需求:

1、获取全国、一级商品分类维度的不分支付类型数据

2、获取全国、一级商品分类维度的分支付类型数据

-- 获取全国、一级商品分类维度的分支付类型数据-- 3.1 获取全国、一级商品分类维度的不分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'全国' as area_type,'' as region_name,'' as city_name,'一级分类' as category_type,t1.cat_1t_name as category_1st_name,'' as category_2d_name,'所有' as payment_name,count(distinct t2.order_id) as total_count,sum(case when t2.goods_num is nullthen 0else t2.goods_numend) as total_goods_num,sum(case when t2.pay_money is nullthen 0else t2.pay_moneyend) as total_money
from(select * from itcast_dw.dim_goods_cat where dt='20190909') t1left join (select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.cat_3d_id = t2.goods_cat_3d_id
group by t1.cat_1t_name;
-- 测试
select * from itcast_ads.ads_trade_order where dt='20190909' and category_type = '一级分类' and payment_name = '所有';-- 3.2 获取全国、一级商品分类维度的分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'全国' as area_type,'' as region_name,'' as city_name,'一级分类' as category_type,t3.cat_1t_name as category_1st_name,'' as category_2d_name,t1.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_payment where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.payment_id = t2.payment_idleft join (select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_id
group by t1.payment_name, t3.cat_1t_name;-- 测试查询数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '全国' and category_type = '一级分类' and payment_name != '所有';

全国、二级商品分类维度交易信息

需求:

1、获取全国、二级商品分类维度的分支付类型数据

2、获取全国、二级商品不分类维度的分支付类型数据

-- 全国、二级商品分类维度分析
-- 1. 获取全国、二级商品分类维度的分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'全国' as area_type,'' as region_name,'' as city_name,'二级分类' as category_type,t3.cat_1t_name as category_1st_name,t3.cat_2d_name as category_2d_name,t1.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_payment where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.payment_id = t2.payment_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_id
group by t1.payment_name, t3.cat_1t_name, t3.cat_2d_name;-- 测试查询数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '全国' and category_type = '二级分类' limit 10;-- 2. 获取全国、二级商品不分类维度的不分支付类型数据
insert overwrite table itcast_ads.ads_trade_order partition(dt='20190909')
select'全国' as area_type,'' as region_name,'' as city_name,'二级分类' as category_type,t2.cat_1t_name as category_1st_name,t2.cat_2d_name as category_2d_name,'所有' as payment_name,count(distinct t1.order_id) as total_count,sum(t1.goods_num) as total_goods_num,sum(t1.pay_money) as total_money
from(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t1left join(select * from itcast_dw.dim_goods_cat where dt='20190909') t2on t1.goods_cat_3d_id = t2.cat_3d_id
group by t2.cat_1t_name, t2.cat_2d_name;-- 测试查询数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '全国' and category_type = '二级分类' and payment_name = '所有' limit 10;

大区、无商品分类维度的交易信息

需求:

1、获取大区、无商品分类维度的分支付类型数据

2、获取大区、无商品分类维度的不分支付类型数据

参考代码:

-- 大区、无商品分类维度的交易信息
-- 1、获取大区、无商品分类维度的分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'大区' as area_type,t1.region_name as region_name,'' as city_name,'所有' as category_type,'' as category_1st_name,'' as category_2d_name,t3.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_payment where dt='20190909') t3on t2.payment_id = t3.payment_id
group by t1.region_name, t3.payment_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '大区' and category_type = '所有';-- 2、获取大区、不分商品不分类维度的不分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'大区' as area_type,t1.region_name as region_name,'' as city_name,'所有' as category_type,'' as category_1st_name,'' as category_2d_name,'所有' as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_id
group by t1.region_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '大区' and category_type = '所有' and payment_name = '所有';

大区、一级商品分类维度交易信息

1、获取大区、一级商品分类维度的分支付类型数据

2、获取大区、一级商品分类维度的不分支付类型数据

参考代码:

-- 3. 大区、一级商品分类维度交易信息
-- 3.1、获取大区、一级商品分类维度的分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'大区' as area_type,t1.region_name as region_name,'' as city_name,'一级分类' as category_type,t3.cat_1t_name as category_1st_name,'' as category_2d_name,t4.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shop where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_idleft join(select * from itcast_dw.dim_payment where dt='20190909') t4on t2.payment_id = t4.payment_id
group by t1.region_name, t3.cat_1t_name, t4.payment_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '大区' and category_type = '一级分类' and payment_name != '所有';-- 3.2、获取大区、一级商品分类维度的不分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'大区' as area_type,t1.region_name as region_name,'' as city_name,'一级分类' as category_type,t3.cat_1t_name as category_1st_name,'' as category_2d_name,'所有' as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shop where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_id
group by t1.region_name, t3.cat_1t_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '大区' and category_type = '一级分类' and payment_name = '所有';

大区、二级商品分类维度交易信息

1、获取大区、二级商品分类维度的分支付类型数据

2、获取大区、二级商品分类维度的不分支付类型数据

参考代码:

-- 大区、二级商品分类维度交易信息
--- 1、获取大区、二级商品分类维度的分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'大区' as area_type,t1.region_name as region_name,'' as city_name,'二级分类' as category_type,t3.cat_1t_name as category_1st_name,'' as category_2d_name,t4.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_idleft join(select * from itcast_dw.dim_payment where dt='20190909') t4on t2.payment_id = t4.payment_id
group by t1.region_name, t3.cat_1t_name, t3.cat_2d_name, t4.payment_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '大区' and category_type = '二级分类' limit 10;-- 2、获取大区、二级商品分类维度的不分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'大区' as area_type,t1.region_name as region_name,'' as city_name,'二级分类' as category_type,t3.cat_1t_name as category_1st_name,'' as category_2d_name,'所有' as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_id
group by t1.region_name, t3.cat_1t_name, t3.cat_2d_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '大区' and category_type = '二级分类' and payment_name='所有';

城市、无商品分类维度交易信息

1、 获取城市、无商品分类维度的分支付类型数据

2、获取城市、无商品分类维度的不分支付类型数据

参考代码:

-- #################
-- 城市、无商品分类维度交易信息
-- #################
-- 1. 获取城市、无商品分类维度的分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'城市' as area_type,t1.region_name as region_name,t1.city_name as city_name,'所有' as category_type,'' as category_1st_name,'' as category_2d_name,t3.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_payment where dt='20190909') t3on t2.payment_id = t3.payment_id
group by t1.region_name, t1.city_name, t3.payment_name;-- 获取测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '城市' and category_type = '所有';-- 2、获取城市、无商品分类维度的不分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'城市' as area_type,t1.region_name as region_name,t1.city_name as city_name,'所有' as category_type,'' as category_1st_name,'' as category_2d_name,'所有' as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_id
group by t1.region_name, t1.city_name;-- 获取测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '城市' and category_type = '所有' and payment_name = '所有';

城市、一级商品分类维度交易信息

1、获取城市、一级商品分类维度的不分支付类型数据

2、获取城市、一级商品分类维度的分支付类型数据

参考代码:

-- #### 城市、一级商品分类维度交易信息
-- #################
-- 1、获取城市、一级商品分类维度的分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'城市' as area_type,t1.region_name as region_name,t1.city_name as city_name,'一级分类' as category_type,t4.cat_1t_name as category_1st_name,'' as category_2d_name,t3.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_payment where dt='20190909') t3on t2.payment_id = t3.payment_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t4on t2.goods_cat_3d_id = t4.cat_3d_id
group by t1.region_name, t1.city_name, t3.payment_name, t4.cat_1t_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '城市' and category_type = '一级分类';-- 2、获取城市、一级商品分类维度的不分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'城市' as area_type,t1.region_name as region_name,t1.city_name as city_name,'一级分类' as category_type,t3.cat_1t_name as category_1st_name,'' as category_2d_name,'所有' as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_id
group by t1.region_name, t1.city_name, t3.cat_1t_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '城市' and category_type = '一级分类' and payment_name = '所有';

城市、二级商品分类维度交易信息

1、获取城市、二级商品分类维度的不分支付类型数据

2、获取城市、二级商品分类维度的分支付类型数据

参考代码:

-- #### 城市、二级商品分类维度交易信息
-- #################
-- 1、获取城市、二级商品分类维度的分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'城市' as area_type,t1.region_name as region_name,t1.city_name as city_name,'二级分类' as category_type,t3.cat_1t_name as category_1st_name,t3.cat_2d_name as category_2d_name,t4.payment_name as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_idleft join(select * from itcast_dw.dim_payment where dt='20190909') t4on t2.payment_id = t4.payment_id
group by t1.region_name, t1.city_name, t3.cat_1t_name, t3.cat_2d_name, t4.payment_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '城市' and category_type = '二级分类';-- 2、获取城市、二级商品分类维度的不分支付类型数据
insert into table itcast_ads.ads_trade_order partition(dt='20190909')
select'城市' as area_type,t1.region_name as region_name,t1.city_name as city_name,'二级分类' as category_type,t3.cat_1t_name as category_1st_name,t3.cat_2d_name as category_2d_name,'所有' as payment_name,count(distinct t2.order_id) as total_count,sum(t2.goods_num) as total_goods_num,sum(t2.pay_money) as total_money
from(select * from itcast_dw.dim_shops where dt='20190909') t1left join(select * from itcast_dw.fact_order_goods_wide where dt='20190909') t2on t1.shop_id = t2.shop_idleft join(select * from itcast_dw.dim_goods_cat where dt='20190909') t3on t2.goods_cat_3d_id = t3.cat_3d_id
group by t1.region_name, t1.city_name, t3.cat_1t_name, t3.cat_2d_name;-- 测试数据
select * from itcast_ads.ads_trade_order where dt='20190909' and area_type = '城市' and category_type = '二级分类' and payment_name = '所有';

指标开发-用户订单指标业务开发

需求分析

  • 电商平台往往需要根据用户的购买数据来分析用户的行为
  • 根据用户的消费行为习惯,给运营部门提供用户分析数据指标
  • 主要围绕订单表开展分析

基于用户的订单指标

统计指标 所需字段
第一次消费时间 支付时间
最近一次消费时间 支付时间
首单距今时间 支付时间
尾单距今时间 支付时间
最小消费金额 订单支付金额
最大消费金额 订单支付金额
累计消费次数(不含退拒) 订单id、订单状态
累计消费金额(不含退拒) 订单支付金额、订单状态
近30天购买次数(不含退拒) 是否为近30天、订单id、订单状态
近30天购买金额(不含退拒) 是否为近30天、订单支付金额、订单状态
近30天购买次数(含退拒) 是否为近30天、订单id
近30天购买金额(含退拒) 是否为近30天、订单支付金额
客单价(含退拒) 订单支付金额、客户订单数
客单价(不含退拒) 订单支付金额、客户订单数、订单状态
近60天客单价(含退拒) 是否为近60天、订单支付金额、客户订单数
近60天客单价(不含退拒) 是否为近60天、订单支付金额、订单状态
常用收货地址 用户收货地址
常用支付方式 支付方式
学校下单总数 用户收货地址
单位下单总数 用户收货地址
家里下单总数 用户收货地址
上午下单总数 订单id、是否为上午
下午下单总数 订单id、是否为下午
晚上下单总数 订单id、是否为晚上

订单状态

  • 交易未成功(拒收/reject)

    • 订单未送达、送达后未签收或签收后办理了退货
    • 订单状态显示「交易未成功」
  • 取消(退货/return)

    • 订单中的商品缺货、订单超出了订单保留期限,或订单进行了取消操作
    • 订单显示「取消」状态
  • 等待支付

    • 用户选择 「网上支付」和 「银行转账」这两种付款方式
    • 在暂未收到款项时,订单会显示「等待支付」
  • 等待预售商品到货

    • 订购的商品为预售商品
    • 商品到货前订单会显示 「等待预售商品到货」
  • 正在配货

    • 订单正在库房配货。
  • 等待移仓

    • 订单中的商品当地库房缺货
    • 将从北京仓库调货至当地仓库,订单显示「等待移仓」
  • 正在移仓

    • 订单中的商品正从北京仓库发往当地仓库,订单会显示「正在移仓」
  • 已配货

    • 此订单已完成配货,正在等待发货
  • 已发货

    • 订单已从库房发出,正在配送途中,订单显示「已发货」
  • 已送达

    1. 已收到商品并在“我的订单”中进行了「收货反馈」,订单会显示「已送达」
    2. 未进行「收货反馈」操作,系统在发货后的20天默认变为「已送达」
    3. 订单状态为「已发货」,但物流配送信息显示「配送成功」时,系统默认态显示为「已送达」
    4. 国内平邮订单,未进行「确认收货」操作,系统在发货25天后默认收到商品,订单状态显示为「已送达」
    5. 海外订单,如果未进行「确认收货」操作,系统会在发货60天后默认您收到商品,订单状态显示为「已送达」
  • 交易成功

    • 若订单状态为」已送达」,且此状态后的15天内未发生退货,系统将默认变为「交易成功」

创建dw层表

1、创建itcast_dw.dim_user表

字段 说明
userId 用户id
loginName 登录名
userSex 性别
drop table if exists itcast_dw.dim_user;
create table itcast_dw.dim_user(userId          bigint,loginName       string,userSex         bigint
)
partitioned by (dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');

2、创建dw层用户收货地址表 itcast_dw.dim_user_address

--创建dw层dim_user_address表
drop table if exists itcast_dw.dim_user_address;
create table itcast_dw.dim_user_address(addressId    bigint,userId       bigint,userName     string,otherName    string,userPhone    string,areaIdPath   string,areaId       bigint,userAddress  string,isDefault    bigint,dataFlag     bigint,createTime   string
)
partitioned by (dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');--从ods层itcast_user_address导出数据到dim_user_address表
insert overwrite table itcast_dw.dim_user_address partition(dt="20190909")
select
addressId,
userId,
userName,
otherName,
userPhone,
areaIdPath,
areaId,
userAddress,
isDefault,
dataFlag,
createTime
from itcast_ods.itcast_user_address where dt="20190909";

3、创建订单时间标志、地址标志宽表 fact_order_wide

字段名称 说明
flag30 近30天标志
flag60 近60天标志
flag90 近90天标志
flag180 近180天标志
flagTimeBucket 一天时间段标志(凌晨、早晨、上午…
othername 地址标志(家里、学校、工作单位…)
-- 地址拉宽
drop table itcast_dw.fact_order_wide;
create table itcast_dw.fact_order_wide(orderId bigint,orderStatus bigint,payType bigint,userId bigint,userAddressId bigint,payTime string,payMoney double,createtime string,flag30 bigint,flag60 bigint,flag90 bigint,flag180 bigint,flagTimeBucket string,othername string
)
partitioned by(dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');

订单宽表ETL处理

加载用户维度数据

参考代码:

insert overwrite table itcast_dw.dim_user partition(dt='20190909')
select userId,   loginName,userSex
fromitcast_ods.itcast_users where dt ='20190909' ;--验证
select * from itcast_dw.dim_user limit 10;

时间近30天、90天、180天、订单上午、下午时间拉宽

参考代码:

insert overwrite table itcast_dw.fact_order_wide partition(dt='20190909')
selectt1.orderId,t1.orderStatus,t1.payType,t1.userId,t1.userAddressId,t1.payTime,t1.totalMoney,t1.createtime,--近30天case when datediff(current_timestamp, t1.createtime) <= 30then 1else 0end as flag_30,--近60天case when datediff(current_timestamp, t1.createtime) <= 60then 1else 0end as flag_60,--近90天case when datediff(current_timestamp, t1.createtime) <= 90then 1else 0end as flag_90, --近180天case when datediff(current_timestamp, t1.createtime) <= 180then 1else 0end as flag_180,--所属时间段case when hour(t1.createtime) >= 0 and hour(t1.createtime) < 6then '凌晨'when hour(t1.createtime) >= 6 and hour(t1.createtime) < 12then '上午'when hour(t1.createtime) >= 12 and hour(t1.createtime) < 14then '中午'when hour(t1.createtime) >= 14 and hour(t1.createtime) < 18then '下午'else '晚上'end as flag_time_bucket,--家里单位学校t2.othername
from (select orderId,orderStatus,payType,userId,userAddressId,payTime,totalMoney,createtime from  itcast_dw.fact_orders
where dw_end_date ='9999-12-31') t1
left join(select * from itcast_dw.dim_user_address where dt='20190909') t2on t1.userAddressId = t2.addressId;-- 测试语句
select * from itcast_dw.fact_order_wide limit 5;

指标开发

指标开发 一

指标 字段说明
第一次消费时间 支付时间
最近一次消费时间 支付时间
首单距今时间 支付时间
尾单距今时间------分析用户什么时候来购买商品以及多久没有购买了 支付时间
最小消费金额 订单支付金额
最大消费金额 订单支付金额

参考代码:

selectt1.userid,t1.loginname,MIN(t2.payTime) as first_paytime, --首次下单时间MAX(t2.payTime) as lastest_paytime, --尾单时间DATEDIFF(CURRENT_TIMESTAMP, MIN(t2.payTime)) as first_day_during_days,--首单距今DATEDIFF(CURRENT_TIMESTAMP, MAX(t2.payTime)) as lastest_day_durning_days, --尾单距今MIN(t2.paymoney) as min_paymoney,MAX(t2.paymoney) as max_paymoney
from(select * from itcast_dw.fact_order_wide where dt='20190909') as t2left join(select * from itcast_dw.dim_user where dt='20190909') as t1on t1.userId = t2.userId
group by t1.userid,t1.loginname
limit 5;

指标开发二

指标
累计消费次数(不含退拒)
累计消费金额(不含退拒)
近30天购买次数(不含退拒)
近30天购买金额(不含退拒)
近30天购买次数(含退拒)
近30天购买金额(含退拒)----分析用户最近的消费能力

参考代码:

selectt2.userid,t2.loginname,MIN(t1.payTime) as first_paytime,MAX(t1.payTime) as lastest_paytime,DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,MIN(t1.paymoney) as min_paymoney,MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back, --累计消费金额不含退拒--累计近30天消费次数不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end) as total_count_without_back_30,--累计近30天消费金额不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back_30,--累计近30天消费次数含退拒sum(case when t1.flag30 =1 then 1else 0end) as total_count_without_30,--累计近30天消费金额含退拒sum(case when t1.flag30 =1 then t1.paymoneyelse 0end) as total_money_with_back_30from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
group by t2.userid,t2.loginname limit 5;

指标开发三

指标
客单价(含退拒)
客单价(不含退拒)
近60天客单价(含退拒)
近60天客单价(不含退拒)

参考代码:

-- 指标开发三
-- 1. 客单价(含退拒)
-- 2. 客单价(不含退拒)
-- 3. 近60天客单价(含退拒)-----分析用户消费水平
-- 4. 近60天客单价(不含退拒)selectt2.userid,t2.loginname,MIN(t1.payTime) as first_paytime,MAX(t1.payTime) as lastest_paytime,DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,MIN(t1.paymoney) as min_paymoney,MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back, --累计消费金额不含退拒--累计近30天消费次数不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end) as total_count_without_back_30,--累计近30天消费金额不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back_30,--累计近30天消费次数含退拒sum(case when t1.flag30 =1 then 1else 0end) as total_count_without_30,--累计近30天消费金额含退拒sum(case when t1.flag30 =1 then t1.paymoneyelse 0end) as total_money_with_back_30,--客单价含退拒SUM(t1.paymoney) / SUM(1) AS atv,--客单价不含退拒SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then t1.paymoney else 0 end) / SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then 1 else 0end) AS atv_withoutback,--近60天客单价含退拒SUM(case when t1.flag60 = 1 then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1then 1else 0end) AS atv_60,--近60天不含退拒SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 and t1.flag60 = 1then t1.paymoneyelse 0end) / SUM(case when (t1.orderStatus != 10 or t1.orderStatus != 11) and t1.flag60 = 1then 1else 0end) AS atv_60_withoutbackfrom
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
group by t2.userid,t2.loginname limit 5;

指标开发四

指标
学校下单总数
单位下单总数
家里下单总数
上午下单总数
下午下单总数
晚上下单总数

参考代码

selectt2.userid,t2.loginname,MIN(t1.payTime) as first_paytime,MAX(t1.payTime) as lastest_paytime,DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,MIN(t1.paymoney) as min_paymoney,MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back, --累计消费金额不含退拒--累计近30天消费次数不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end) as total_count_without_back_30,--累计近30天消费金额不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back_30,--累计近30天消费次数含退拒sum(case when t1.flag30 =1 then 1else 0end) as total_count_without_30,--累计近30天消费金额含退拒sum(case when t1.flag30 =1 then t1.paymoneyelse 0end) as total_money_with_back_30,SUM(t1.paymoney) / SUM(1) AS atv,SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then t1.paymoney else 0 end) / SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then 1 else 0end) AS atv_withoutback,SUM(case when t1.flag60 = 1 then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1then 1else 0end) AS atv_60,SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 and t1.flag60 = 1then t1.paymoneyelse 0end) / SUM(case when (t1.orderStatus != 10 or t1.orderStatus != 11) and t1.flag60 = 1then 1else 0end) AS atv_60_withoutback,SUM(case when t1.otherName = '学校'then 1else 0end) as school_order_count,      -- 学校下单总数SUM(case when t1.otherName = '单位'then 1else 0end) as company_order_count,    -- 单位下单总数SUM(case when t1.otherName = '家里'then 1else 0end) as home_order_count,        -- 家里下单总数SUM(case when t1.flagTimeBucket = '上午'then 1else 0end) as am_order_count,          -- 上午下单总数SUM(case when t1.flagTimeBucket = '下午'then 1else 0end) as pm_order_count,          -- 下午下单总数SUM(case when t1.flagTimeBucket = '晚上'then 1else 0end) as night_order_count-- 晚上下单总数
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
group by t2.userid,t2.loginname limit 5;

指标开发五

指标
常用收货地址:某个地址对应的订单数量越多则越常用

1、加载订单地址分析表

参考代码:

-- 创建订单地址分析表
drop table if exists itcast_ads.tmp_order_address;
create table itcast_ads.tmp_order_address(userId bigint,          -- 用户IdotherName string,       -- 地址类型(家里、学校...)totalCount bigint,     -- 下单数rn bigint       -- 下单排名
)
partitioned by (dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');--从tmp_order_datetag_wide统计最常用地址
insert overwrite table itcast_ads.tmp_order_address partition(dt='20190909')
select
t3.userid,
t3.othername,
t3.ordercount,row_number() over( partition by t3.userid order by ordercount desc ) rn  --partiton by userid:按照用户分组,order by ordercount :按照订单数量排序 降序  ,rn:组内的序号
from
(selectt1.userId as userid,t1.othername as othername,count(t1.orderid) as ordercount  -->每个用户每个地址订单的数量
from(select * from itcast_dw.fact_order_wide where dt='20190909') t1group by t1.userid,t1.otherName order by t1.userid  ) t3  ;-- 测试
select * from itcast_ads.tmp_order_address order by userId, rn limit 10;

2、统计常用收货地址指标

参考代码:

selectt2.userid,t2.loginname,MIN(t1.payTime) as first_paytime,MAX(t1.payTime) as lastest_paytime,DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,MIN(t1.paymoney) as min_paymoney,MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back, --累计消费金额不含退拒--累计近30天消费次数不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end) as total_count_without_back_30,--累计近30天消费金额不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back_30,--累计近30天消费次数含退拒sum(case when t1.flag30 =1 then 1else 0end) as total_count_without_30,--累计近30天消费金额含退拒sum(case when t1.flag30 =1 then t1.paymoneyelse 0end) as total_money_with_back_30,SUM(t1.paymoney) / SUM(1) AS atv,SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then t1.paymoney else 0 end) / SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then 1 else 0end) AS atv_withoutback,SUM(case when t1.flag60 = 1 then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1then 1else 0end) AS atv_60,SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 and t1.flag60 = 1then t1.paymoneyelse 0end) / SUM(case when (t1.orderStatus != 10 or t1.orderStatus != 11) and t1.flag60 = 1then 1else 0end) AS atv_60_withoutback,SUM(case when t1.otherName = '学校'then 1else 0end) as school_order_count,      -- 学校下单总数SUM(case when t1.otherName = '单位'then 1else 0end) as company_order_count,    -- 单位下单总数SUM(case when t1.otherName = '家里'then 1else 0end) as home_order_count,        -- 家里下单总数SUM(case when t1.flagTimeBucket = '上午'then 1else 0end) as am_order_count,          -- 上午下单总数SUM(case when t1.flagTimeBucket = '下午'then 1else 0end) as pm_order_count,          -- 下午下单总数SUM(case when t1.flagTimeBucket = '晚上'then 1else 0end) as night_order_count,    -- 晚上下单总数--最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.useridleft join(select * from itcast_ads.tmp_order_address where dt='20190909') as t3on t1.userId = t3.userId
group by t2.userid,t2.loginname limit 5;

指标开发六

指标
常用支付方式:某种支付方式对应的订单数量越多则越常用

1、加载支付方式排名

-- 创建支付方式分析表
drop table if exists itcast_ads.tmp_order_paytype;
create table itcast_ads.tmp_order_paytype(userid bigint,              -- 用户idpayType bigint,           -- 支付类型idpayName string,             -- 支付码totalCount bigint,          -- 订单总数rn bigint                   -- 等级
)
partitioned by (dt string)
STORED AS PARQUET;-- 加载支付方式分析
insert overwrite table itcast_ads.tmp_order_paytype partition(dt='20190909')
selectt3.*,row_number() over(partition by userId order by totalCount desc) rn
from(selectt1.userId,t1.payType,t2.payment_name,sum(1) as totalCount   --sum(1)等同于count效果from(select * from itcast_dw.fact_order_wide where dt='20190909') t1left join (select * from itcast_dw.dim_payment where dt='20190909') t2on t1.payType = t2.payment_idgroup by t1.userId, t1.payType,t2.payment_name) t3;-- 测试
select * from itcast_ads.tmp_order_paytype limit 5;

2、统计常用支付方式指标

参考代码:

selectt2.userid,t2.loginname,MIN(t1.payTime) as first_paytime,MAX(t1.payTime) as lastest_paytime,DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,MIN(t1.paymoney) as min_paymoney,MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back, --累计消费金额不含退拒--累计近30天消费次数不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end) as total_count_without_back_30,--累计近30天消费金额不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back_30,--累计近30天消费次数含退拒sum(case when t1.flag30 =1 then 1else 0end) as total_count_without_30,--累计近30天消费金额含退拒sum(case when t1.flag30 =1 then t1.paymoneyelse 0end) as total_money_with_back_30,SUM(t1.paymoney) / SUM(1) AS atv,SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then t1.paymoney else 0 end) / SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then 1 else 0end) AS atv_withoutback,SUM(case when t1.flag60 = 1 then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1then 1else 0end) AS atv_60,SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 and t1.flag60 = 1then t1.paymoneyelse 0end) / SUM(case when (t1.orderStatus != 10 or t1.orderStatus != 11) and t1.flag60 = 1then 1else 0end) AS atv_60_withoutback,SUM(case when t1.otherName = '学校'then 1else 0end) as school_order_count,      -- 学校下单总数SUM(case when t1.otherName = '单位'then 1else 0end) as company_order_count,    -- 单位下单总数SUM(case when t1.otherName = '家里'then 1else 0end) as home_order_count,        -- 家里下单总数SUM(case when t1.flagTimeBucket = '上午'then 1else 0end) as am_order_count,          -- 上午下单总数SUM(case when t1.flagTimeBucket = '下午'then 1else 0end) as pm_order_count,          -- 下午下单总数SUM(case when t1.flagTimeBucket = '晚上'then 1else 0end) as night_order_count,-- 晚上下单总数--最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address,
--常用的支付方式MAX(case when t4.rn = 1then t4.payNameelse ''end) as most_usual_paytype
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.useridleft join(select * from itcast_ads.tmp_order_address where dt='20190909') as t3on t1.userId = t3.userIdleft join (select * from itcast_ads.tmp_order_paytype where dt='20190909') as t4on t1.userId = t4.userId
group by t2.userid,t2.loginname limit 5;

创建ads层表、加载数据

drop table if exists itcast_ads.user_order_measure;
create table itcast_ads.user_order_measure(userid string,                          -- 用户idusername string,                        -- 用户名称first_paytime string,                   -- 第一次消费时间lastest_paytime string,                 -- 最近一次消费时间first_day_during_days bigint,           -- 首单距今时间lastest_day_durning_days bigint,        -- 尾单距今时间min_paymoney double,                    -- 最小消费金额max_paymoney double,                    -- 最大消费金额total_count_without_back bigint,        -- 累计消费次数(不含退拒)total_money_without_back double,        -- 累计消费金额(不含退拒)total_count_without_back_30 bigint,     -- 近30天累计消费次数(不含退拒)total_money_without_back_30 double,     -- 近30天累计消费金额(不含退拒)total_count_30 bigint,                  -- 近30天累计消费次数(含退拒)total_money_30 double,                  -- 近30天累计消费金额(含退拒)atv double,                             -- 客单价(含退拒)atv_withoutback double,                 -- 客单价(不含退拒)atv_60 double,                          -- 近60天客单价(含退拒)atv_60_withoutback double,              -- 近60天客单价(不含退拒)school_order_count bigint,              -- 学校下单总数company_order_count bigint,             -- 单位下单总数home_order_count bigint,                -- 家里下单总数am_order_count bigint,                  -- 上午下单总数pm_order_count bigint,                  -- 下午下单总数night_order_count bigint,               -- 晚上下单总数most_usual_address string,              -- 常用收货地址most_usual_paytype string               -- 常用支付方式
)
partitioned by (dt string)
row format delimited fields terminated by '\001' stored as TEXTFILE;insert overwrite table itcast_ads.user_order_measure partition (dt='20190909')
selectt2.userid,t2.loginname,MIN(t1.payTime) as first_paytime,MAX(t1.payTime) as lastest_paytime,DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,MIN(t1.paymoney) as min_paymoney,MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back, --累计消费金额不含退拒--累计近30天消费次数不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then 1else 0end) as total_count_without_back_30,--累计近30天消费金额不含退拒sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11then t1.paymoneyelse 0end) as total_money_without_back_30,--累计近30天消费次数含退拒sum(case when t1.flag30 =1 then 1else 0end) as total_count_without_30,--累计近30天消费金额含退拒sum(case when t1.flag30 =1 then t1.paymoneyelse 0end) as total_money_with_back_30,SUM(t1.paymoney) / SUM(1) AS atv,SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then t1.paymoney else 0 end) / SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 then 1 else 0end) AS atv_withoutback,SUM(case when t1.flag60 = 1 then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1then 1else 0end) AS atv_60,SUM(case when t1.orderStatus != 10 or t1.orderStatus != 11 and t1.flag60 = 1then t1.paymoneyelse 0end) / SUM(case when (t1.orderStatus != 10 or t1.orderStatus != 11) and t1.flag60 = 1then 1else 0end) AS atv_60_withoutback,SUM(case when t1.otherName = '学校'then 1else 0end) as school_order_count,      -- 学校下单总数SUM(case when t1.otherName = '单位'then 1else 0end) as company_order_count,    -- 单位下单总数SUM(case when t1.otherName = '家里'then 1else 0end) as home_order_count,        -- 家里下单总数SUM(case when t1.flagTimeBucket = '上午'then 1else 0end) as am_order_count,          -- 上午下单总数SUM(case when t1.flagTimeBucket = '下午'then 1else 0end) as pm_order_count,          -- 下午下单总数SUM(case when t1.flagTimeBucket = '晚上'then 1else 0end) as night_order_count,-- 晚上下单总数--最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address,
--常用的支付方式MAX(case when t4.rn = 1then t4.payNameelse ''end) as most_usual_paytype
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.useridleft join(select * from itcast_ads.tmp_order_address where dt='20190909') as t3on t1.userId = t3.userIdleft join (select * from itcast_ads.tmp_order_paytype where dt='20190909') as t4on t1.userId = t4.userId
group by t2.userid,t2.loginname ;-- 测试
select * from itcast_ads.user_order_measure limit 10;

大数据千亿级离线数仓项目第三天 维度数据分析与业务开发相关推荐

  1. 大数据千亿级离线数仓项目第一天 环境部署和etl

    千亿级数仓项目第01天讲义 课程目标 了解大数据离线数仓架构 了解项目部署环境(数据规模和集群规模) 掌握ETL工具 Kettle常用组件的使用 能够掌握kettle作业与转换区别以及linux部署 ...

  2. 大数据千亿级离线数仓项目第五天 指标统计/导出和工作流调度

    千亿级数仓第05天讲义 课程目标 掌握常见指标PV.UV的开发统计 掌握流量常见指标分类 掌握指标数据导出 理解工作流调度流程 模块开发–数据入库 创建ODS层数据表 原始日志数据表 drop tab ...

  3. 电商离线数仓项目实战(下)

    电商离线数仓项目实战(下) 电商分析--核心交易 文章目录 电商离线数仓项目实战(下) 电商分析--核心交易 一.业务需求 二.业务数据库表结构 1. 数据库表之间的联系 img 2. 业务数据库-- ...

  4. 大数据电商离线数仓项目-上篇

    下一篇:电商数仓项目-下篇 文章目录 第1章 数仓分层 1.1 为什么要分层 1.2 数据集市与数据仓库概念 1.3 数仓命名规范 1.3.1 表命名 1.3.2 脚本命名 1.3.3 表字段类型 第 ...

  5. 大数据演进简史:从数仓到数据中台,谈技术选型最优解

    大家好,今天分享的议题主要包括几大内容: 带大家回顾一下大数据在国内的发展,从传统数仓到当前数据中台的演进过程: 我个人认为数据中台的核心组成,以及一些技术选型参考: 数据研发是数据中台很重要的一环, ...

  6. 大数据电商离线数仓项目-下篇

    上一篇:电商数仓系统1 文章目录 第5章 数仓搭建-DWS层 5.1 业务术语 5.2 系统函数 5.2.1 nvl函数 5.2.2 日期处理函数 5.2.3 复杂数据类型定义 5.3 DWS层 5. ...

  7. 大数据之离线数仓项目搭建(一)

    数据仓库搭建 文章目录 数据仓库搭建 1.开启hadoop的权限验证 2.在hive中创建数据库 3.在hdfs中创建5个目录 4.在linux中创建5个用户 5.将目录权限赋值给不同用户 6.修改h ...

  8. PB级企业电商离线数仓项目实战

    项目背景:人类从IT走向DT时代.人们往往可以收集到更多的数据.到2020,全球数据总量将超过40ZB,是2011年的22倍.正在呈爆炸式增长. 数据仓库的四大特征:面向主题.集成的.稳定的.反映历史 ...

  9. 电商离线数仓项目-埋点数据/事件日志的基本格式详细理解

    数据格式详解 埋点数据 事件数据 商品点击 商品详情页 商品列表页 广告 消息通知 用户前台活跃 用户后台活跃 评论 收藏 点赞 错误日志数据 启动日志数据 埋点数据 数据埋点产生的Json格式的数据 ...

  10. 【项目】数仓项目(三)

    [1]Linux环境搭建 Linux配置请看这篇博客

最新文章

  1. 迁移学习:领域自适应的理论分析
  2. vs2005不能找到ASP.NET网站之郁闷
  3. glReadPixels的用法和说明
  4. python 类方法调用一次自增1_Python+selenium自动化脚本如何使数字每次执行自增1
  5. 微信基于时间序的海量存储扩展性与多机容灾能力提升
  6. http:(4):http请求方法
  7. 百度SEO站群自采集壁纸源码小韩美化版
  8. 浅谈Android binder原理与设计 (1)
  9. linux 下安装adobe flash的关键。
  10. ipoo3可以用鸿蒙,vivo新机入网!支持44W快充 配置与iQOO Neo3类似
  11. 【汉字识别】基于matlab SVM汉字识别【含Matlab源码 830期】
  12. 微信语音怎么转发给好友
  13. 【Android自动化测试】Robot Framework+adb框架(三)——L3公共层
  14. 面对妖艳的配置文件,python小技巧来帮你!
  15. 完美世界手游服务器维护中,完美世界手游黑屏闪退解决办法 玩不了怎么办
  16. RCLAMP0544T 国产替代上海雷卯ULC0544T
  17. 如何使用eclipse开发android
  18. UPS的红单和蓝单的区别
  19. Gradle安装部署与基础入门详解
  20. 【亲测可用】经验分享,开发在线考试平台(类似问卷星、考试星、阿里巴巴在线考试、易考通、考试云、移动网大、移动网校、中石化网络学院、宝武微学院、iTEST等)如何用程序代码防止学生作弊?

热门文章

  1. 微信公众号文章采集方案
  2. 微信小程序(3)——引入百度api天气信息全过程
  3. 浅析MultipartResolver
  4. Tixati——BT下载软件
  5. Axure RP9 制作平台界面
  6. 虚拟机主机服务器出现SSH已启用,ESXi去掉 SSH已经启用的警告信息
  7. flash 不显示 旋转 补间动画_【图片】Flash入门5:详解制作补间动画(非传统补间)【flash软件吧】_百度贴吧...
  8. 阿里、百度、搜狐等公司社招面试记录与总结
  9. 任务管理器服务中没有mysql_已安装的Mysql在任务管理器中找不到的解决方法
  10. 洛谷 Floating point exception: 8 Floating-point exception. 报错