一、分析

针对ods层表中的数据进行清洗,参考数据清洗规则,按照实际情况对数据进行清洗。
由于数据库中的数据都是比较规整的,其实可以直接迁移到dwd层,不过为了以防万一,还是对ods层的数据进行过滤,主要过滤表中的id字段为null的数据,在关系型数据库中表中的id字段都是主键,肯定是不为Null的,我们在这里进行判断主要是为了避免数据在采集过程中出现问题。

二、构建dwd层

1、dwd_user

(1)源表

ods_user

(2)建表语句

create external table if not exists dwd_mall.dwd_user(user_id              bigint,user_name            string,user_gender          tinyint,user_birthday        string,e_mail               string,mobile               string,register_time        string,is_blacklist         tinyint
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/user/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_user partition(dt='20220309')  select user_id,user_name,user_gender,user_birthday,e_mail,mobile,register_time,is_blacklist
from ods_mall.ods_user
where dt = '20220309' and user_id is not null;

2、dwd_user_extend

(1)源表

ods_user_extend

(2)建表语句

create external table if not exists dwd_mall.dwd_user_extend(user_id              bigint,is_pregnant_woman    tinyint,is_have_children     tinyint,is_have_car          tinyint,phone_brand          string,phone_cnt            int,change_phone_cnt     int,weight               int,height               int
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/user_extend/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_user_extend partition(dt='20220309')  select user_id,is_pregnant_woman,is_have_children,is_have_car,phone_brand,phone_cnt,change_phone_cnt,weight,height
from ods_mall.ods_user_extend
where dt = '20220309' and user_id is not null;

3、dwd_user_addr

(1)源表

ods_user_addr

(2)建表语句

create external table if not exists dwd_mall.dwd_user_addr(addr_id              bigint,user_id              bigint,addr_name            string,order_flag           tinyint,user_name            string,mobile               string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/user_addr/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_user_addr partition(dt='20220309')  select addr_id,user_id,addr_name,order_flag,user_name,mobile
from ods_mall.ods_user_addr
where dt = '20220309' and addr_id is not null;

4、dwd_goods_info

(1)源表

ods_goods_info

(2)建表语句

create external table if not exists dwd_mall.dwd_goods_info(goods_id             bigint,goods_no             string,goods_name           string,curr_price           double,third_category_id    int,goods_desc           string,create_time          string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/goods_info/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_goods_info partition(dt='20220309')  select goods_id,goods_no,goods_name,curr_price,third_category_id,goods_desc,create_time
from ods_mall.ods_goods_info
where dt = '20220309' and goods_id is not null;

5、dwd_category_code

(1)源表

ods_category_code

(2)建表语句

create external table if not exists dwd_mall.dwd_category_code(first_category_id    int,first_category_name  string,second_category_id   int,second_catery_name   string,third_category_id    int,third_category_name  string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/category_code/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_category_code partition(dt='20220309')  select first_category_id,first_category_name,second_category_id,second_catery_name,third_category_id,third_category_name
from ods_mall.ods_category_code
where dt = '20220309' and first_category_id is not null;

6、dwd_user_order

(1)源表

ods_user_order

(2)建表语句

create external table if not exists dwd_mall.dwd_user_order(order_id             bigint,order_date           string,user_id              bigint,order_money          double,order_type           int,order_status         int,pay_id               bigint,update_time          string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/user_order/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_user_order partition(dt='20220309')  select order_id,order_date,user_id,order_money,order_type,order_status,pay_id,update_time
from ods_mall.ods_user_order
where dt = '20220309' and order_id is not null;

7、dwd_order_item

(1)源表

ods_order_item

(2)建表语句

create external table if not exists dwd_mall.dwd_order_item(order_id             bigint,goods_id             bigint,goods_amount         int,curr_price           double,create_time          string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/order_item/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_order_item partition(dt='20220309')  select order_id,goods_id,goods_amount,curr_price,create_time
from ods_mall.ods_order_item
where dt = '20220309' and order_id is not null;

8、dwd_order_delivery

(1)源表

ods_order_delivery

(2)建表语句

create external table if not exists dwd_mall.dwd_order_delivery(order_id             bigint,addr_id              bigint,user_id              bigint,carriage_money       double,create_time          string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/order_delivery/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_order_delivery partition(dt='20220309')  select order_id,addr_id,user_id,carriage_money,create_time
from ods_mall.ods_order_delivery
where dt = '20220309' and order_id is not null;

9、dwd_payment_flow

(1)源表

ods_payment_flow

(2)建表语句

create external table if not exists dwd_mall.dwd_payment_flow(pay_id               bigint,order_id             bigint,trade_no             bigint,pay_money            double,pay_type             int,pay_time             string
)partitioned by(dt string) row format delimited   fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/payment_flow/';

(3) 映射关系

insert overwrite table dwd_mall.dwd_payment_flow partition(dt='20220309')  select pay_id,order_id,trade_no,pay_money,pay_type,pay_time
from ods_mall.ods_payment_flow
where dt = '20220309' and order_id is not null;

三、抽取脚本

1、初始化表的脚本(执行一次)

dwd_mall_init_table.sh

内容如下:

#!/bin/bash
# dwd层数据库和表初始化脚本,只需要执行一次即可hive -e "
create database if not exists dwd_mall;create external table if not exists dwd_mall.dwd_user(user_id              bigint,user_name            string,user_gender          tinyint,user_birthday        string,e_mail               string,mobile               string,register_time        string,is_blacklist         tinyint
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/user/';create external table if not exists dwd_mall.dwd_user_extend(user_id              bigint,is_pregnant_woman    tinyint,is_have_children     tinyint,is_have_car          tinyint,phone_brand          string,phone_cnt            int,change_phone_cnt     int,weight               int,height               int
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/user_extend/';create external table if not exists dwd_mall.dwd_user_addr(addr_id              bigint,user_id              bigint,addr_name            string,order_flag           tinyint,user_name            string,mobile               string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/user_addr/';create external table if not exists dwd_mall.dwd_goods_info(goods_id             bigint,goods_no             string,goods_name           string,curr_price           double,third_category_id    int,goods_desc           string,create_time          string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/goods_info/';create external table if not exists dwd_mall.dwd_category_code(first_category_id    int,first_category_name  string,second_category_id   int,second_catery_name   string,third_category_id    int,third_category_name  string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/category_code/';create external table if not exists dwd_mall.dwd_user_order(order_id             bigint,order_date           string,user_id              bigint,order_money          double,order_type           int,order_status         int,pay_id               bigint,update_time          string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/user_order/';create external table if not exists dwd_mall.dwd_order_item(order_id             bigint,goods_id             bigint,goods_amount         int,curr_price           double,create_time          string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/order_item/';create external table if not exists dwd_mall.dwd_order_delivery(order_id             bigint,addr_id              bigint,user_id              bigint,carriage_money       double,create_time          string
)partitioned by(dt string) row format delimited  fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/order_delivery/';create external table if not exists dwd_mall.dwd_payment_flow(pay_id               bigint,order_id             bigint,trade_no             bigint,pay_money            double,pay_type             int,pay_time             string
)partitioned by(dt string) row format delimited   fields terminated by '\t'location 'hdfs://bigdata01:9000/data/dwd/payment_flow/';"

2、添加数据分区脚本(每天执行一次)

dwd_mall_add_partition.sh
内容如下:

#!/bin/bash
# 基于ods层的表进行清洗,将清洗之后的数据添加到dwd层对应表的对应分区中
# 每天凌晨执行一次# 默认获取昨天的日期,也支持传参指定一个日期
if [ "z$1" = "z" ]
then
dt=`date +%Y%m%d --date="1 days ago"`
else
dt=$1
fihive -e "insert overwrite table dwd_mall.dwd_user partition(dt='${dt}')  select user_id,user_name,user_gender,user_birthday,e_mail,mobile,register_time,is_blacklist
from ods_mall.ods_user
where dt = '${dt}' and user_id is not null;insert overwrite table dwd_mall.dwd_user_extend partition(dt='${dt}')  select user_id,is_pregnant_woman,is_have_children,is_have_car,phone_brand,phone_cnt,change_phone_cnt,weight,height
from ods_mall.ods_user_extend
where dt = '${dt}' and user_id is not null;insert overwrite table dwd_mall.dwd_user_addr partition(dt='${dt}')  select addr_id,user_id,addr_name,order_flag,user_name,mobile
from ods_mall.ods_user_addr
where dt = '${dt}' and addr_id is not null;insert overwrite table dwd_mall.dwd_goods_info partition(dt='${dt}')  select goods_id,goods_no,goods_name,curr_price,third_category_id,goods_desc,create_time
from ods_mall.ods_goods_info
where dt = '${dt}' and goods_id is not null;insert overwrite table dwd_mall.dwd_category_code partition(dt='${dt}')  select first_category_id,first_category_name,second_category_id,second_catery_name,third_category_id,third_category_name
from ods_mall.ods_category_code
where dt = '${dt}' and first_category_id is not null;insert overwrite table dwd_mall.dwd_user_order partition(dt='${dt}')  select order_id,order_date,user_id,order_money,order_type,order_status,pay_id,update_time
from ods_mall.ods_user_order
where dt = '${dt}' and order_id is not null;insert overwrite table dwd_mall.dwd_order_item partition(dt='${dt}')  select order_id,goods_id,goods_amount,curr_price,create_time
from ods_mall.ods_order_item
where dt = '${dt}' and order_id is not null;insert overwrite table dwd_mall.dwd_order_delivery partition(dt='${dt}')  select order_id,addr_id,user_id,carriage_money,create_time
from ods_mall.ods_order_delivery
where dt = '${dt}' and order_id is not null;insert overwrite table dwd_mall.dwd_payment_flow partition(dt='${dt}')  select pay_id,order_id,trade_no,pay_money,pay_type,pay_time
from ods_mall.ods_payment_flow
where dt = '${dt}' and order_id is not null;
"

四、执行脚本

1、先执行初始化脚本

sh dwd_mall_init_table.sh

2、再执行添加分区脚本

sh dwd_mall_add_partition.sh 20220309

这个要等一会,大概10分钟左右,如下就好了:

五、验证

连接到hive,随便查一张表,检查是否有数据。

select * from dwd_mall.dwd_user limit 1;

数据仓库之【商品订单数据数仓】02:【dwd层】相关推荐

  1. 数据仓库之【商品订单数据数仓】05:需求2:电商GMV

    一.电商GMV分析 GMV:Gross Merchandise Volume,是指一定时间内的成交总金额. GMV 多用于电商行业,这个实际指的是拍下的订单总金额,包含付款和未付款的部分. 我们在统计 ...

  2. 数仓学习笔记(5)——数仓搭建(DWD层)

    目录 一.数仓搭建--DWD层 1.DWD层(用户行为日志) 1.1 日志解析思路 1.2 get_json_object函数使用 1.3 启动日志表 1.4 页面日志表 1.5 动作日志表 1.6 ...

  3. 数仓搭建——DWD层

    1 DWD层(用户行为日志) 1.1 日志解析思路 页面埋点日志 启动日志 思路 1.2 get_json_object函数使用 数据 [{"name":"大郎" ...

  4. 离线数仓 (十三) --------- DWD 层搭建

    目录 前言 一.DWD 层 (用户行为日志) 1. 日志解析思路 2. get_json_object 函数使用 3. 启动日志表 4. 页面日志表 5. 动作日志表 6. 曝光日志表 7. 错误日志 ...

  5. 离线数仓搭建_11_DWD层用户行为日志创建

    文章目录 13.0 数仓搭建-DWD层 13.1 DWD层(用户行为日志) 13.1.1 日志解析思路 13.1.2 get_json_object函数使用 13.1.3 启动日志表 13.1.4 页 ...

  6. Python + 大数据 - 数仓实战之智能电商分析平台

    Python + 大数据 - 数仓实战之智能电商分析平台 1. 项目架构 2. 数据仓库维度模型设计-事实表 事实表的特征:表里没有存放实际的内容,他是一堆主键的集合,这些ID分别能对应到维度表中的一 ...

  7. 大数据/数仓面试灵魂30问

    1.什么是数据仓库?如何构建数据仓库?(如果这个问题回答的好,后面很多问题都不需要再问) 2.如何建设数据中台?可简单说下理解与思路 3.数据仓库.数据中台.数据湖的理解 4.传统数仓的程度(建模工具 ...

  8. 大数据/数仓面试灵魂30问(转)

    1.什么是数据仓库?如何构建数据仓库?(如果这个问题回答的好,后面很多问题都不需要再问) 2.如何建设数据中台?可简单说下理解与思路 3.数据仓库.数据中台.数据湖的理解 4.传统数仓的程度(建模工具 ...

  9. Python+大数据-数仓实战之滴滴出行(二)

    Python+大数据-数仓实战之滴滴出行(二) 1. 数据转移 #验证sqoop是否工作 /export/server/sqoop-1.4.7/bin/sqoop list-databases \ - ...

最新文章

  1. 腾讯年终奖刷屏了...
  2. 鸿蒙兼容安卓app 为什么还要生态,就因为鸿蒙兼容安卓APP,中兴就宣布弃用?...
  3. webdriver+python 对三大浏览器的支持
  4. 常见计算机英语,常见计算机英语词汇
  5. VUE技术栈学习笔记(https://segmentfault.com/a/1190000012530187)
  6. Hyper-V复制功能
  7. 不用任何插件实现 WordPress 的彩色标签云
  8. centos-install-kong-cassandra
  9. Silverlight实用窍门系列:63.Silverlight中的Command,自定义简单Command
  10. 构建高并发高可用的电商平台架构实践 转载
  11. Spring Cloud 微服务实战系列-Eureka注册中心(二)
  12. centos directory server
  13. css 单位之px , em , rem
  14. car-like robot运动机构简析
  15. 五分钟使用WebStack构建个人网址导航
  16. oracle 定时任务 每天执行,Oracle定时任务(定时执行某个SQL语句)
  17. 极寒天气肆虐美国中西部地区
  18. vue使用datav+echarts
  19. MATLAB在图像上标记特定点
  20. AD20和立创EDA设计(4)PCB设计

热门文章

  1. [js] 实现多张图片合成一张的效果
  2. layui日期时间选择器
  3. 计算机985大学高考分数,高考志愿“捡漏王”。460分被985大学录取,考生填报志愿要注意...
  4. 一种微型步进电机驱动控制器
  5. leetcode 5687. 执行乘法运算的最大分数
  6. 【引流必备技术】斗音直播间弹幕监控脚本,精准采集快速截流【永久脚本+软件使用视频教程】
  7. 1分钟集成物流查询 -- 快递100 -- php -- laravel
  8. flutter_hybird_webview 跨进程渲染的实践技术分享
  9. Qt实现Excel表格的读写操作(office,WPS)
  10. QT windows 应用程序 exe ,设置详细信息并解决中文乱码问题