点击上方蓝字关注我们

一、数仓搭建 - DWS 层

1.1 业务术语

1)用户

用户以设备为判断标准,在移动统计中,每个独立设备认为是一个独立用户。Android

系统根据 IMEI 号,IOS 系统根据 OpenUDID 来标识一个独立用户,每部手机一个用户

2)新增用户

首次联网使用应用的用户。如果一个用户首次打开某 APP,那这个用户定义为新增用

户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月

新增用户

3)活跃用户

打开应用的用户即为活跃用户,不考虑用户的使用情况。每天一台设备打开多次会被计

为一个活跃用户

4)周(月)活跃用户

某个自然周(月)内启动过应用的用户,该周(月)内的多次启动只记一个活跃用户

5)月活跃率

月活跃用户与截止到该月累计的用户总和之间的比例

6)沉默用户

用户仅在安装当天(次日)启动一次,后续时间无再启动行为。该指标可以反映新增用

户质量和用户与 APP 的匹配程度

7)版本分布

不同版本的周内各天新增用户数,活跃用户数和启动次数。利于判断 APP 各个版本之

间的优劣和用户行为习惯

8)本周回流用户

上周未启动过应用,本周启动了应用的用户

9)连续 n 周活跃用户

连续 n 周,每周至少启动一次

10)忠诚用户

连续活跃 5 周以上的用户

11)连续活跃用户

连续 2 周及以上活跃的用户

12)近期流失用户

连续 n(2<= n <= 4)周没有启动应用的用户。(第 n+1 周没有启动过)

13)留存用户

某段时间内的新增用户,经过一段时间后,仍然使用应用的被认作是留存用户;这部分

用户占当时新增用户的比例即是留存率

例如,5 月份新增用户 200,这 200 人在 6 月份启动过应用的有 100 人,7 月份启动过应用的有 80 人,8 月份启动过应用的有 50 人;则 5 月份新增用户一个月后的留存率是 50%,二个月后的留存率是 40%,三个月后的留存率是 25%

14)用户新鲜度

每天启动应用的新老用户比例,即新增用户数占活跃用户数的比例

15)单次使用时长

每次启动使用的时间长度

16)日使用时长

累计一天内的使用时间长度

17)启动次数计算标准

IOS 平台应用退到后台就算一次独立的启动;Android 平台我们规定,两次启动之间的间隔小于 30 秒,被计算一次启动。用户在使用过程中,若因收发短信或接电话等退出应用30 秒又再次返回应用中,那这两次行为应该是延续而非独立的,所以可以被算作一次使用行为,即一次启动。业内大多使用 30 秒这个标准,但用户还是可以自定义此时间间隔

1.2 系统函数

1.2.1 collect_set 函数

1)创建原数据表

drop table if exists stud;

create table stud (name string, area string, course string, score int);

2)向原数据表中插入数据

insert into table stud values('zhang3','bj','math',88);

insert into table stud values('li4','bj','math',99);

insert into table stud values('wang5','sh','chinese',92);

insert into table stud values('zhao6','sh','chinese',54);

insert into table stud values('tian7','bj','chinese',91);

3)查询表中数据

select * from stud;

stud.name stud.area stud.course stud.score

zhang3 bj math 88

li4 bj math 99

wang5 sh chinese 92

zhao6 sh chinese 54

tian7 bj chinese 91

4)把同一分组的不同行的数据聚合成一个集合

select course, collect_set(area), avg(score) from stud group by course;

chinese ["sh","bj"] 79.0

math ["bj"] 93.5

5) 用下标可以取某一个

select course, collect_set(area)[0], avg(score) from

stud group by course;

chinese sh 79.0

math bj 93.5

1.2.2 nvl 函数

1)基本语法

NVL(表达式 1,表达式 2)

如果表达式 1 为空值,NVL 返回值为表达式 2 的值,否则返回表达式 1 的值。 该函数的目的是把一个空值(null)转换成一个实际的值。其表达式的值可以是数字型、字符型和日期型。但是表达式 1 和表达式 2 的数据类型必须为同一个类型

1.2.3 日期处理函数

1)date_format 函数(根据格式整理日期)

hive (gmall)> select date_format('2020-03-10','yyyy-MM');

2020-03

2)date_add 函数(加减日期)

hive (gmall)> select date_add('2020-03-10',-1);

2020-03-09

hive (gmall)> select date_add('2020-03-10',1);

2020-03-11

3)next_day 函数

(1)取当前天的下一个周一

hive (gmall)> select next_day('2020-03-12','MO');

2020-03-16

说明:星期一到星期日的英文(Monday,Tuesday、Wednesday、Thursday、Friday、Saturday、Sunday)

(2)取当前周的周一

hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7);

2020-03-11

4)last_day 函数(求当月最后一天日期)

hive (gmall)> select last_day('2020-03-10');

2020-03-31

1.3 DWS 层(用户行为)

1.3.1 每日设备行为

每日设备行为,主要按照 设备 id 统计

1)建表语句

drop table if exists dws_uv_detail_daycount;

create external table dws_uv_detail_daycount

(

`mid_id` string COMMENT '设备唯一标识',

`user_id` string COMMENT '用户标识',

`version_code` string COMMENT '程序版本号',

`version_name` string COMMENT '程序版本名',

`lang` string COMMENT '系统语言',

`source` string COMMENT '渠道号',

`os` string COMMENT '安卓系统版本',

`area` string COMMENT '区域',

`model` string COMMENT '手机型号',

`brand` string COMMENT '手机品牌',

`sdk_version` string COMMENT 'sdkVersion',

`gmail` string COMMENT 'gmail',

`height_width` string COMMENT '屏幕宽高',

`app_time` string COMMENT '客户端日志产生时的时间',

`network` string COMMENT '网络模式',

`lng` string COMMENT '经度',

`lat` string COMMENT '纬度',

`login_count` bigint COMMENT '活跃次数'

)

partitioned by(dt string)

stored as parquet

location '/warehouse/gmall/dws/dws_uv_detail_daycount';

2)数据装载

insert overwrite table dws_uv_detail_daycount partition(dt='2020-03-10')

select

mid_id,

concat_ws('|', collect_set(user_id)) user_id,

concat_ws('|', collect_set(version_code)) version_code,

concat_ws('|', collect_set(version_name)) version_name,

concat_ws('|', collect_set(lang))lang,

concat_ws('|', collect_set(source)) source,

concat_ws('|', collect_set(os)) os,

concat_ws('|', collect_set(area)) area,

concat_ws('|', collect_set(model)) model,

concat_ws('|', collect_set(brand)) brand,

concat_ws('|', collect_set(sdk_version)) sdk_version,

concat_ws('|', collect_set(gmail)) gmail,

concat_ws('|', collect_set(height_width)) height_width,

concat_ws('|', collect_set(app_time)) app_time,

concat_ws('|', collect_set(network)) network,

concat_ws('|', collect_set(lng)) lng,

concat_ws('|', collect_set(lat)) lat,

count(*) login_count

from dwd_start_log

where dt='2020-03-10'

group by mid_id;

3)查询加载结果

select * from dws_uv_detail_daycount where dt='2020-03-10';

1.4 DWS 层(业务)

DWS 层的宽表字段,是站在不同维度的视角去看事实表,重点关注事实表的度量值

1.4.1 每日会员行为

1)建表语句

drop table if exists dws_user_action_daycount;

create external table dws_user_action_daycount

(

user_id string comment '用户 id',

login_count bigint comment '登录次数',

cart_count bigint comment '加入购物车次数',

cart_amount double comment '加入购物车金额',

order_count bigint comment '下单次数',

order_amount decimal(16,2) comment '下单金额',

payment_count bigint comment '支付次数',

payment_amount decimal(16,2) comment '支付金额'

) COMMENT '每日用户行为'

PARTITIONED BY (`dt` string)

stored as parquet

location '/warehouse/gmall/dws/dws_user_action_daycount/'

tblproperties ("parquet.compression"="lzo");

2)数据装载

with

tmp_login as

(

select

user_id,

count(*) login_count

from dwd_start_log

where dt='2020-03-10'

and user_id is not null

group by user_id

),

tmp_cart as

(

select

user_id,

count(*) cart_count,

sum(cart_price*sku_num) cart_amount

from dwd_fact_cart_info

where dt='2020-03-10'

and user_id is not null

and date_format(create_time,'yyyy-MM-dd')='2020-03-10'

group by user_id

),

tmp_order as

(

select

user_id,

count(*) order_count,

sum(final_total_amount) order_amount

from dwd_fact_order_info

where dt='2020-03-10'

group by user_id

) ,

tmp_payment as

(

select

user_id,

count(*) payment_count,

sum(payment_amount) payment_amount

from dwd_fact_payment_info

where dt='2020-03-10'

group by user_id

)

insert overwrite table dws_user_action_daycount partition(dt='2020-03-10')

select

user_actions.user_id,

sum(user_actions.login_count),

sum(user_actions.cart_count),

sum(user_actions.cart_amount),

sum(user_actions.order_count),

sum(user_actions.order_amount),

sum(user_actions.payment_count),

sum(user_actions.payment_amount)

from

(

select

user_id,

login_count,

0 cart_count,

0 cart_amount,

0 order_count,

0 order_amount,

0 payment_count,

0 payment_amount

from

tmp_login

union all

select

user_id,

0 login_count,

cart_count,

cart_amount,

0 order_count,

0 order_amount,

0 payment_count,

0 payment_amount

from

tmp_cart

union all

select

user_id,

0 login_count,

0 cart_count,

0 cart_amount,

order_count,

order_amount,

0 payment_count,

0 payment_amount

from tmp_order

union all

select

user_id,

0 login_count,

0 cart_count,

0 cart_amount,

0 order_count,

0 order_amount,

payment_count,

payment_amount

from tmp_payment

) user_actions

group by user_id;

3)查询加载结果

select * from dws_user_action_daycount where dt=‘2020-03-10’;

drop table if exists dws_sku_action_daycount;

create external table dws_sku_action_daycount

(

sku_id string comment 'sku_id',

order_count bigint comment '被下单次数',

order_num bigint comment '被下单件数',

order_amount decimal(16,2) comment '被下单金额',

payment_count bigint comment '被支付次数',

payment_num bigint comment '被支付件数',

payment_amount decimal(16,2) comment '被支付金额',

refund_count bigint comment '被退款次数',

refund_num bigint comment '被退款件数',

refund_amount decimal(16,2) comment '被退款金额',

cart_count bigint comment '被加入购物车次数',

cart_num bigint comment '被加入购物车件数',

favor_count bigint comment '被收藏次数',

appraise_good_count bigint comment '好评数',

appraise_mid_count bigint comment '中评数',

appraise_bad_count bigint comment '差评数',

appraise_default_count bigint comment '默认评价数'

) COMMENT '每日商品行为'

PARTITIONED BY (`dt` string)

stored as parquet

location '/warehouse/gmall/dws/dws_sku_action_daycount/'

tblproperties ("parquet.compression"="lzo");

2)数据装载

注意:如果是 23 点 59 下单,支付日期跨天。需要从订单详情里面取出支付时间是今天,订单时间是昨天或者今天的订单

with

tmp_order as

(

select

sku_id,

count(*) order_count,

sum(sku_num) order_num,

sum(total_amount) order_amount

from dwd_fact_order_detail

where dt='2020-03-10'

group by sku_id

),

tmp_payment as

(

select

sku_id,

count(*) payment_count,

sum(sku_num) payment_num,

sum(total_amount) payment_amount

from dwd_fact_order_detail

where dt='2020-03-10'

and order_id in

(

select

id

from dwd_fact_order_info

where (dt='2020-03-10' or dt=date_add('2020-03-10',-1))

and date_format(payment_time,'yyyy-MM-dd')='2020-03-10'

)

group by sku_id

),

tmp_refund as

(

select

sku_id,

count(*) refund_count,

sum(refund_num) refund_num,

sum(refund_amount) refund_amount

from dwd_fact_order_refund_info

where dt='2020-03-10'

group by sku_id

),

tmp_cart as

(

select

sku_id,

count(*) cart_count,

sum(sku_num) cart_num

from dwd_fact_cart_info

where dt='2020-03-10'

and date_format(create_time,'yyyy-MM-dd')='2020-03-10'

group by sku_id

),

tmp_favor as

(

select

sku_id,

count(*) favor_count

from dwd_fact_favor_info

where dt='2020-03-10'

and date_format(create_time,'yyyy-MM-dd')='2020-03-10'

group by sku_id

),

tmp_appraise as

(

select

sku_id,

sum(if(appraise='1201',1,0)) appraise_good_count,

sum(if(appraise='1202',1,0)) appraise_mid_count,

sum(if(appraise='1203',1,0)) appraise_bad_count,

sum(if(appraise='1204',1,0)) appraise_default_count

from dwd_fact_comment_info

where dt='2020-03-10'

group by sku_id

)

insert overwrite table dws_sku_action_daycount partition(dt='2020-03-10')

select

sku_id,

sum(order_count),

sum(order_num),

sum(order_amount),

sum(payment_count),

sum(payment_num),

sum(payment_amount),

sum(refund_count),

sum(refund_num),

sum(refund_amount),

sum(cart_count),

sum(cart_num),

sum(favor_count),

sum(appraise_good_count),

sum(appraise_mid_count),

sum(appraise_bad_count),

sum(appraise_default_count)

from

(

select

sku_id,

order_count,

order_num,

order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_order

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

payment_count,

payment_num,

payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_payment

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

refund_count,

refund_num,

refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_refund

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

cart_count,

cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_cart

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_favor

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

appraise_good_count,

appraise_mid_count,

appraise_bad_count,

appraise_default_count

from tmp_appraise

)tmp

group by sku_id;

3)查询加载结果

select * from dws_sku_action_daycount where dt='2020-03-10';

1.4.5 每日购买行为

1)建表语句

drop table if exists dws_sale_detail_daycount;

create external table dws_sale_detail_daycount

(

user_id string comment '用户 id',

sku_id string comment '商品 id',

user_gender string comment '用户性别',

user_age string comment '用户年龄',

user_level string comment '用户等级',

order_price decimal(10,2) comment '商品价格',

sku_name string comment '商品名称',

sku_tm_id string comment '品牌 id',

sku_category3_id string comment '商品三级品类 id',

sku_category2_id string comment '商品二级品类 id',

sku_category1_id string comment '商品一级品类 id',

sku_category3_name string comment '商品三级品类名称',

sku_category2_name string comment '商品二级品类名称',

sku_category1_name string comment '商品一级品类名称',

spu_id string comment '商品 spu',

sku_num int comment '购买个数',

order_count bigint comment '当日下单单数',

order_amount decimal(16,2) comment '当日下单金额'

) COMMENT '每日购买行为'

PARTITIONED BY (`dt` string)

stored as parquet

location '/warehouse/gmall/dws/dws_sale_detail_daycount/'

tblproperties ("parquet.compression"="lzo");

2)数据装载

insert overwrite table dws_sale_detail_daycount partition(dt='2020-03-10')

select

op.user_id,

op.sku_id,

ui.gender,

months_between('2020-03-10', ui.birthday)/12 age,

ui.user_level,

si.price,

si.sku_name,

si.tm_id,

si.category3_id,

si.category2_id,

si.category1_id,

si.category3_name,

si.category2_name,

si.category1_name,

si.spu_id,

op.sku_num,

op.order_count,

op.order_amount

from

(

select

user_id,

sku_id,

sum(sku_num) sku_num,

count(*) order_count,

sum(total_amount) order_amount

from dwd_fact_order_detail

where dt='2020-03-10'

group by user_id, sku_id

)op

join

(

select

*

from dwd_dim_user_info_his

where end_date='9999-99-99'

)ui on op.user_id = ui.id

join

(

select

*

from dwd_dim_sku_info

where dt='2020-03-10'

)si on op.sku_id = si.id;

3)查询加载结果

select * from dws_sale_detail_daycount where dt='2020-03-10';

1.5 DWS 层数据导入脚本

1)vim dwd_to_dws.sh

在脚本中填写如下内容

#!/bin/bash

APP=gmall

hive=/opt/modules/hive/bin/hive

# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天

if [ -n "$1" ] ;then

do_date=$1

else

do_date=`date -d "-1 day" +%F`

fi

sql="

insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')

select

mid_id,

concat_ws('|', collect_set(user_id)) user_id,

concat_ws('|', collect_set(version_code)) version_code,

concat_ws('|', collect_set(version_name)) version_name,

concat_ws('|', collect_set(lang))lang,

concat_ws('|', collect_set(source)) source,

concat_ws('|', collect_set(os)) os,

concat_ws('|', collect_set(area)) area,

concat_ws('|', collect_set(model)) model,

concat_ws('|', collect_set(brand)) brand,

concat_ws('|', collect_set(sdk_version)) sdk_version,

concat_ws('|', collect_set(gmail)) gmail,

concat_ws('|', collect_set(height_width)) height_width,

concat_ws('|', collect_set(app_time)) app_time,

concat_ws('|', collect_set(network)) network,

concat_ws('|', collect_set(lng)) lng,

concat_ws('|', collect_set(lat)) lat,

count(*) login_count

from ${APP}.dwd_start_log

where dt='$do_date'

group by mid_id;

with

tmp_login as

(

select

user_id,

count(*) login_count

from ${APP}.dwd_start_log

where dt='$do_date'

and user_id is not null

group by user_id

),

tmp_cart as

(

select

user_id,

count(*) cart_count,

sum(cart_price*sku_num) cart_amount

from ${APP}.dwd_fact_cart_info

where dt='$do_date'

and user_id is not null

and date_format(create_time,'yyyy-MM-dd')='$do_date'

group by user_id

),

tmp_order as

(

select

user_id,

count(*) order_count,

sum(final_total_amount) order_amount

from ${APP}.dwd_fact_order_info

where dt='$do_date'

group by user_id

) ,

tmp_payment as

(

select

user_id,

count(*) payment_count,

sum(payment_amount) payment_amount

from ${APP}.dwd_fact_payment_info

where dt='$do_date'

group by user_id

)

insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')

select

user_actions.user_id,

sum(user_actions.login_count),

sum(user_actions.cart_count),

sum(user_actions.cart_amount),

sum(user_actions.order_count),

sum(user_actions.order_amount),

sum(user_actions.payment_count),

sum(user_actions.payment_amount)

from

(

select

user_id,

login_count,

0 cart_count,

0 cart_amount,

0 order_count,

0 order_amount,

0 payment_count,

0 payment_amount

from

tmp_login

union all

select

user_id,

0 login_count,

cart_count,

cart_amount,

0 order_count,

0 order_amount,

0 payment_count,

0 payment_amount

from

tmp_cart

union all

select

user_id,

0 login_count,

0 cart_count,

0 cart_amount,

order_count,

order_amount,

0 payment_count,

0 payment_amount

from tmp_order

union all

select

user_id,

0 login_count,

0 cart_count,

0 cart_amount,

0 order_count,

0 order_amount,

payment_count,

payment_amount

from tmp_payment

) user_actions

group by user_id;

with

tmp_order as

(

select

sku_id,

count(*) order_count,

sum(sku_num) order_num,

sum(total_amount) order_amount

from ${APP}.dwd_fact_order_detail

where dt='$do_date'

group by sku_id

),

tmp_payment as

(

select

sku_id,

count(*) payment_count,

sum(sku_num) payment_num,

sum(total_amount) payment_amount

from ${APP}.dwd_fact_order_detail

where dt='$do_date'

and order_id in

(

select

id

from ${APP}.dwd_fact_order_info

where (dt='$do_date' or dt=date_add('$do_date',-1))

and date_format(payment_time,'yyyy-MM-dd')='$do_date'

)

group by sku_id

),

tmp_refund as

(

select

sku_id,

count(*) refund_count,

sum(refund_num) refund_num,

sum(refund_amount) refund_amount

from ${APP}.dwd_fact_order_refund_info

where dt='$do_date'

group by sku_id

),

tmp_cart as

(

select

sku_id,

count(*) cart_count,

sum(sku_num) cart_num

from ${APP}.dwd_fact_cart_info

where dt='$do_date'

and date_format(create_time,'yyyy-MM-dd')='$do_date'

group by sku_id

),

tmp_favor as

(

select

sku_id,

count(*) favor_count

from ${APP}.dwd_fact_favor_info

where dt='$do_date'

and date_format(create_time,'yyyy-MM-dd')='$do_date'

group by sku_id

),

tmp_appraise as

(

select

sku_id,

sum(if(appraise='1201',1,0)) appraise_good_count,

sum(if(appraise='1202',1,0)) appraise_mid_count,

sum(if(appraise='1203',1,0)) appraise_bad_count,

sum(if(appraise='1204',1,0)) appraise_default_count

from ${APP}.dwd_fact_comment_info

where dt='$do_date'

group by sku_id

)

insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')

select

sku_id,

sum(order_count),

sum(order_num),

sum(order_amount),

sum(payment_count),

sum(payment_num),

sum(payment_amount),

sum(refund_count),

sum(refund_num),

sum(refund_amount),

sum(cart_count),

sum(cart_num),

sum(favor_count),

sum(appraise_good_count),

sum(appraise_mid_count),

sum(appraise_bad_count),

sum(appraise_default_count)

from

(

select

sku_id,

order_count,

order_num,

order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_order

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

payment_count,

payment_num,

payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_payment

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

refund_count,

refund_num,

refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_refund

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

cart_count,

cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_cart

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_favor

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

appraise_good_count,

appraise_mid_count,

appraise_bad_count,

appraise_default_count

from tmp_appraise

)tmp

group by sku_id;

insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')

select

op.user_id,

op.sku_id,

ui.gender,

months_between('$do_date', ui.birthday)/12 age,

ui.user_level,

si.price,

si.sku_name,

si.tm_id,

si.category3_id,

si.category2_id,

si.category1_id,

si.category3_name,

si.category2_name,

si.category1_name,

si.spu_id,

op.sku_num,

op.order_count,

op.order_amount

from

(

select

user_id,

sku_id,

sum(sku_num) sku_num,

count(*) order_count,

sum(total_amount) order_amount

from ${APP}.dwd_fact_order_detail

where dt='$do_date'

group by user_id, sku_id

)op

join

(

select

*

from ${APP}.dwd_dim_user_info_his

where end_date='9999-99-99'

)ui on op.user_id = ui.id

join

(

select

*

from ${APP}.dwd_dim_sku_info

where dt='$do_date'

)si on op.sku_id = si.id;

"

$hive -e "$sql"

2)增加脚本执行权限

3)执行脚本导入数据

4)查看导入数据

select * from dws_uv_detail_daycount where dt='2020-03-11';

select * from dws_user_action_daycount where dt='2020-03-11';

select * from dws_sku_action_daycount where dt='2020-03-11';

select * from dws_sale_detail_daycount where dt='2020-03-11';

hive udf 分组取top1_项目实战从0到1之hive(27)数仓项目(九)数仓搭建 DWS 层相关推荐

  1. hive udf 分组取top1_Hive的经典面试题

    很久没有发文章了,今天发表一下Hive的总结,如果那里有不足的欢迎指正,顺便再提一个问题(数仓建模中:细化程度越高,粒度级就越小,相反,细化程度越低,粒度级就越大,这个说法能打个比方比喻出来吗?) 必 ...

  2. hive udf 分组取top1_Hive中分组取前N个值的实现-row_number()

    背景 假设有一个学生各门课的成绩的表单,应用hive取出每科成绩前100名的学生成绩. 这个就是典型在分组取Top N的需求. 解决思路 对于取出每科成绩前100名的学生成绩,针对学生成绩表,根据学科 ...

  3. hive udf 分组取top1_Hive分组取Top K数据

    1.ROW_NUMBER,RANK(),DENSE_RANK() 语法格式:row_number() OVER (partition by COL1 order by COL2 desc ) rank ...

  4. Hive中分组取前N个值

    分享两篇文章,结合看更清楚一点. 背景 假设有一个学生各门课的成绩的表单,应用hive取出每科成绩前100名的学生成绩. 这个就是典型在分组取Top N的需求. 解决思路 对于取出每科成绩前100名的 ...

  5. hive UDF函数取最新分区

    hive UDF函数取最新分区 1.pom文件 <dependencies><!-- https://mvnrepository.com/artifact/org.apache.hi ...

  6. Meteor项目实战 -- Next 0.0.2

    接上一篇:Meteor项目实战 -- Next 0.0.1 Get things done , and do Next Next 0.0.2版的目标是账户系统,并把任务与用户关联起来. 首先增加登录所 ...

  7. hive中分组取前N个值的实现

    背景 假设有一个学生各门课的成绩的表单,应用hive取出每科成绩前100名的学生成绩. 这个就是典型在分组取Top N的需求. 解决思路 对于取出每科成绩前100名的学生成绩,针对学生成绩表,根据学科 ...

  8. spark hive udf java_【填坑六】 spark-sql无法加载Hive UDF的jar

    /usr/custom/spark/bin/spark-sql --deploy-mode client add jar hdfs://${clusterName}/user/hive/udf/udf ...

  9. android项目实战博学谷源码_阿里爆款SpringBoot项目实战PDF+源码+视频分享

    前言 关于SpringBoot网络上有太多的博客跟资料,其影响力想必不用我多说了.它作为当前最流行的微服务框架,不但使用更加简单,而且功能更加丰富.性能更加稳定和健壮.其"约定大于配置&qu ...

最新文章

  1. End Credits
  2. php 多进程设置,php多进程操作的三个实例
  3. 一道笔试题(vue,react)
  4. hdu 2025 查找最大元素
  5. SpringMvc接收日期参数
  6. python中main函数在运行中不是必须要写的_Python必备知识之“if __name__ == '__main__':”...
  7. Linux基本服务命令
  8. Linux下autoconf与automake
  9. a59s刷机包卡刷 oppo_OPPO A59S刷机包
  10. xp怎么看计算机是多少位的,WinXP系统怎么看电脑是32位还是64位?
  11. mysql出现core dumped_mysql 段错误 (core dumped)
  12. 03、订单商品信息表.sql
  13. 计算物体自由下落距离
  14. [2021]Linux下C语言qrencode二维码生成库的基本使用和ARM开发板移植
  15. html5 app 打包,手机h5网站如何快速打包做成APP?
  16. 一个单点登录系统设计
  17. 服务正在启动或停止中
  18. Python3集成PayPal跨境支付三方接口以及订单查询和退款业务!
  19. 解决:Error: Cannot retrieve metalink for repository: epel. Please verify its path and try again
  20. HTML 转 EXE 工具(HTML App Build)

热门文章

  1. JS中包含其它JS文件
  2. Libevent-网络服务器事件示例
  3. 板子ping不通PC怎么办——韦东山嵌入式Linux学习笔记07
  4. 程序的加载和执行(一)——《x86汇编语言:从实模式到保护模式》读书笔记21
  5. 【❤️Mark一下❤️】markdown中写上标、下标
  6. Linux (redhat 6.5)服务器上安装Webmin
  7. Mysql的分库分表(基于shardingsphere)
  8. Java8--Lambda表达式对List集合操作
  9. springboot学习笔记(七)
  10. C语言再学习 -- 声明与定义