Analysis of Users' CD Purchase Behavior

  • 1. Monthly consumption trend analysis
    • 1.1 Total consumption amount per month
    • 1.2 Number of purchases per month
    • 1.3 Number of products purchased per month
    • 1.4 Number of consumers per month
  • 2. Individual user consumption analysis
    • 2.1 Descriptive statistics of user spending, purchase counts, and product quantities
    • 2.2 Scatter plot of user spending vs. product quantity
    • 2.3 Distribution of product quantity per user
    • 2.4 Cumulative share of spending by user (what share of users accounts for what share of revenue)
  • 3. User behavior analysis
    • 3.1 Users' first purchase
    • 3.2 Users' last purchase
    • 3.3 New vs. returning customers
      • 3.3.1 How many customers purchased only once
      • 3.3.2 Share of new customers per month
    • 3.4 User segmentation
      • 3.4.1 RFM segmentation
      • 3.4.2 User status: new, active, returning, churned (inactive)
    • 3.5 Purchase cycle (per order)
      • 3.5.1 Description of purchase intervals
      • 3.5.2 Distribution of purchase intervals
    • 3.6 User lifecycle (first to last purchase)
      • 3.6.1 Description of user lifecycle
      • 3.6.2 Distribution of user lifecycle
  • 4. Consumption metrics
    • 4.1 Retention rate
    • 4.2 Churn rate
    • 4.3 Repurchase rate
    • 4.4 Buy-back rate
  • 5. Summary
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['font.family'] = 'SimHei'              # enable Chinese characters in plot labels ('SimHei' is matched case-insensitively)
plt.rcParams['figure.autolayout'] = True            # auto-adjust the layout so large figures (e.g. pies) are not clipped
import seaborn as sns
columns=['user_id','order_dt','order_products','order_amount']
df = pd.read_table('CDNOW_master.txt',names=columns,sep='\s+',parse_dates=['order_dt'],infer_datetime_format=True)
  • user_id: user ID
  • order_dt: order date
  • order_products: number of products purchased
  • order_amount: purchase amount
df.head()
user_id order_dt order_products order_amount
0 1 1997-01-01 1 11.77
1 2 1997-01-12 1 12.00
2 2 1997-01-12 5 77.00
3 3 1997-01-02 2 20.76
4 3 1997-03-30 2 20.76
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69659 entries, 0 to 69658
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   user_id         69659 non-null  int64
 1   order_dt        69659 non-null  datetime64[ns]
 2   order_products  69659 non-null  int64
 3   order_amount    69659 non-null  float64
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 2.1 MB
df.describe()
user_id order_products order_amount
count 69659.000000 69659.000000 69659.000000
mean 11470.854592 2.410040 35.893648
std 6819.904848 2.333924 36.281942
min 1.000000 1.000000 0.000000
25% 5506.000000 1.000000 14.490000
50% 11410.000000 2.000000 25.980000
75% 17273.000000 3.000000 43.700000
max 23570.000000 99.000000 1286.010000
  • Most orders contain only a few items (about 2.4 products on average), with some extreme values skewing the statistics
  • Order amounts are right-skewed: the mean is about 35.9 while the median is 25.98 (a quick check of the skew is sketched below)
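A minimal check of that skew, assuming df is loaded as above (the skewness value itself is not part of the original output):

# mean well above median and a positive skewness coefficient both indicate a right-skewed distribution
print(df.order_amount.mean(), df.order_amount.median())
print(df.order_amount.skew())        # > 0 for a right skew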
#Alternative way to convert to datetime: df['order_dt'] = pd.to_datetime(df.order_dt,format='%Y%m%d')
df['month'] = df.order_dt.values.astype('datetime64[M]')
df
user_id order_dt order_products order_amount month
0 1 1997-01-01 1 11.77 1997-01-01
1 2 1997-01-12 1 12.00 1997-01-01
2 2 1997-01-12 5 77.00 1997-01-01
3 3 1997-01-02 2 20.76 1997-01-01
4 3 1997-03-30 2 20.76 1997-03-01
... ... ... ... ... ...
69654 23568 1997-04-05 4 83.74 1997-04-01
69655 23568 1997-04-22 1 14.99 1997-04-01
69656 23569 1997-03-25 2 25.74 1997-03-01
69657 23570 1997-03-25 3 51.12 1997-03-01
69658 23570 1997-03-26 2 42.96 1997-03-01

69659 rows × 5 columns

df.month.value_counts()
1997-03-01    11598
1997-02-01    11272
1997-01-01     8928
1997-04-01     3781
1997-06-01     3054
1997-07-01     2942
1997-05-01     2895
1998-03-01     2793
1997-11-01     2750
1997-10-01     2562
1997-12-01     2504
1997-08-01     2320
1997-09-01     2296
1998-06-01     2043
1998-01-01     2032
1998-02-01     2026
1998-05-01     1985
1998-04-01     1878
Name: month, dtype: int64

1. Monthly consumption trend analysis

  • Total consumption amount per month
  • Number of purchases per month
  • Number of products purchased per month
  • Number of consumers per month

1.1 Total consumption amount per month

grouped_month = df.groupby('month')
order_month_amount = grouped_month.order_amount.sum()
order_month_amount.head()
month
1997-01-01    299060.17
1997-02-01    379590.03
1997-03-01    393155.27
1997-04-01    142824.49
1997-05-01    107933.30
Name: order_amount, dtype: float64
order_month_amount.plot()

The chart shows that monthly revenue peaks over the first three months (topping out in March), drops sharply in April, and then stays roughly stable with a slight downward drift.

1.2 Number of purchases per month

grouped_month.user_id.count().plot()

In the first three months there are around 10,000 purchases per month; later months level off around 2,500.

1.3 Number of products purchased per month

grouped_month.order_products.sum().plot()

In the first quarter around 20,000 products are purchased per month; afterwards the volume slowly declines from roughly 7,500 to 5,000.

1.4 Number of consumers per month

grouped_month.user_id.apply(lambda x:len(x.drop_duplicates())).plot()

The number of buyers per month is slightly lower than the number of purchases, but the gap is small.
In the first three months there are 8,000 to 10,000 buyers per month; later months hover around 2,000 (a simpler way to count distinct buyers is sketched below).
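As a side note, nunique() gives the same monthly buyer counts more directly than dropping duplicates by hand; a minimal sketch, assuming grouped_month as defined above:

# distinct buyers per month via nunique(); should match the drop_duplicates approach above
grouped_month.user_id.nunique().plot()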

df.groupby(['month','user_id']).count().reset_index().groupby('month').user_id.count() == grouped_month.user_id.apply(lambda x:len(x.drop_duplicates()))
month
1997-01-01    True
1997-02-01    True
1997-03-01    True
1997-04-01    True
1997-05-01    True
1997-06-01    True
1997-07-01    True
1997-08-01    True
1997-09-01    True
1997-10-01    True
1997-11-01    True
1997-12-01    True
1998-01-01    True
1998-02-01    True
1998-03-01    True
1998-04-01    True
1998-05-01    True
1998-06-01    True
Name: user_id, dtype: bool
grouped_month1 = pd.pivot_table(df,index='month',values=['user_id','order_products','order_amount'],aggfunc={'order_amount':'sum','user_id':'count','order_products':'sum'})
grouped_month1['user_num'] = df.groupby('month').user_id.apply(lambda x:len(x.drop_duplicates()))
grouped_month1
order_amount order_products user_id user_num
month
1997-01-01 299060.17 19416 8928 7846
1997-02-01 379590.03 24921 11272 9633
1997-03-01 393155.27 26159 11598 9524
1997-04-01 142824.49 9729 3781 2822
1997-05-01 107933.30 7275 2895 2214
1997-06-01 108395.87 7301 3054 2339
1997-07-01 122078.88 8131 2942 2180
1997-08-01 88367.69 5851 2320 1772
1997-09-01 81948.80 5729 2296 1739
1997-10-01 89780.77 6203 2562 1839
1997-11-01 115448.64 7812 2750 2028
1997-12-01 95577.35 6418 2504 1864
1998-01-01 76756.78 5278 2032 1537
1998-02-01 77096.96 5340 2026 1551
1998-03-01 108970.15 7431 2793 2060
1998-04-01 66231.52 4697 1878 1437
1998-05-01 70989.66 4903 1985 1488
1998-06-01 76109.30 5287 2043 1506
plt.rcParams['font.family'] = 'SimHei'              # enable Chinese characters in plot labels
plt.rcParams['figure.autolayout'] = True            # auto-adjust the layout so the figure is not clipped
fig,axes = plt.subplots(3,1,figsize=(8,12))
for i in range(2):
    ax=axes[i]
    ax.plot(grouped_month1.iloc[:,i])
ax=axes[2]
ax.plot(grouped_month1.user_id,label='消费次数')
ax.plot(grouped_month1.user_num,label='消费人数')
ax.legend()

2. Individual user consumption analysis

  • Descriptive statistics of user spending, purchase counts, and product quantities
  • Scatter plot of user spending vs. product quantity
  • Distribution of product quantity per user
  • Cumulative share of spending by user (what share of users accounts for what share of revenue)

2.1 Descriptive statistics of user spending, purchase counts, and product quantities

grouped_user1 = pd.pivot_table(df,index='user_id',values=['order_dt','order_products','order_amount'],aggfunc={'order_amount':'sum','order_dt':'count','order_products':'sum'})
grouped_user1.rename(columns={'order_dt':'order_num'},inplace=True)
grouped_user1.describe()
order_amount order_num order_products
count 23570.000000 23570.000000 23570.000000
mean 106.080426 2.955409 7.122656
std 240.925195 4.736558 16.983531
min 0.000000 1.000000 1.000000
25% 19.970000 1.000000 1.000000
50% 43.395000 1.000000 3.000000
75% 106.475000 3.000000 7.000000
max 13990.930000 217.000000 1033.000000
grouped_user = df.groupby('user_id')
grouped_user.sum().describe([0.1,0.25,0.5,0.75,0.9,0.95])
order_products order_amount
count 23570.000000 23570.000000
mean 7.122656 106.080426
std 16.983531 240.925195
min 1.000000 0.000000
10% 1.000000 12.970000
25% 1.000000 19.970000
50% 3.000000 43.395000
75% 7.000000 106.475000
90% 16.000000 242.332000
95% 26.000000 380.923500
max 1033.000000 13990.930000

The average total spend per user is about 106 while the median is 43.4, a right-skewed distribution: a small share of users accounts for most of the revenue.
Users place about 3 orders on average, yet half of all users ordered only once.
Likewise, users bought 7 CDs on average with a median of 3, again distorted by extreme values.

2.2 Scatter plot of user spending vs. product quantity

grouped_user.sum().plot.scatter(x='order_amount',y='order_products')

grouped_user.sum().query('order_amount < 4000').plot.scatter(x='order_amount',y='order_products')

grouped_user1.query('order_amount<6000').plot.scatter(x='order_amount',y='order_products')

sns.jointplot(x=grouped_user1.query('order_amount<829').order_products, y=grouped_user1.query('order_amount<829').order_amount, kind='reg')

2.3 Distribution of product quantity per user

grouped_user.sum().order_products.plot.hist(bins=20)

The histogram shows that product purchase quantities are heavily concentrated at low values, while a small number of outliers distort the picture; filtering them out makes the distribution easier to judge.

Outliers can be filtered using the descriptive statistics, for example by taking the 95th percentile of products purchased (about 26) as a cutoff, as sketched below.
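A small sketch of that filtering idea, deriving the cutoff from the 95th percentile instead of hard-coding it (q95 and products_per_user are illustrative names; the threshold should come out near 26 per the describe() above):

# per-user product totals, keeping only users below the 95th percentile
products_per_user = grouped_user.order_products.sum()
q95 = products_per_user.quantile(0.95)
products_per_user[products_per_user < q95].plot.hist(bins=15)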

grouped_user.sum().query('order_products<30').order_products.plot.hist(bins=15)

grouped_user1.query('order_amount<400').order_amount.plot.hist(bins=20)

2.4 Cumulative share of spending by user (what share of users accounts for what share of revenue)

user_cumsum = grouped_user1.sort_values(by='order_amount').apply(lambda x:x.cumsum()/x.sum()).reset_index()
user_cumsum
user_id order_amount order_num order_products
0 10175 0.000000 0.000014 0.000006
1 4559 0.000000 0.000029 0.000012
2 1948 0.000000 0.000043 0.000018
3 925 0.000000 0.000057 0.000024
4 10798 0.000000 0.000072 0.000030
... ... ... ... ...
23565 7931 0.985405 0.991056 0.982940
23566 19339 0.988025 0.991860 0.985192
23567 7983 0.990814 0.993999 0.988385
23568 14048 0.994404 0.997115 0.994538
23569 7592 1.000000 1.000000 1.000000

23570 rows × 4 columns

user_cumsum.order_amount.plot(xticks=range(0,23570,2000))

# cumulative share of the order count
user_cumsum.order_num.plot(xticks=range(0,23570,2000))

With users sorted by total spend in ascending order, the bottom 50% of users contribute only about 10% of revenue, while the top 3,570 users (15%) account for roughly 60% of revenue and nearly 50% of all orders (verified in the sketch below).
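That reading can be verified directly. A minimal sketch, assuming grouped_user1 (per-user totals) as built above; the printed share should land near the roughly 60% read off the curve:

# revenue share of the top 15% of users by total spend
amount_sorted = grouped_user1.order_amount.sort_values(ascending=False)
top_n = int(len(amount_sorted) * 0.15)
share = amount_sorted.iloc[:top_n].sum() / amount_sorted.sum()
print('top 15% of users contribute {:.1%} of revenue'.format(share))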

3. User behavior analysis

  • Users' first purchase
  • Users' last purchase
  • New vs. returning customers
    • How many customers purchased only once
    • Share of new customers per month
  • User segmentation
    • RFM segmentation
    • User status: new, active, returning, churned (inactive)
  • Purchase cycle (per order)
    • Description of purchase intervals
    • Distribution of purchase intervals
  • User lifecycle (first to last purchase)
    • Description of user lifecycle
    • Distribution of user lifecycle

3.1 Users' first purchase

grouped_user.order_dt.min().value_counts()
1997-02-08    363
1997-02-24    347
1997-02-04    346
1997-02-06    346
1997-03-04    340
             ...
1997-01-08    213
1997-03-21    213
1997-01-07    211
1997-01-01    209
1997-01-04    174
Name: order_dt, Length: 84, dtype: int64
# users' first purchase date
grouped_user.order_dt.min().value_counts().plot()

New users are concentrated in the first three months; no new users arrive afterwards.
Around 1997-02-15 the daily first-purchase counts swing sharply in a "W" shape over roughly two weeks (a smoothing check is sketched below).
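One way to probe whether that swing is weekly seasonality is to overlay a 7-day rolling mean on the daily first-purchase counts; a sketch, assuming grouped_user as defined above:

# daily first-purchase counts with a 7-day rolling mean to smooth out weekly effects
first_buy = grouped_user.order_dt.min().value_counts().sort_index()
first_buy.plot(alpha=0.5)
first_buy.rolling(7).mean().plot()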

3.2 Users' last purchase

# users' last purchase date
grouped_user.order_dt.max().value_counts().plot()

Most users' last purchase also falls in the first three months, suggesting that many users buy once and never return.
From March onwards the number of buyers drops off a cliff, while over time the count of last purchases creeps up again, i.e. churn keeps accumulating (promotions boosting purchases could also explain spikes, so this needs more context to interpret).

3.3 New vs. returning customers

3.3.1 How many customers purchased only once

# how many customers purchased only once
user_life = grouped_user.order_dt.agg(['min','max'])
user_life
min max
user_id
1 1997-01-01 1997-01-01
2 1997-01-12 1997-01-12
3 1997-01-02 1998-05-28
4 1997-01-01 1997-12-12
5 1997-01-01 1998-01-03
... ... ...
23566 1997-03-25 1997-03-25
23567 1997-03-25 1997-03-25
23568 1997-03-25 1997-04-22
23569 1997-03-25 1997-03-25
23570 1997-03-25 1997-03-26

23570 rows × 2 columns

user_life_rate = (user_life['min'] == user_life['max']).value_counts()
user_life_rate
True     12054
False    11516
dtype: int64
print('仅消费一次用户占比{:.2%}'.format(user_life_rate[1]/user_life.shape[0]))

仅消费一次用户占比51.14%

user_life_rate.plot.pie(labels=['仅消费一次用户','消费多次用户'],autopct='%2.2f%%')
plt.legend()

3.3.2 Share of new customers per month

grouped_user.min()
order_dt order_products order_amount month
user_id
1 1997-01-01 1 11.77 1997-01-01
2 1997-01-12 1 12.00 1997-01-01
3 1997-01-02 1 16.99 1997-01-01
4 1997-01-01 1 14.96 1997-01-01
5 1997-01-01 1 13.97 1997-01-01
... ... ... ... ...
23566 1997-03-25 2 36.00 1997-03-01
23567 1997-03-25 1 20.97 1997-03-01
23568 1997-03-25 1 14.99 1997-03-01
23569 1997-03-25 2 25.74 1997-03-01
23570 1997-03-25 2 42.96 1997-03-01

23570 rows × 4 columns

user_new = grouped_user.min().groupby('month').order_dt.count()
user_new
month
1997-01-01    7846
1997-02-01    8476
1997-03-01    7248
Name: order_dt, dtype: int64
user_new_ = df.drop_duplicates('user_id').groupby('month').order_dt.count()
user_new_
month
1997-01-01    7846
1997-02-01    8476
1997-03-01    7248
Name: order_dt, dtype: int64
user_sum = df.groupby('month').order_dt.count()
user_sum
month
1997-01-01     8928
1997-02-01    11272
1997-03-01    11598
1997-04-01     3781
1997-05-01     2895
1997-06-01     3054
1997-07-01     2942
1997-08-01     2320
1997-09-01     2296
1997-10-01     2562
1997-11-01     2750
1997-12-01     2504
1998-01-01     2032
1998-02-01     2026
1998-03-01     2793
1998-04-01     1878
1998-05-01     1985
1998-06-01     2043
Name: order_dt, dtype: int64
(user_new/user_sum).fillna(0).plot()

The monthly share of new users drops to 0 from April onwards, confirming that no new users arrive after April 1997.

3.4 User segmentation

3.4.1 RFM segmentation

rfm = pd.pivot_table(df,index='user_id',values=['order_dt','order_products','order_amount'],aggfunc={'order_dt':'max','order_amount':'sum','order_products':'count'})
rfm
order_amount order_dt order_products
user_id
1 11.77 1997-01-01 1
2 89.00 1997-01-12 2
3 156.46 1998-05-28 6
4 100.50 1997-12-12 4
5 385.61 1998-01-03 11
... ... ... ...
23566 36.00 1997-03-25 1
23567 20.97 1997-03-25 1
23568 121.70 1997-04-22 3
23569 25.74 1997-03-25 1
23570 94.08 1997-03-26 2

23570 rows × 3 columns

rfm.order_dt.max()-rfm.order_dt
user_id
1       545 days
2       534 days
3        33 days
4       200 days
5       178 days
          ...
23566   462 days
23567   462 days
23568   434 days
23569   462 days
23570   461 days
Name: order_dt, Length: 23570, dtype: timedelta64[ns]
# convert the timedelta into an integer number of days
(rfm.order_dt.max()-rfm.order_dt).apply(lambda x:x.days)
user_id
1        545
2        534
3         33
4        200
5        178
         ...
23566    462
23567    462
23568    434
23569    462
23570    461
Name: order_dt, Length: 23570, dtype: int64
rfm['R'] =  (rfm.order_dt.max()-rfm.order_dt).apply(lambda x:x.days)
rfm.rename(columns={'order_amount':'M','order_products':'F'},inplace=True)
rfm
M order_dt F R
user_id
1 11.77 1997-01-01 1 545
2 89.00 1997-01-12 2 534
3 156.46 1998-05-28 6 33
4 100.50 1997-12-12 4 200
5 385.61 1998-01-03 11 178
... ... ... ... ...
23566 36.00 1997-03-25 1 462
23567 20.97 1997-03-25 1 462
23568 121.70 1997-04-22 3 434
23569 25.74 1997-03-25 1 462
23570 94.08 1997-03-26 2 461

23570 rows × 4 columns

rfm.describe()
M F R
count 23570.000000 23570.000000 23570.000000
mean 106.080426 2.955409 367.221638
std 240.925195 4.736558 181.211177
min 0.000000 1.000000 0.000000
25% 19.970000 1.000000 207.000000
50% 43.395000 1.000000 471.000000
75% 106.475000 3.000000 505.000000
max 13990.930000 217.000000 545.000000
# binary scores: each dimension is split at roughly its mean from describe() above (R ≈ 367 days, F ≈ 3 orders, M ≈ 106)
r_bins = [-1,367,545]
f_bins = [0,3,217]
m_bins = [-1,106,13991]
rfm['r_score'] = pd.cut(rfm.R,r_bins,labels=[1,0])        # smaller R (more recent) scores 1
rfm['f_score'] = pd.cut(rfm.F,f_bins,labels=[0,1])
rfm['m_score'] = pd.cut(rfm.M,m_bins,labels=[0,1])
rfm
M order_dt F R r_score f_score m_score
user_id
1 11.77 1997-01-01 1 545 0 0 0
2 89.00 1997-01-12 2 534 0 0 0
3 156.46 1998-05-28 6 33 1 1 1
4 100.50 1997-12-12 4 200 1 1 0
5 385.61 1998-01-03 11 178 1 1 1
... ... ... ... ... ... ... ...
23566 36.00 1997-03-25 1 462 0 0 0
23567 20.97 1997-03-25 1 462 0 0 0
23568 121.70 1997-04-22 3 434 0 0 1
23569 25.74 1997-03-25 1 462 0 0 0
23570 94.08 1997-03-26 2 461 0 0 0

23570 rows × 7 columns

col = ['r_score','f_score','m_score']
for i in col:
    rfm[i] = rfm[i].astype(str)        # np.str is deprecated; the built-in str behaves the same here
rfm['rfm_group'] = rfm['r_score']+rfm['f_score']+rfm['m_score']
rfm['r_score'].str.cat(rfm['f_score']).str.cat(rfm['m_score'])
user_id
1        000
2        000
3        111
4        110
5        111
        ...
23566    000
23567    000
23568    001
23569    000
23570    000
Name: r_score, Length: 23570, dtype: object
rfm
M order_dt F R r_score f_score m_score rfm_group
user_id
1 11.77 1997-01-01 1 545 0 0 0 000
2 89.00 1997-01-12 2 534 0 0 0 000
3 156.46 1998-05-28 6 33 1 1 1 111
4 100.50 1997-12-12 4 200 1 1 0 110
5 385.61 1998-01-03 11 178 1 1 1 111
... ... ... ... ... ... ... ... ...
23566 36.00 1997-03-25 1 462 0 0 0 000
23567 20.97 1997-03-25 1 462 0 0 0 000
23568 121.70 1997-04-22 3 434 0 0 1 001
23569 25.74 1997-03-25 1 462 0 0 0 000
23570 94.08 1997-03-26 2 461 0 0 0 000

23570 rows × 8 columns

def rfm_f(x):
    # map the concatenated three-digit score to a customer segment label
    d = {'111':'重要价值客户','011':'重要保持客户','101':'重要挽留客户','001':'重要发展客户',
         '110':'一般价值客户','010':'一般保持客户','100':'一般挽留客户','000':'一般发展客户'}
    return d[x.rfm_group]
rfm['label'] = rfm.apply(rfm_f,axis=1)
rfm
M order_dt F R r_score f_score m_score rfm_group label
user_id
1 11.77 1997-01-01 1 545 0 0 0 000 一般发展客户
2 89.00 1997-01-12 2 534 0 0 0 000 一般发展客户
3 156.46 1998-05-28 6 33 1 1 1 111 重要价值客户
4 100.50 1997-12-12 4 200 1 1 0 110 一般价值客户
5 385.61 1998-01-03 11 178 1 1 1 111 重要价值客户
... ... ... ... ... ... ... ... ... ...
23566 36.00 1997-03-25 1 462 0 0 0 000 一般发展客户
23567 20.97 1997-03-25 1 462 0 0 0 000 一般发展客户
23568 121.70 1997-04-22 3 434 0 0 1 001 重要发展客户
23569 25.74 1997-03-25 1 462 0 0 0 000 一般发展客户
23570 94.08 1997-03-26 2 461 0 0 0 000 一般发展客户

23570 rows × 9 columns

rfm.loc[rfm.label == '重要价值客户','color'] = 'g'
rfm.loc[~(rfm.label == '重要价值客户'),'color'] = 'r'
rfm.plot.scatter('R','F',c=rfm.color)

rfm_count = rfm.groupby('label').R.count()
rfm_count
label
一般价值客户      907
一般保持客户      119
一般发展客户    14031
一般挽留客户     2598
重要价值客户     4084
重要保持客户      256
重要发展客户      773
重要挽留客户      802
Name: R, dtype: int64
plt.figure(figsize=(6,6),dpi=160)
rfm_count.plot.pie(autopct='%2.2f%%',labels=rfm_count.index)
plt.legend(bbox_to_anchor=(1.5, 1.2))

The pie chart shows that "一般发展客户" (general developing customers) account for 59.53% of users.
"重要价值客户" (important high-value customers) are the second-largest group at 17.33%.
The two "保持" (retain) segments, general and important, are comparatively small. The revenue contributed by each segment is sketched below.
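As a follow-up sketch (assuming the rfm frame with the label column built above), the revenue contributed by each segment complements the head-count shares in the pie chart:

# revenue (M) share by RFM segment
segment_amount = rfm.groupby('label').M.sum().sort_values(ascending=False)
print(segment_amount / segment_amount.sum())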

3.4.2 User status: new, active, returning, churned (inactive)

pivoted_counts = pd.pivot_table(df,index='user_id',columns='month',values='order_dt',aggfunc='count').fillna(0)
pivoted_counts
month 1997-01-01 1997-02-01 1997-03-01 1997-04-01 1997-05-01 1997-06-01 1997-07-01 1997-08-01 1997-09-01 1997-10-01 1997-11-01 1997-12-01 1998-01-01 1998-02-01 1998-03-01 1998-04-01 1998-05-01 1998-06-01
user_id
1 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 2.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
5 2.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23566 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23567 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23568 0.0 0.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23569 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23570 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

23570 rows × 18 columns

# applymap applies the function to every element
df_purchase = pivoted_counts.applymap(lambda x:1 if x>0 else 0)
df_purchase
month 1997-01-01 1997-02-01 1997-03-01 1997-04-01 1997-05-01 1997-06-01 1997-07-01 1997-08-01 1997-09-01 1997-10-01 1997-11-01 1997-12-01 1998-01-01 1998-02-01 1998-03-01 1998-04-01 1998-05-01 1998-06-01
user_id
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
4 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
5 1 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23566 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23567 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23568 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23569 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23570 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

23570 rows × 18 columns

def active_status(data):
    status = []
    x = len(data)
    for i in range(x):
        # no purchase this month
        if data[i] == 0:
            if len(status) == 0:
                status.append('unreg')
            else:
                if status[i-1] == 'unreg':
                    status.append('unreg')
                else:
                    status.append('unactive')
        # purchased this month
        else:
            if len(status) == 0:
                status.append('new')
            else:
                if status[i-1] == 'unreg':
                    status.append('new')
                elif status[i-1] == 'unactive':
                    status.append('return')
                else:
                    status.append('active')
    return status
purchase_status = df_purchase.apply(active_status,axis=1,raw=True)
purchase_status
month 1997-01-01 1997-02-01 1997-03-01 1997-04-01 1997-05-01 1997-06-01 1997-07-01 1997-08-01 1997-09-01 1997-10-01 1997-11-01 1997-12-01 1998-01-01 1998-02-01 1998-03-01 1998-04-01 1998-05-01 1998-06-01
user_id
1 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
2 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
3 new unactive return active unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive return unactive
4 new unactive unactive unactive unactive unactive unactive return unactive unactive unactive return unactive unactive unactive unactive unactive unactive
5 new active unactive return active active active unactive return unactive unactive return active unactive unactive unactive unactive unactive
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23566 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23567 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23568 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23569 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23570 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive

23570 rows × 18 columns

purchase_status_ct = purchase_status.replace('unreg',np.nan).apply(lambda x:x.value_counts())
purchase_status_ct
month 1997-01-01 1997-02-01 1997-03-01 1997-04-01 1997-05-01 1997-06-01 1997-07-01 1997-08-01 1997-09-01 1997-10-01 1997-11-01 1997-12-01 1998-01-01 1998-02-01 1998-03-01 1998-04-01 1998-05-01 1998-06-01
active NaN 1157.0 1681 1773.0 852.0 747.0 746.0 604.0 528.0 532.0 624.0 632.0 512.0 472.0 571.0 518.0 459.0 446.0
new 7846.0 8476.0 7248 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
return NaN NaN 595 1049.0 1362.0 1592.0 1434.0 1168.0 1211.0 1307.0 1404.0 1232.0 1025.0 1079.0 1489.0 919.0 1029.0 1060.0
unactive NaN 6689.0 14046 20748.0 21356.0 21231.0 21390.0 21798.0 21831.0 21731.0 21542.0 21706.0 22033.0 22019.0 21510.0 22133.0 22082.0 22064.0
purchase_stack = purchase_status_ct.fillna(0).T
plt.figure(figsize=(6,3),dpi=150)
plt.stackplot(purchase_stack.index,purchase_stack['active'],purchase_stack['new'],purchase_stack['return'],purchase_stack['unactive'],labels=purchase_stack.columns)
plt.legend()
plt.show()
#purchase_stack.plot.area()

Reading the stacked area chart of monthly user-status counts:

  • In the first three months the user base grows rapidly, new users dominate, and the number of active users rises.
  • From April 1997 onwards there are no new registrations.
  • From April 1997 the number of active users declines and then levels off, and returning users also stabilize; these two groups are the consuming core, fluctuating around roughly 2,000 users.
  • Inactive users make up the vast majority throughout.
  • Note: a "returning" user is one who did not purchase last month but purchases this month (a toy example is sketched below).
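A toy sequence makes the status transitions concrete; a sketch, assuming active_status as defined above:

# one user's monthly purchase flags: no buy, buy, buy, no buy, buy, no buy
toy = [0, 1, 1, 0, 1, 0]
print(active_status(toy))
# expected: ['unreg', 'new', 'active', 'unactive', 'return', 'unactive']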
purchase_status_ct.fillna(0).T.head()
active new return unactive
month
1997-01-01 0.0 7846.0 0.0 0.0
1997-02-01 1157.0 8476.0 0.0 6689.0
1997-03-01 1681.0 7248.0 595.0 14046.0
1997-04-01 1773.0 0.0 1049.0 20748.0
1997-05-01 852.0 0.0 1362.0 21356.0
purchase_status_ct.fillna(0).T.apply(lambda x:x/x.sum(),axis=1).head(10)
active new return unactive
month
1997-01-01 0.000000 1.000000 0.000000 0.000000
1997-02-01 0.070886 0.519299 0.000000 0.409815
1997-03-01 0.071319 0.307510 0.025244 0.595927
1997-04-01 0.075223 0.000000 0.044506 0.880272
1997-05-01 0.036148 0.000000 0.057785 0.906067
1997-06-01 0.031693 0.000000 0.067543 0.900764
1997-07-01 0.031650 0.000000 0.060840 0.907510
1997-08-01 0.025626 0.000000 0.049555 0.924820
1997-09-01 0.022401 0.000000 0.051379 0.926220
1997-10-01 0.022571 0.000000 0.055452 0.921977
# share of each status among users who purchased in the month
purchase_status_rate = purchase_status_ct.fillna(0).T.drop(columns=['unactive']).apply(lambda x:x/x.sum(),axis=1)
purchase_status_rate
active new return
month
1997-01-01 0.000000 1.000000 0.000000
1997-02-01 0.120108 0.879892 0.000000
1997-03-01 0.176501 0.761025 0.062474
1997-04-01 0.628278 0.000000 0.371722
1997-05-01 0.384824 0.000000 0.615176
1997-06-01 0.319367 0.000000 0.680633
1997-07-01 0.342202 0.000000 0.657798
1997-08-01 0.340858 0.000000 0.659142
1997-09-01 0.303623 0.000000 0.696377
1997-10-01 0.289288 0.000000 0.710712
1997-11-01 0.307692 0.000000 0.692308
1997-12-01 0.339056 0.000000 0.660944
1998-01-01 0.333116 0.000000 0.666884
1998-02-01 0.304320 0.000000 0.695680
1998-03-01 0.277184 0.000000 0.722816
1998-04-01 0.360473 0.000000 0.639527
1998-05-01 0.308468 0.000000 0.691532
1998-06-01 0.296149 0.000000 0.703851
plt.figure(figsize=(6,3),dpi=150)
plt.plot(purchase_status_rate)
#plt.plot(purchase_status_rate['return'])
plt.legend(purchase_status_rate.columns)
plt.show()

Among purchasing users, only active and returning users remain after April.
In later months returning users make up the larger share of buyers, so the overall quality of the purchasing base is mediocre.

3.5 Purchase cycle (per order)

3.5.1 Description of purchase intervals

order_diff = df.groupby('user_id').apply(lambda x:x.order_dt - x.order_dt.shift(1))
order_diff.head()
user_id
1        0        NaT
2        1        NaT
         2     0 days
3        3        NaT
         4    87 days
Name: order_dt, dtype: timedelta64[ns]
order_diff.describe()
count                      46089
mean     68 days 23:22:13.567662
std      91 days 00:47:33.924168
min              0 days 00:00:00
25%             10 days 00:00:00
50%             31 days 00:00:00
75%             89 days 00:00:00
max            533 days 00:00:00
Name: order_dt, dtype: object

The average interval between purchases is 68 days (whether that is reasonable can be judged against the product category).
The median interval is 31 days, well below the mean, so the distribution is right-skewed and the mean is pulled up by some very large gaps (an equivalent diff-based computation is sketched below).
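As an aside, groupby().diff() computes the same intervals without a per-group apply; a sketch whose describe() should match the one shown above:

# per-user purchase intervals via diff(), after sorting by user and order date
order_diff_alt = df.sort_values(['user_id','order_dt']).groupby('user_id').order_dt.diff()
order_diff_alt.describe()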

3.5.2 Distribution of purchase intervals

order_diff.apply(lambda x:x.days).hist(bins=20)

Purchase intervals follow a roughly exponential-looking distribution; the vast majority are under 100 days.

3.6 User lifecycle (first to last purchase)

3.6.1 Description of user lifecycle

user_life = df.groupby('user_id').order_dt.agg(['max','min'])
user_life
max min
user_id
1 1997-01-01 1997-01-01
2 1997-01-12 1997-01-12
3 1998-05-28 1997-01-02
4 1997-12-12 1997-01-01
5 1998-01-03 1997-01-01
... ... ...
23566 1997-03-25 1997-03-25
23567 1997-03-25 1997-03-25
23568 1997-04-22 1997-03-25
23569 1997-03-25 1997-03-25
23570 1997-03-26 1997-03-25

23570 rows × 2 columns

user_cycle = (user_life['max']-user_life['min'])
user_cycle
user_id
1         0 days
2         0 days
3       511 days
4       345 days
5       367 days
          ...
23566     0 days
23567     0 days
23568    28 days
23569     0 days
23570     1 days
Length: 23570, dtype: timedelta64[ns]
user_cycle.describe()
count                       23570
mean     134 days 20:55:36.987696
std      180 days 13:46:43.039788
min               0 days 00:00:00
25%               0 days 00:00:00
50%               0 days 00:00:00
75%             294 days 00:00:00
max             544 days 00:00:00
dtype: object

The average user lifecycle is 134 days, but the median is 0 days, meaning most users purchase only once and never come back.

3.6.2 Distribution of user lifecycle

plt.figure()
user_cycle.apply(lambda x:x.days).hist(bins=20)
plt.title('用户生命周期分布')
plt.xlabel('天数')
plt.ylabel('人数')
plt.show()

The lifecycle distribution is dominated by one-time buyers, so it helps to filter out users whose lifecycle is 0 days.

user_cycle.apply(lambda x:x.days)[user_cycle.apply(lambda x:x.days) > 0].hist(bins=20)

Distribution after filtering out users with a 0-day lifecycle:

  • The histogram is bimodal: many users buy more than once yet still have a lifecycle shorter than a month, so guided purchasing, activation campaigns, and churn prevention are needed for them.
  • Users with lifecycles of roughly 350-500 days form a stable, roughly bell-shaped group of loyal customers and should be a retention priority (summary statistics for repeat purchasers are sketched below).
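Summary statistics for those repeat purchasers only, as a minimal sketch assuming user_cycle as computed above (the resulting numbers are not part of the original output):

# lifecycle statistics restricted to users whose lifecycle exceeds 0 days
life_days = user_cycle.apply(lambda x: x.days)
life_days[life_days > 0].describe()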

4. Consumption metrics

  • Retention rate

    • the share of users who make a second purchase after their first one
  • Churn rate
    • the share of users who are no longer purchasing
  • Repurchase rate
    • the share of a month's buyers who purchase more than once within that calendar month
  • Buy-back rate
    • the share of past buyers who purchase again within a given later period (a toy example contrasting repurchase and buy-back is sketched below)
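A toy month-by-purchase-count matrix makes the last two metrics concrete. This is only a sketch: users u1-u3 and the simplified next-month buy-back definition are illustrative, the pandas/numpy imports from the top of the notebook are assumed, and section 4.4 below encodes buy-back month by month rather than for a single pair of months.

# three users, purchase counts for two months
toy = pd.DataFrame({'1997-01': [2, 1, 0],
                    '1997-02': [0, 1, 1]},
                   index=['u1', 'u2', 'u3'])
# repurchase rate: among that month's buyers, the share who bought more than once within the month
flag = toy.applymap(lambda x: 1 if x > 1 else np.nan if x == 0 else 0)
print(flag.sum() / flag.count())                     # 1997-01: 0.5, 1997-02: 0.0
# buy-back (simplified): of January's buyers, the share who buy again in February
bought_jan = toy['1997-01'] > 0
print((toy.loc[bought_jan, '1997-02'] > 0).mean())   # 0.5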

4.1 Retention rate

user_purchase = df[['user_id','order_products','order_amount','order_dt']]
order_date_min = user_purchase.groupby('user_id').order_dt.min()
order_date_min
user_id
1       1997-01-01
2       1997-01-12
3       1997-01-02
4       1997-01-01
5       1997-01-01
           ...
23566   1997-03-25
23567   1997-03-25
23568   1997-03-25
23569   1997-03-25
23570   1997-03-25
Name: order_dt, Length: 23570, dtype: datetime64[ns]
user_purchase_retention = pd.merge(user_purchase,order_date_min,on='user_id',how='left',suffixes=('', '_min'))
user_purchase_retention
user_id order_products order_amount order_dt order_dt_min
0 1 1 11.77 1997-01-01 1997-01-01
1 2 1 12.00 1997-01-12 1997-01-12
2 2 5 77.00 1997-01-12 1997-01-12
3 3 2 20.76 1997-01-02 1997-01-02
4 3 2 20.76 1997-03-30 1997-01-02
... ... ... ... ... ...
69654 23568 4 83.74 1997-04-05 1997-03-25
69655 23568 1 14.99 1997-04-22 1997-03-25
69656 23569 2 25.74 1997-03-25 1997-03-25
69657 23570 3 51.12 1997-03-25 1997-03-25
69658 23570 2 42.96 1997-03-26 1997-03-25

69659 rows × 5 columns

user_purchase_retention['dtdiff'] = (user_purchase_retention.order_dt - user_purchase_retention.order_dt_min).apply(lambda x:x.days)
user_purchase_retention
user_id order_products order_amount order_dt order_dt_min dtdiff
0 1 1 11.77 1997-01-01 1997-01-01 0
1 2 1 12.00 1997-01-12 1997-01-12 0
2 2 5 77.00 1997-01-12 1997-01-12 0
3 3 2 20.76 1997-01-02 1997-01-02 0
4 3 2 20.76 1997-03-30 1997-01-02 87
... ... ... ... ... ... ...
69654 23568 4 83.74 1997-04-05 1997-03-25 11
69655 23568 1 14.99 1997-04-22 1997-03-25 28
69656 23569 2 25.74 1997-03-25 1997-03-25 0
69657 23570 3 51.12 1997-03-25 1997-03-25 0
69658 23570 2 42.96 1997-03-26 1997-03-25 1

69659 rows × 6 columns

bins = [0,3,7,15,30,60,90,180,360,540]        # note: dtdiff == 0 (same-day repeat orders) falls outside these bins and becomes NaN
user_purchase_retention['dtdiff_bin'] = pd.cut(user_purchase_retention.dtdiff, bins = bins)
user_purchase_retention
user_id order_products order_amount order_dt order_dt_min dtdiff dtdiff_bin
0 1 1 11.77 1997-01-01 1997-01-01 0 NaN
1 2 1 12.00 1997-01-12 1997-01-12 0 NaN
2 2 5 77.00 1997-01-12 1997-01-12 0 NaN
3 3 2 20.76 1997-01-02 1997-01-02 0 NaN
4 3 2 20.76 1997-03-30 1997-01-02 87 (60.0, 90.0]
... ... ... ... ... ... ... ...
69654 23568 4 83.74 1997-04-05 1997-03-25 11 (7.0, 15.0]
69655 23568 1 14.99 1997-04-22 1997-03-25 28 (15.0, 30.0]
69656 23569 2 25.74 1997-03-25 1997-03-25 0 NaN
69657 23570 3 51.12 1997-03-25 1997-03-25 0 NaN
69658 23570 2 42.96 1997-03-26 1997-03-25 1 (0.0, 3.0]

69659 rows × 7 columns

pivoted_retention= user_purchase_retention.pivot_table(index='user_id', columns='dtdiff_bin', values='order_dt',aggfunc='count',dropna=False)
pivoted_retention
dtdiff_bin (0, 3] (3, 7] (7, 15] (15, 30] (30, 60] (60, 90] (90, 180] (180, 360] (360, 540]
user_id
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN 2.0 NaN 2.0 1.0
4 NaN NaN NaN 1.0 NaN NaN NaN 2.0 NaN
5 NaN NaN 1.0 NaN 1.0 NaN 3.0 4.0 1.0
... ... ... ... ... ... ... ... ... ...
23566 NaN NaN NaN NaN NaN NaN NaN NaN NaN
23567 NaN NaN NaN NaN NaN NaN NaN NaN NaN
23568 NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN
23569 NaN NaN NaN NaN NaN NaN NaN NaN NaN
23570 1.0 NaN NaN NaN NaN NaN NaN NaN NaN

23570 rows × 9 columns

pivoted_retention.applymap(lambda x:1 if x>0 else 0).mean().plot.bar(figsize=(10,5))

The bar chart shows that about 2.7% of users purchase again within 1 to 3 days of their first order; the share grows as the window widens, with about 13% buying again between one and two months after the first purchase and 25.6% buying again between six months and a year later.
This also confirms that CD buying is not a high-frequency behavior; to maximize revenue, the site should keep replenishing new users while cultivating loyal ones.

pivoted_retention.applymap(lambda x:1 if x>0 else 0).mean()
dtdiff_bin
(0, 3]        0.026856
(3, 7]        0.035129
(7, 15]       0.060798
(15, 30]      0.090539
(30, 60]      0.129699
(60, 90]      0.099703
(90, 180]     0.197030
(180, 360]    0.256428
(360, 540]    0.197412
dtype: float64

4.2 Churn rate

purchase_status_ct.fillna(0).T.apply(lambda x:x/x.sum(),axis=1)[['unactive']]
unactive
month
1997-01-01 0.000000
1997-02-01 0.409815
1997-03-01 0.595927
1997-04-01 0.880272
1997-05-01 0.906067
1997-06-01 0.900764
1997-07-01 0.907510
1997-08-01 0.924820
1997-09-01 0.926220
1997-10-01 0.921977
1997-11-01 0.913958
1997-12-01 0.920916
1998-01-01 0.934790
1998-02-01 0.934196
1998-03-01 0.912601
1998-04-01 0.939033
1998-05-01 0.936869
1998-06-01 0.936105
purchase_status_ct.fillna(0).T.apply(lambda x:x/x.sum(),axis=1)[['unactive']].plot.bar(figsize=(10,5))

4.3 Repurchase rate

pivoted_counts
month 1997-01-01 1997-02-01 1997-03-01 1997-04-01 1997-05-01 1997-06-01 1997-07-01 1997-08-01 1997-09-01 1997-10-01 1997-11-01 1997-12-01 1998-01-01 1998-02-01 1998-03-01 1998-04-01 1998-05-01 1998-06-01
user_id
1 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 2.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
5 2.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23566 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23567 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23568 0.0 0.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23569 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23570 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

23570 rows × 18 columns

purchase_r = pivoted_counts.applymap(lambda x: 1 if x>1 else np.NaN if x==0 else 0)
purchase_r
month 1997-01-01 1997-02-01 1997-03-01 1997-04-01 1997-05-01 1997-06-01 1997-07-01 1997-08-01 1997-09-01 1997-10-01 1997-11-01 1997-12-01 1998-01-01 1998-02-01 1998-03-01 1998-04-01 1998-05-01 1998-06-01
user_id
1 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 0.0 NaN 0.0 0.0 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 0.0 NaN
4 1.0 NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN
5 1.0 0.0 NaN 0.0 0.0 0.0 0.0 NaN 0.0 NaN NaN 1.0 0.0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23566 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23567 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23568 NaN NaN 0.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23569 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23570 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

23570 rows × 18 columns

(purchase_r.sum()/purchase_r.count()).plot()

In later months the repurchase rate stabilizes around 20%; in the first three months the large influx of new users, most of whom buy only once, keeps the rate low.

repurchase = df.groupby('user_id').month.count()        # number of orders per user (every row carries a month value)
repurchase
user_id
1         1
2         2
3         6
4         4
5        11
         ..
23566     1
23567     1
23568     3
23569     1
23570     2
Name: month, Length: 23570, dtype: int64
repurchase[repurchase>1].count()/repurchase.count()

0.4947815019092066

plt.figure(figsize=(6,3),dpi=150)
plt.plot(purchase_r.count())
plt.plot(purchase_r.sum())
plt.xlabel('时间(月)')
plt.ylabel('用户数(人)')
plt.legend(['消费人数', '二次消费以上人数'])
plt.show()

After the first three months the number of purchasing users drops quickly and eventually stabilizes around 2,000 per month.
The number of users buying twice or more in a month first rises with the influx of new users and then slowly declines, which calls for re-activation measures.

purchase_r.sum()
month
1997-01-01     844.0
1997-02-01    1178.0
1997-03-01    1479.0
1997-04-01     631.0
1997-05-01     436.0
1997-06-01     458.0
1997-07-01     469.0
1997-08-01     355.0
1997-09-01     352.0
1997-10-01     380.0
1997-11-01     410.0
1997-12-01     410.0
1998-01-01     324.0
1998-02-01     315.0
1998-03-01     473.0
1998-04-01     286.0
1998-05-01     298.0
1998-06-01     323.0
dtype: float64

4.4 Buy-back rate

df_purchase
month 1997-01-01 1997-02-01 1997-03-01 1997-04-01 1997-05-01 1997-06-01 1997-07-01 1997-08-01 1997-09-01 1997-10-01 1997-11-01 1997-12-01 1998-01-01 1998-02-01 1998-03-01 1998-04-01 1998-05-01 1998-06-01
user_id
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
4 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
5 1 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23566 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23567 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23568 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23569 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23570 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

23570 rows × 18 columns

def purchase_back(data):
    # per month (the first and last months stay NaN): 1 if the user bought this month
    # and also last month, 0 if they bought this month but not last month,
    # NaN if they did not buy this month
    status = [np.nan]
    l = len(data)-1
    for i in range(1,l):
        if data[i] == 1:
            if data[i-1] == 1:
                status.append(1)
            else:
                status.append(0)
        else:
            status.append(np.nan)
    status.append(np.nan)
    return status

purchase_b = df_purchase.apply(purchase_back,axis=1,raw=True)
purchase_b
month 1997-01-01 1997-02-01 1997-03-01 1997-04-01 1997-05-01 1997-06-01 1997-07-01 1997-08-01 1997-09-01 1997-10-01 1997-11-01 1997-12-01 1998-01-01 1998-02-01 1998-03-01 1998-04-01 1998-05-01 1998-06-01
user_id
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN 0.0 1.0 NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN NaN NaN 0.0 NaN
4 NaN NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN
5 NaN 1.0 NaN 0.0 1.0 1.0 1.0 NaN 0.0 NaN NaN 0.0 1.0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23566 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23567 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23568 NaN NaN 0.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23569 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23570 NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

23570 rows × 18 columns

(purchase_b.sum()/purchase_b.count()).plot(figsize=(10,5))

The buy-back rate is low in the first three months because the flood of new users mostly buy once and never return; the three months after a new user's first purchase are therefore a critical window in which marketing should actively steer them toward a second and then repeated purchases.
As the total number of buyers falls while the number of buy-back users rises, the buy-back rate peaks in April.
As churn continues, the rate eventually stabilizes around 30% among the remaining loyal users; well-timed reward promotions for existing customers would help reinforce their loyalty.

plt.figure(figsize=(6,3),dpi=150)
plt.plot(purchase_b.count())
plt.plot(purchase_b.sum())
plt.xlabel('时间(月)')
plt.ylabel('用户数(人)')
plt.legend(['消费人数', '回购人数'])
plt.show()

From March onwards, with no new users arriving, the number of purchasing users falls off a cliff.
The number of buy-back users first grows slowly, then from April gradually declines and settles around 500.

5. Summary

  • New users are concentrated in the first three months; the product seems to lack a clear value proposition or fails to meet user needs, leading to heavy attrition, and from April 1997 onwards the buying population consists entirely of existing users.
  • CD purchases are small and infrequent, and most per-user metrics are right-skewed: the top 15% of users contribute about 60% of revenue and roughly 50% of orders, a clear 80/20-style pattern.
  • Both returning and active purchasers trend downward in later months, so churn early-warning is needed; improving the product and cultivating user loyalty are recommended.
  • Most new users purchase only once, and both the repurchase and buy-back rates are low, so new users are of lower overall quality than existing ones; promotions to attract new users and increase stickiness, plus well-timed reward offers for existing customers, are recommended.
