Project introduction:

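Black Friday, the day after Thanksgiving in the United States, is a major shopping event ahead of Christmas. On that day, US stores typically run large numbers of discounts and promotions. American stores traditionally record losses in red ink and profits in black ink. Because the frenzied shopping on the Friday after Thanksgiving sharply boosts store profits, merchants call it "Black Friday". Merchants hope that the Christmas shopping season starting on this day will bring in the most profit of the year.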

Analysis goal:

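The data comes from an e-commerce retailer's Black Friday sales records, published on Kaggle. Drawing on analysis approaches found online, the analysis covers two areas, products and users. The aim is to provide analysis and recommendations for the platform's strategy.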

Main framework of this analysis

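1. Overall spending
2. User profile analysis (finding the most valuable user types by gender, age, occupation and marital status)
3. City performance analysis (distribution by city, distribution by years in the current city)
4. Product analysis (finding the most valuable products), in detail: Top 5 products by sales revenue, Top 5 product categories by sales revenue
5. Analysis of the highest-contributing users: average spend per customer, list of the Top 1000 users by value, profile of the Top 1000 users
6. Conclusions and recommendations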

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['font.sans-serif']=['SimHei'] # display Chinese labels correctly
plt.rcParams['axes.unicode_minus']=False # display the minus sign correctly
import seaborn as sns
df = pd.read_csv('D:\\BaiduNetdiskDownload\\practise\\Third Program\\BlackFriday.csv')
df.head()
User_ID Product_ID Gender Age Occupation City_Category Stay_In_Current_City_Years Marital_Status Product_Category_1 Product_Category_2 Product_Category_3 Purchase
0 1000001 P00069042 F 0-17 10 A 2 0 3 NaN NaN 8370
1 1000001 P00248942 F 0-17 10 A 2 0 1 6.0 14.0 15200
2 1000001 P00087842 F 0-17 10 A 2 0 12 NaN NaN 1422
3 1000001 P00085442 F 0-17 10 A 2 0 12 14.0 NaN 1057
4 1000002 P00285442 M 55+ 16 C 4+ 0 8 NaN NaN 7969

The raw data has 12 fields:

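User_ID: user ID

Product_ID: product ID

Gender: gender

Age: age group

Occupation: occupation (coded 0-20)

City_Category: city category (A, B, C)

Stay_In_Current_City_Years: years lived in the current city

Marital_Status: marital status (0 or 1)

Product_Category_1: product category 1, the first-level category

Product_Category_2: product category 2, the second-level category

Product_Category_3: product category 3, the third-level category

Purchase: purchase amount (USD)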

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 537577 entries, 0 to 537576
Data columns (total 12 columns):
User_ID                       537577 non-null int64
Product_ID                    537577 non-null object
Gender                        537577 non-null object
Age                           537577 non-null object
Occupation                    537577 non-null int64
City_Category                 537577 non-null object
Stay_In_Current_City_Years    537577 non-null object
Marital_Status                537577 non-null int64
Product_Category_1            537577 non-null int64
Product_Category_2            370591 non-null float64
Product_Category_3            164278 non-null float64
Purchase                      537577 non-null int64
dtypes: float64(2), int64(5), object(5)
memory usage: 49.2+ MB

1. Overall spending

df.describe()
User_ID Occupation Marital_Status Product_Category_1 Product_Category_2 Product_Category_3 Purchase
count 5.375770e+05 537577.00000 537577.000000 537577.000000 370591.000000 164278.000000 537577.000000
mean 1.002992e+06 8.08271 0.408797 5.295546 9.842144 12.669840 9333.859853
std 1.714393e+03 6.52412 0.491612 3.750701 5.087259 4.124341 4981.022133
min 1.000001e+06 0.00000 0.000000 1.000000 2.000000 3.000000 185.000000
25% 1.001495e+06 2.00000 0.000000 1.000000 5.000000 9.000000 5866.000000
50% 1.003031e+06 7.00000 0.000000 5.000000 9.000000 14.000000 8062.000000
75% 1.004417e+06 14.00000 1.000000 8.000000 15.000000 16.000000 12073.000000
max 1.006040e+06 20.00000 1.000000 18.000000 18.000000 18.000000 23961.000000
df['Purchase'].sum()/df['User_ID'].nunique() # average spend per customer: about 851,752 USD
851751.5494822611

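These records mainly cover large customers. Average spend per customer reaches about 850,000 USD, and the 5,891 users together contributed about 5.02 billion USD in sales. Retaining loyal users and encouraging them to spend more is a basic practice in e-commerce.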
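Average spend per customer is total sales divided by the number of distinct users. That equals the mean of the per-user totals. Below is a minimal sketch, assuming a DataFrame with the same `User_ID` and `Purchase` columns. `per_user_spend` is a hypothetical helper, and the toy rows are invented for illustration:

```python
import pandas as pd

def per_user_spend(df: pd.DataFrame) -> pd.Series:
    """Total Purchase per distinct User_ID, highest first."""
    return df.groupby("User_ID")["Purchase"].sum().sort_values(ascending=False)

# Invented toy rows with the same columns as BlackFriday.csv
toy = pd.DataFrame({
    "User_ID": [1, 1, 2, 3, 3, 3],
    "Purchase": [100, 200, 50, 10, 20, 30],
})

spend = per_user_spend(toy)
# Total sales / number of distinct users, as in the one-liner above
avg_spend = toy["Purchase"].sum() / toy["User_ID"].nunique()
print(spend)
print(avg_spend, spend.mean())  # the two averages are identical
```

Keeping the per-user totals lets you inspect the distribution (for example with `spend.describe()`) instead of relying on one average, which a few very large customers can pull upward.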

2. Analysis from the user perspective

(1) Gender

df_gender_purchase=df.groupby('Gender').agg({'Purchase':'sum'}).reset_index().rename(columns={'Purchase':'Purchase_amount'})
# share of total sales
df_gender_purchase['gender_purchase_pro']=df_gender_purchase['Purchase_amount']/df['Purchase'].sum()
def Gender_user_count(x):
    # number of distinct users of this gender
    return df.loc[df['Gender']==x['Gender'],'User_ID'].nunique()
df_gender_purchase['gender_user_count']=df_gender_purchase.apply(Gender_user_count,axis=1)
# average spend per customer
df_gender_purchase['gender_customer_price']=df_gender_purchase['Purchase_amount']/df_gender_purchase['gender_user_count']
# share of all users
df_gender_purchase['gender_count_prop']=df_gender_purchase['gender_user_count']/df_gender_purchase['gender_user_count'].sum()
df_gender_purchase
Gender Purchase_amount gender_purchase_pro gender_user_count gender_customer_price gender_count_prop
0 F 1164624021 0.232105 1666 699054.034214 0.282804
1 M 3853044357 0.767895 4225 911963.161420 0.717196

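Men make up about 72% of Black Friday users, roughly 2.5 times as many as women. They contribute about 77% of sales, about 3.3 times as much as women. Clearly more men took part, and their average spend per customer is also higher (about 912,000 USD vs 699,000 USD for women). Higher-priced products should therefore be marketed to male users.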
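Every segment table in this section follows the same recipe:

- sum `Purchase`;
- count distinct `User_ID`;
- derive the shares and the average spend per customer.

The sketch below does all of this in one `groupby` with pandas named aggregation, without a row-wise `apply`. The helper name `segment_summary`, its output column names and the toy rows are our own and do not come from the original:

```python
import pandas as pd

def segment_summary(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Sales and distinct-user metrics for each value of column `by`."""
    out = df.groupby(by).agg(
        Purchase_amount=("Purchase", "sum"),
        user_count=("User_ID", "nunique"),
    ).reset_index()
    out["purchase_prop"] = out["Purchase_amount"] / df["Purchase"].sum()
    out["user_count_prop"] = out["user_count"] / df["User_ID"].nunique()
    # average spend per customer within the segment
    out["customer_price"] = out["Purchase_amount"] / out["user_count"]
    return out

# Invented toy rows
toy = pd.DataFrame({
    "User_ID": [1, 1, 2, 3, 4],
    "Gender": ["M", "M", "F", "M", "F"],
    "Purchase": [100, 300, 200, 100, 200],
})
print(segment_summary(toy, "Gender"))
```

The same call should work for `Age`, `Marital_Status`, `Occupation` and `City_Category`. The user shares sum to 1 only when each user belongs to exactly one segment. The per-attribute user counts in the tables in this section all add up to 5,891, so that holds here.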

(2) Age

df_age_purchase = df.groupby('Age').agg({'Purchase':'sum'}).reset_index().rename(columns={'Purchase':'Purchase_amount'})
df_age_purchase['Purchase_amount_pro']=df_age_purchase['Purchase_amount']/df_age_purchase['Purchase_amount'].sum()
def Age_user_count(x):
    # number of distinct users in this age band
    return df.loc[df['Age']==x['Age'],'User_ID'].nunique()
df_age_purchase['age_user_count']=df_age_purchase.apply(Age_user_count,axis=1)
df_age_purchase['age_user_count_pro']=df_age_purchase['age_user_count']/df['User_ID'].nunique()
df_age_purchase['age_customer_price']=df_age_purchase['Purchase_amount']/df_age_purchase['age_user_count']
df_age_purchase
Age Purchase_amount Purchase_amount_pro age_user_count age_user_count_pro age_customer_price
0 0-17 132659006 0.026438 218 0.037006 608527.550459
1 18-25 901669280 0.179699 1069 0.181463 843469.859682
2 26-35 1999749106 0.398542 2053 0.348498 974061.912323
3 36-45 1010649565 0.201418 1167 0.198099 866023.620394
4 46-50 413418223 0.082392 531 0.090137 778565.391714
5 51-55 361908356 0.072127 481 0.081650 752408.224532
6 55+ 197614842 0.039384 372 0.063147 531222.693548

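Users and spending concentrate in the 18-45 age range, which accounts for about 73% of users and nearly 78% of sales. The 26-35 band leads in both number of users (34.8%) and sales (39.9%). It also has the highest average spend per customer, so it is the group to focus promotions on.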
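The "nearly 78%" figure is the sum of the shares of the 18-25, 26-35 and 36-45 bands (0.1797 + 0.3985 + 0.2014 ≈ 0.78). A small sketch can compute such a combined share directly from the raw rows. The helper `share_of` is hypothetical, and the toy rows are invented:

```python
import pandas as pd

def share_of(df: pd.DataFrame, col: str, values) -> float:
    """Fraction of total Purchase from rows whose `col` is in `values`."""
    mask = df[col].isin(values)
    return df.loc[mask, "Purchase"].sum() / df["Purchase"].sum()

# Invented toy rows
toy = pd.DataFrame({
    "Age": ["0-17", "18-25", "26-35", "36-45", "55+"],
    "Purchase": [10, 20, 40, 20, 10],
})
share = share_of(toy, "Age", ["18-25", "26-35", "36-45"])
print(share)
```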

(3) Marital status

df_Marital_purchase=df.groupby('Marital_Status').agg({'Purchase':'sum'}).reset_index().rename(columns={'Purchase':'Purchase_amount'})
df_Marital_purchase['Marital_purchase_prop']=df_Marital_purchase['Purchase_amount']/df['Purchase'].sum()
def Marital_user_count(x):
    # number of distinct users with this marital status (0 or 1)
    return df.loc[df['Marital_Status']==x['Marital_Status'],'User_ID'].nunique()
df_Marital_purchase['Marital_user_count']=df_Marital_purchase.apply(Marital_user_count,axis=1)
df_Marital_purchase['Marital_customer_price']=df_Marital_purchase['Purchase_amount']/df_Marital_purchase['Marital_user_count']
df_Marital_purchase['Marital_count_prop']=df_Marital_purchase['Marital_user_count']/df['User_ID'].nunique()
df_Marital_purchase
Marital_Status Purchase_amount Marital_purchase_prop Marital_user_count Marital_customer_price Marital_count_prop
0 0 2966289500 0.591169 3417 868097.600234 0.580037
1 1 2051378878 0.408831 2474 829174.970897 0.419963

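Unmarried users (Marital_Status = 0) outnumber married users by about 38% (3,417 vs 2,474) and spend about 45% more in total. Their average spend per customer is only slightly higher (about 868,000 vs 829,000 USD).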
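These percentages are relative differences, (unmarried - married) / married. Here is a tiny sketch using only the figures from the table above. `pct_higher` is our own helper name:

```python
def pct_higher(a: float, b: float) -> float:
    """How much larger a is than b, as a fraction of b."""
    return a / b - 1

# Figures from the marital-status table above
unmarried_sales, married_sales = 2966289500, 2051378878
unmarried_users, married_users = 3417, 2474
print(f"sales: {pct_higher(unmarried_sales, married_sales):.1%}")  # sales: 44.6%
print(f"users: {pct_higher(unmarried_users, married_users):.1%}")  # users: 38.1%
```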

(4) Sales by age group for combined gender and marital status

# combine gender and marital status into one label, e.g. "F_0", "M_1"
df["Gender_MaritalStatus"]=df["Gender"]+"_"+df["Marital_Status"].astype(str)
df_Gender_MaritalStatus_purchase=df.groupby(["Gender_MaritalStatus","Age"]).agg({"Purchase":"sum"}).reset_index().rename(columns={"Purchase":"Purchase_amount"})
def Gender_MaritalStatus_user_count(x):
    # number of distinct users in this (Gender_MaritalStatus, Age) cell
    mask=(df["Gender_MaritalStatus"]==x["Gender_MaritalStatus"]) & (df["Age"]==x["Age"])
    return df.loc[mask,"User_ID"].nunique()
df_Gender_MaritalStatus_purchase["Gender_MaritalStatus_user_count"]=df_Gender_MaritalStatus_purchase.apply(Gender_MaritalStatus_user_count,axis=1)
df_Gender_MaritalStatus_purchase["Gender_MaritalStatus_user_price"]=df_Gender_MaritalStatus_purchase["Purchase_amount"]/df_Gender_MaritalStatus_purchase["Gender_MaritalStatus_user_count"]
# share of all distinct users
df_Gender_MaritalStatus_purchase["Gender_MaritalStatus_count_prop"]=df_Gender_MaritalStatus_purchase["Gender_MaritalStatus_user_count"]/df["User_ID"].nunique()
df_Gender_MaritalStatus_purchase.head(5)
Gender_MaritalStatus Age Purchase_amount Gender_MaritalStatus_user_count Gender_MaritalStatus_user_price Gender_MaritalStatus_count_prop
0 F_0 0-17 41826615 78 536238.653846 0.013241
1 F_0 18-25 153305178 217 706475.474654 0.036836
2 F_0 26-35 254464648 320 795202.025000 0.054320
3 F_0 36-45 148392364 202 734615.663366 0.034290
4 F_0 46-50 27113309 49 553332.836735 0.008318
sns.barplot(x="Age",hue="Gender_MaritalStatus",y="Gender_MaritalStatus_user_count",data=df_Gender_MaritalStatus_purchase)

In the 26-35 age band, unmarried men are the largest group of participants. Within the 18-35 range, unmarried men also rank second in sales.
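The bar charts compare segments visually, but the same numbers can also be read off a table. Below is a sketch that builds the combined `Gender_MaritalStatus` label as above. It then pivots distinct users into an age × segment matrix; the toy rows are invented:

```python
import pandas as pd

# Invented toy rows
toy = pd.DataFrame({
    "User_ID": [1, 1, 2, 3, 4, 5],
    "Gender": ["M", "M", "M", "F", "M", "F"],
    "Marital_Status": [0, 0, 0, 1, 1, 0],
    "Age": ["26-35", "26-35", "26-35", "26-35", "18-25", "18-25"],
    "Purchase": [100, 50, 80, 70, 60, 40],
})
toy["Gender_MaritalStatus"] = toy["Gender"] + "_" + toy["Marital_Status"].astype(str)

# Rows: age bands; columns: gender/marital labels; cells: distinct users
users = toy.pivot_table(index="Age", columns="Gender_MaritalStatus",
                        values="User_ID", aggfunc="nunique", fill_value=0)
print(users)
```

Passing `values="Purchase"` with `aggfunc="sum"` gives the matching sales matrix for the second chart.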

sns.barplot(x="Age",hue="Gender_MaritalStatus",y="Purchase_amount",data=df_Gender_MaritalStatus_purchase)


(5) Purchases by occupation

df_Occupation_purchase=df.groupby("Occupation").agg({"Purchase":"sum"}).reset_index().rename(columns={"Purchase":"Purchase_amount"})
df_Occupation_purchase["Occupation_purchase_prop"]=df_Occupation_purchase["Purchase_amount"]/df["Purchase"].sum()
def Occupation_user_count(x):
    # number of distinct users with this occupation code
    return df.loc[df["Occupation"]==x["Occupation"],"User_ID"].nunique()
df_Occupation_purchase["Occupation_user_count"]=df_Occupation_purchase.apply(Occupation_user_count,axis=1)
df_Occupation_purchase["Occupation_customer_price"]=df_Occupation_purchase["Purchase_amount"]/df_Occupation_purchase["Occupation_user_count"]
df_Occupation_purchase["Occupation_count_prop"]=df_Occupation_purchase["Occupation_user_count"]/df["User_ID"].nunique()
df_Occupation_purchase.sort_values(by="Occupation_user_count",ascending=False)
Occupation Purchase_amount Occupation_purchase_prop Occupation_user_count Occupation_customer_price Occupation_count_prop
4 4 657530393 0.131043 740 8.885546e+05 0.125615
0 0 625814811 0.124722 688 9.096146e+05 0.116788
7 7 549282744 0.109470 669 8.210504e+05 0.113563
1 1 414552829 0.082619 517 8.018430e+05 0.087761
17 17 387240355 0.077175 491 7.886769e+05 0.083347
12 12 300672105 0.059923 376 7.996599e+05 0.063826
14 14 255594745 0.050939 294 8.693699e+05 0.049907
20 20 292276985 0.058250 273 1.070612e+06 0.046342
2 2 233275393 0.046491 256 9.112320e+05 0.043456
16 16 234442330 0.046723 235 9.976269e+05 0.039891
6 6 185065697 0.036883 228 8.116917e+05 0.038703
10 10 114273954 0.022774 192 5.951768e+05 0.032592
3 3 160428450 0.031973 170 9.436968e+05 0.028858
13 13 71135744 0.014177 140 5.081125e+05 0.023765
15 15 116540026 0.023226 140 8.324288e+05 0.023765
11 11 105437359 0.021013 128 8.237294e+05 0.021728
5 5 112525355 0.022426 111 1.013742e+06 0.018842
9 9 53619309 0.010686 88 6.093103e+05 0.014938
19 19 73115489 0.014572 71 1.029796e+06 0.012052
18 18 60249706 0.012008 67 8.992493e+05 0.011373
8 8 14594599 0.002909 17 8.585058e+05 0.002886

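Occupations 4, 0, 7 and 1 together account for about 44% of all users and about 45% of sales, so these occupations deserve our attention.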
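The "about 44%" figure is the cumulative user share of the four occupations with the most users (0.1256 + 0.1168 + 0.1136 + 0.0878). The sketch below ranks any attribute by distinct users and accumulates the share. `top_k_user_share` is a hypothetical helper, and the toy rows are invented:

```python
import pandas as pd

def top_k_user_share(df: pd.DataFrame, col: str, k: int) -> pd.DataFrame:
    """Top-k values of `col` by distinct users, with cumulative user share."""
    users = df.groupby(col)["User_ID"].nunique().nlargest(k)
    out = users.to_frame("user_count")
    out["cum_share"] = out["user_count"].cumsum() / df["User_ID"].nunique()
    return out

# Invented toy rows: occupation 4 has 3 users, 0 has 2, 7 has 1
toy = pd.DataFrame({
    "User_ID": [1, 2, 3, 4, 5, 6, 6],
    "Occupation": [4, 4, 4, 0, 0, 7, 7],
})
res = top_k_user_share(toy, "Occupation", 2)
print(res)
```

`nlargest` breaks ties by order of appearance, so with real data check whether any occupations share the same user count at the cut-off.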

3. City contribution

df_City_Category_purchase=df.groupby("City_Category").agg({"Purchase":"sum"}).reset_index().rename(columns={"Purchase":"Purchase_amount"})
df_City_Category_purchase["City_Category_purchase_prop"]=df_City_Category_purchase["Purchase_amount"]/df["Purchase"].sum()
def City_Category_user_count(x):
    # number of distinct users in this city category (A, B or C)
    return df.loc[df["City_Category"]==x["City_Category"],"User_ID"].nunique()
df_City_Category_purchase["City_Category_user_count"]=df_City_Category_purchase.apply(City_Category_user_count,axis=1)
df_City_Category_purchase["City_Category_customer_price"]=df_City_Category_purchase["Purchase_amount"]/df_City_Category_purchase["City_Category_user_count"]
df_City_Category_purchase["City_Category_count_prop"]=df_City_Category_purchase["City_Category_user_count"]/df["User_ID"].nunique()
df_City_Category_purchase
City_Category Purchase_amount City_Category_purchase_prop City_Category_user_count City_Category_customer_price City_Category_count_prop
0 A 1295668797 0.258221 1045 1.239874e+06 0.177389
1 B 2083431612 0.415219 1707 1.220522e+06 0.289764
2 C 1638567969 0.326560 3139 5.220032e+05 0.532847

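City C accounts for 53.3% of participating users but only 32.7% of sales. By contrast, city B accounts for 29.0% of users yet contributes 41.5% of sales. Average spend per customer in cities A and B is about 2.4 and 2.3 times that of city C. We can roughly infer that spending power is higher in cities A and B. In the next event, prices there could be raised moderately, while prices in city C could be lowered to grow revenue through higher volume.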
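To make the comparison concrete, divide each city's sales share by its user share. Values above 1 mean the city contributes more sales than its share of users. Then compare average spend per customer against city C. The sketch below uses only the figures from the table above; `sales_index` and `price_vs_C` are our own column names:

```python
import pandas as pd

# Figures from the city table above
city = pd.DataFrame({
    "City_Category": ["A", "B", "C"],
    "Purchase_amount": [1295668797, 2083431612, 1638567969],
    "user_count": [1045, 1707, 3139],
}).set_index("City_Category")

city["sales_share"] = city["Purchase_amount"] / city["Purchase_amount"].sum()
city["user_share"] = city["user_count"] / city["user_count"].sum()
# > 1: the city contributes more sales than its share of users
city["sales_index"] = city["sales_share"] / city["user_share"]
city["customer_price"] = city["Purchase_amount"] / city["user_count"]
city["price_vs_C"] = city["customer_price"] / city.loc["C", "customer_price"]
print(city.round(2))
```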

4. Product perspective

(1) Top 10 products by sales revenue

df_count10=df.groupby("Product_ID").agg({"User_ID":"count","Purchase":"sum"}).rename(columns={"Purchase":"Purchase_amount","User_ID":"User_count"}).reset_index().sort_values(by=["Purchase_amount"],ascending=False)[["Product_ID","Purchase_amount"]].head(10)
df_count10
Product_ID Purchase_amount
249 P00025442 27532426
1014 P00110742 26382569
2441 P00255842 24652442
1743 P00184942 24060871
581 P00059442 23948299
1028 P00112142 23882624
1016 P00110942 23232538
2261 P00237542 23096487
565 P00057642 22493690
104 P00010742 21865042

(2) Top 10 products by sales volume (number of purchase records)

df_amount10=df.groupby("Product_ID").agg({"User_ID":"count","Purchase":"sum"}).rename(columns={"Purchase":"Purchase_amount","User_ID":"User_count"}).reset_index().sort_values(by=["User_count"],ascending=False)[["Product_ID","User_count"]].head(10)
df_amount10
Product_ID User_count
2534 P00265242 1858
1014 P00110742 1591
249 P00025442 1586
1028 P00112142 1539
565 P00057642 1430
1743 P00184942 1424
458 P00046742 1417
568 P00058042 1396
1353 P00145042 1384
581 P00059442 1384

(3) Products in both the top 10 by sales volume and the top 10 by sales revenue

pd.merge(df_amount10,df_count10,on="Product_ID",how="inner")
Product_ID User_count Purchase_amount
0 P00110742 1591 26382569
1 P00025442 1586 27532426
2 P00112142 1539 23882624
3 P00057642 1430 22493690
4 P00184942 1424 24060871
5 P00059442 1384 23948299
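Note that `"User_ID":"count"` counts purchase records, not distinct users. Both top-10 lists and their overlap can be computed in one pass. Below is a sketch with a hypothetical helper `top_products_by_both` and invented toy rows:

```python
import pandas as pd

def top_products_by_both(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Products in both the top n by purchase count and the top n by revenue."""
    stats = df.groupby("Product_ID").agg(
        User_count=("User_ID", "count"),       # number of purchase records
        Purchase_amount=("Purchase", "sum"),
    )
    by_count = stats["User_count"].nlargest(n).index
    by_amount = stats["Purchase_amount"].nlargest(n).index
    both = by_count.intersection(by_amount)
    return stats.loc[both].sort_values("Purchase_amount", ascending=False)

# Invented toy rows: P1 has 3 records/30, P2 1 record/100, P3 2 records/20
toy = pd.DataFrame({
    "User_ID": [1, 2, 3, 1, 2, 4],
    "Product_ID": ["P1", "P1", "P1", "P2", "P3", "P3"],
    "Purchase": [10, 10, 10, 100, 5, 15],
})
res = top_products_by_both(toy, n=2)
print(res)
```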
(4) Sales revenue by first-level product category

df_amount=df.groupby("Product_Category_1").agg({"User_ID":"count","Purchase":"sum"}).rename(columns={"Purchase":"Purchase_amount","User_ID":"User_count"}).reset_index().sort_values(by=["Purchase_amount"],ascending=False)[["Product_Category_1","Purchase_amount"]]
df_amount["Category_Prop"]=df_amount["Purchase_amount"]/df["Purchase"].sum()
df_amount
Product_Category_1 Purchase_amount Category_Prop
0 1 1882666325 0.375207
4 5 926917497 0.184731
7 8 840693394 0.167547
5 6 319355286 0.063646
1 2 264497242 0.052713
2 3 200412211 0.039941
15 16 143168035 0.028533
10 11 112203088 0.022362
9 10 99029631 0.019736
14 15 91658147 0.018267
6 7 60059209 0.011970
3 4 26937957 0.005369
13 14 19718178 0.003930
17 18 9149071 0.001823
8 9 6277472 0.001251
16 17 5758702 0.001148
11 12 5235883 0.001043
12 13 3931050 0.000783
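Categories 1, 5 and 8 together account for 0.3752 + 0.1847 + 0.1675 ≈ 73% of revenue. The sketch below adds a running total so this kind of concentration can be read straight from the table. `category_cum_share` is a hypothetical helper, and the toy rows are invented:

```python
import pandas as pd

def category_cum_share(df: pd.DataFrame, col: str = "Product_Category_1") -> pd.DataFrame:
    """Revenue share per category, sorted descending, with a running total."""
    amount = df.groupby(col)["Purchase"].sum().sort_values(ascending=False)
    out = amount.to_frame("Purchase_amount")
    out["share"] = out["Purchase_amount"] / out["Purchase_amount"].sum()
    out["cum_share"] = out["share"].cumsum()
    return out

# Invented toy rows: category totals 1 -> 60, 5 -> 20, 8 -> 12, 3 -> 8
toy = pd.DataFrame({
    "Product_Category_1": [1, 1, 5, 8, 3],
    "Purchase": [40, 20, 20, 12, 8],
})
res = category_cum_share(toy)
print(res)
```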

5. Summary

(1) User perspective

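  • Conclusions: unmarried men aged 26-35 with occupation codes 4, 0, 7 and 1 are the high-spending group and the platform's most loyal users;
  • Next steps: 1) focus on high-value users, market to them in a more targeted way, and offer them more high-value products;
    2) for other users, mainly encourage them to click and buy by recommending more best-selling products;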
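Both the outline's "Top 1000 users" list and the recommendation to focus on high-value users start from a ranked user list. Below is a sketch, assuming each user's profile columns are constant across rows. That is consistent with the per-attribute user counts above all summing to 5,891. `top_value_users` is a hypothetical helper, and the toy rows are invented:

```python
import pandas as pd

PROFILE_COLS = ["Gender", "Age", "Occupation", "City_Category", "Marital_Status"]

def top_value_users(df: pd.DataFrame, n: int = 1000) -> pd.DataFrame:
    """The n users with the highest total Purchase, with their profile columns."""
    totals = df.groupby("User_ID")["Purchase"].sum().nlargest(n).rename("total_purchase")
    # assumes one profile per user, so the first row per user is enough
    profiles = df.drop_duplicates("User_ID").set_index("User_ID")[PROFILE_COLS]
    return profiles.join(totals, how="inner").sort_values("total_purchase", ascending=False)

# Invented toy rows
toy = pd.DataFrame({
    "User_ID": [1, 1, 2, 3],
    "Gender": ["M", "M", "F", "M"],
    "Age": ["26-35", "26-35", "18-25", "36-45"],
    "Occupation": [4, 4, 0, 7],
    "City_Category": ["B", "B", "A", "C"],
    "Marital_Status": [0, 0, 1, 0],
    "Purchase": [500, 400, 300, 800],
})
res = top_value_users(toy, n=2)
print(res)
```

On the full data, `top_value_users(df)` followed by `value_counts()` on any profile column gives a quick profile of the top 1,000 users.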

(2) Product perspective

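  • Conclusions: 1) during Black Friday, first-level categories 5, 1 and 8 rank in the top 3 by both sales volume and sales revenue, and they also appear among the top 10 most popular products; together these 3 categories contributed about 73% of sales revenue;
    2) the three first-level categories with the lowest sales revenue are 13, 12 and 17, each accounting for only about 0.1% of the total;
    3) six products appear in both the top 10 by revenue and the top 10 by volume (P00110742, P00025442, P00112142, P00057642, P00184942, P00059442); these bestsellers' display positions can be used to drive traffic to other products.
  • Next steps: 1) bundle the top 10 most popular products with related products to lift sales of the others, and recommend other products on the pages of categories 5, 1 and 8 to encourage users to click and buy;
    2) investigate why categories 13, 12 and 17 sell the least; if they are obsolete or have lost the market to substitutes, consider delisting them and cutting advertising in the related channels;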
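One simple way to pick bundling candidates for a bestseller is to count which other products its buyers also purchased. This co-occurrence count is our own illustration, not a method from the original analysis. `co_purchased_with` is a hypothetical helper, and the toy rows are invented:

```python
import pandas as pd

def co_purchased_with(df: pd.DataFrame, product_id: str, k: int = 5) -> pd.Series:
    """Products most often bought by users who also bought `product_id`."""
    buyers = df.loc[df["Product_ID"] == product_id, "User_ID"].unique()
    others = df[df["User_ID"].isin(buyers) & (df["Product_ID"] != product_id)]
    # distinct co-buyers per other product, highest first
    return others.groupby("Product_ID")["User_ID"].nunique().nlargest(k)

# Invented toy rows: users 1-3 bought P1; user 4 did not
toy = pd.DataFrame({
    "User_ID": [1, 1, 1, 2, 2, 3, 4, 4],
    "Product_ID": ["P1", "P2", "P3", "P1", "P2", "P1", "P3", "P2"],
})
res = co_purchased_with(toy, "P1", k=2)
print(res)
```

Raw co-purchase counts favour products that are popular anyway. Dividing by each product's overall buyer count is a common refinement.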

(3) City perspective

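  • Conclusions: the best-selling first-level categories by sales revenue are 1, 5 and 8. Warehouse management should plan inventory from the bestseller list and categories. It should stock up in advance for city B, which has the highest sales, to reduce dispatching, and monitor inventory to prevent stockouts.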
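Per-city stock planning needs the best-selling categories within each city, not overall. The sketch below counts purchase records per city and first-level category and keeps the top k per city. `top_categories_by_city` is a hypothetical helper, and the toy rows are invented:

```python
import pandas as pd

def top_categories_by_city(df: pd.DataFrame, k: int = 3) -> pd.Series:
    """For each City_Category, the k first-level categories with the most purchase records."""
    counts = df.groupby(["City_Category", "Product_Category_1"]).size().rename("records")
    # sort all (city, category) pairs by count, then keep the first k per city
    return counts.sort_values(ascending=False).groupby(level="City_Category").head(k)

# Invented toy rows
toy = pd.DataFrame({
    "City_Category": ["A", "A", "A", "B", "B", "B", "B"],
    "Product_Category_1": [1, 1, 5, 5, 5, 8, 1],
})
res = top_categories_by_city(toy, k=1)
print(res)
```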
