Author: Barış Karaman

Translated by: ronghuaiyang

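Overview

In a marketing campaign we do not want to send an offer to everyone. We want the largest incremental gain for a given cost, and an uplift model helps us achieve that.

Previous articles in this series:

Use Machine Learning to Boost Your User Growth: Step 1, Understand Your Goals

Use Machine Learning to Boost Your User Growth: Step 2, Customer Segmentation

Use Machine Learning to Boost Your User Growth: Step 3, Predicting Customer Lifetime Value

Use Machine Learning to Boost Your User Growth: Step 4, Churn Prediction

Use Machine Learning to Boost Your User Growth: Step 5, Predicting the Next Purchase Day

Use Machine Learning to Boost Your User Growth: Step 6, Predicting Sales

Use Machine Learning to Boost Your User Growth: Step 7, Building Market Response Models

Part 8: Uplift Modeling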

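One of the most important jobs of a growth hacker is to be as efficient as possible. First, you need to be time-efficient: ideate, experiment, learn, and iterate quickly. Second, you need to be cost-efficient: get the maximum return for a given budget, time, and effort.

Customer segmentation helps growth hackers raise conversion and is therefore cost-efficient. Imagine you are about to launch a promotional campaign and you already know which segment you want to target. Do you need to send the offer to everyone in it?

The answer is no. In your current target group there will always be customers who would buy your product anyway. We can divide customers into four segments according to how they respond: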

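Treatment Responders (TR): customers who purchase only if they receive an offer
Treatment Non-Responders (TN): customers who will not purchase in any case
Control Responders (CR): customers who purchase without an offer
Control Non-Responders (CN): customers who do not purchase when they receive no offer

Now the picture is clear. You should target Treatment Responders (TR) and Control Non-Responders (CN): they will not purchase unless they get an offer, so these groups produce the incremental gain in a promotion. Conversely, you should avoid targeting Treatment Non-Responders (TN) and Control Responders (CR). TN will not buy regardless, and CR would have bought without the offer, so sending offers to TN and CR brings no benefit.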

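One simple thing remains: we need to identify which customers fall into which segment. The answer is uplift modeling, which has two steps:

Predict the probability that each customer belongs to each of the four groups; we will build a multi-class classification model for this.
Compute the uplift score from those probabilities:

Uplift Score = P(TR) + P(CN) - P(TN) - P(CR)

We add the probabilities of being TR and CN and subtract the probabilities of falling into the other two segments. A higher score means higher uplift.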

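Let's see how to implement this with an example. We will use the same dataset as in the previous article: https://gist.github.com/karamanbk/ef1a118592e2f7954e5bb582e09bdde3.

We start by importing the libraries and functions we need: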

from datetime import datetime, timedelta, date
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.offline as pyoff
import plotly.graph_objs as go
import sklearn
import xgboost as xgb
from sklearn.cluster import KMeans
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Jupyter magic; remove this line when running as a plain script
%matplotlib inline

warnings.filterwarnings("ignore")

# initiate plotly
pyoff.init_notebook_mode()

# function to order clusters: relabels cluster ids by the mean of target_field_name
def order_cluster(cluster_field_name, target_field_name, df, ascending):
    new_cluster_field_name = 'new_' + cluster_field_name
    df_new = df.groupby(cluster_field_name)[target_field_name].mean().reset_index()
    df_new = df_new.sort_values(by=target_field_name, ascending=ascending).reset_index(drop=True)
    df_new['index'] = df_new.index
    df_final = pd.merge(df, df_new[[cluster_field_name, 'index']], on=cluster_field_name)
    df_final = df_final.drop([cluster_field_name], axis=1)
    df_final = df_final.rename(columns={"index": cluster_field_name})
    return df_final

# function for calculating the uplift
def calc_uplift(df):
    # assumed average order value in dollars
    avg_order_value = 25

    # calculate conversions for each offer type
    base_conv = df[df.offer == 'No Offer']['conversion'].mean()
    disc_conv = df[df.offer == 'Discount']['conversion'].mean()
    bogo_conv = df[df.offer == 'Buy One Get One']['conversion'].mean()

    # calculate conversion uplift for discount and bogo
    disc_conv_uplift = disc_conv - base_conv
    bogo_conv_uplift = bogo_conv - base_conv

    # calculate order uplift
    disc_order_uplift = disc_conv_uplift * len(df[df.offer == 'Discount']['conversion'])
    bogo_order_uplift = bogo_conv_uplift * len(df[df.offer == 'Buy One Get One']['conversion'])

    # calculate revenue uplift
    disc_rev_uplift = disc_order_uplift * avg_order_value
    bogo_rev_uplift = bogo_order_uplift * avg_order_value

    print('Discount Conversion Uplift: {0}%'.format(np.round(disc_conv_uplift*100, 2)))
    print('Discount Order Uplift: {0}'.format(np.round(disc_order_uplift, 2)))
    print('Discount Revenue Uplift: ${0}\n'.format(np.round(disc_rev_uplift, 2)))

    # only report BOGO figures when the dataframe contains BOGO customers
    if len(df[df.offer == 'Buy One Get One']['conversion']) > 0:
        print('-------------- \n')
        print('BOGO Conversion Uplift: {0}%'.format(np.round(bogo_conv_uplift*100, 2)))
        print('BOGO Order Uplift: {0}'.format(np.round(bogo_order_uplift, 2)))
        print('BOGO Revenue Uplift: ${0}'.format(np.round(bogo_rev_uplift, 2)))

Then we import the data:

df_data = pd.read_csv('response_data.csv')
df_data.head(10)

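As you may recall from the previous article, we have data on customers who received Discount and Buy One Get One offers, along with how they responded. We also have a control group that received no offer.

The columns are:

recency: months since the last purchase
history: dollar value of historical purchases
used_discount/used_bogo: whether the customer has used a discount or a buy one get one offer
zip_code: class of the zip code, Rural/Suburban/Urban
is_referral: whether the customer was acquired through a referral
channel: channels the customer uses, Phone/Web/Multichannel
offer: the offer sent to the customer, Discount/Buy One Get One/No Offer

Before building the model, we apply the calc_uplift function to get the current uplift of this campaign as a benchmark: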

calc_uplift(df_data)

The conversion uplift is 7.66% for the discount offer and 4.52% for buy one get one (BOGO).
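The conversion uplift that calc_uplift reports is simply each offer's conversion rate minus the control group's rate. As a minimal sketch of that idea, the helper below (the name `conversion_uplift` and the toy data are illustrative assumptions, not part of the article) computes it with a single groupby:

```python
import pandas as pd

def conversion_uplift(df, control="No Offer"):
    """Return each offer's conversion rate minus the control group's rate."""
    rates = df.groupby("offer")["conversion"].mean()
    return (rates - rates[control]).drop(control)

# Toy data, made up for illustration (not the article's dataset)
toy = pd.DataFrame({
    "offer": ["No Offer"] * 4 + ["Discount"] * 4 + ["Buy One Get One"] * 4,
    "conversion": [0, 0, 0, 1,   1, 1, 0, 0,   1, 1, 1, 0],
})

uplift = conversion_uplift(toy)
print(uplift)
```

Here the control group converts at 25%, so an offer that converts at 50% has an uplift of 25 percentage points.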

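Next, we start building the model.

Multi-class Model for Predicting the Uplift Score

Currently our label is whether a customer converted (1 or 0). We need four classes instead: TR, TN, CR, and CN. Customers who received a discount or BOGO offer form the treatment group and the rest form the control group. We create a campaign_group column to make this explicit: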

df_data['campaign_group'] = 'treatment'
df_data.loc[df_data.offer == 'No Offer', 'campaign_group'] = 'control'

Perfect. Now we create the new label:

df_data['target_class'] = 0 #CN
df_data.loc[(df_data.campaign_group == 'control') & (df_data.conversion > 0),'target_class'] = 1 #CR
df_data.loc[(df_data.campaign_group == 'treatment') & (df_data.conversion == 0),'target_class'] = 2 #TN
df_data.loc[(df_data.campaign_group == 'treatment') & (df_data.conversion > 0),'target_class'] = 3 #TR

In this example, the classes map as follows:

0 -> Control Non-Responders
1 -> Control Responders
2 -> Treatment Non-Responders
3 -> Treatment Responders
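The mapping above has a regular structure: the class id is twice the treatment flag plus the conversion flag. The sketch below spells that out; `target_class` and `LABELS` are hypothetical names introduced here for illustration.

```python
def target_class(is_treatment, converted):
    """Encode the four segments as 0=CN, 1=CR, 2=TN, 3=TR."""
    return 2 * int(bool(is_treatment)) + int(bool(converted))

LABELS = {0: "CN", 1: "CR", 2: "TN", 3: "TR"}

# Enumerate every combination of treatment and conversion
for treated in (False, True):
    for converted in (False, True):
        code = target_class(treated, converted)
        print(treated, converted, code, LABELS[code])
```

On the article's dataframe, the same encoding should be expressible as `2 * (df_data.campaign_group == 'treatment') + (df_data.conversion > 0)`, which yields the same four labels as the four `.loc` assignments.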

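Before training the model, there is a small feature engineering step. We create clusters from the history column and apply get_dummies to convert the categorical columns to numeric ones: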

# creating the clusters
kmeans = KMeans(n_clusters=5)
kmeans.fit(df_data[['history']])
df_data['history_cluster'] = kmeans.predict(df_data[['history']])

# order the clusters
df_data = order_cluster('history_cluster', 'history', df_data, True)

# creating a new dataframe as model and dropping columns that define the label
df_model = df_data.drop(['offer', 'campaign_group', 'conversion'], axis=1)

# convert categorical columns
df_model = pd.get_dummies(df_model)
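KMeans assigns cluster ids arbitrarily, which is why the clusters are reordered so that a higher id means a higher average history. As a rough illustration of what that reordering does, here is an alternative sketch based on a mapping rather than a merge; `relabel_by_mean` and the toy data are assumptions made for this example, not the article's order_cluster function.

```python
import pandas as pd

def relabel_by_mean(df, cluster_col, value_col):
    """Renumber cluster ids so that id 0 has the lowest mean of value_col."""
    means = df.groupby(cluster_col)[value_col].mean().sort_values()
    mapping = {old: new for new, old in enumerate(means.index)}
    out = df.copy()
    out[cluster_col] = out[cluster_col].map(mapping)
    return out

# Toy data with arbitrary cluster ids, as a clustering step might produce
toy = pd.DataFrame({
    "history": [500.0, 520.0, 30.0, 40.0, 200.0, 210.0],
    "history_cluster": [0, 0, 1, 1, 2, 2],
})

ordered = relabel_by_mean(toy, "history_cluster", "history")
print(ordered)
```

After relabeling, the low-spend rows carry cluster 0 and the high-spend rows carry cluster 2, so the cluster id can be read as an ordinal feature.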

Let's fit the model and get the probability of each class:

# create feature set and labels
X = df_model.drop(['target_class'], axis=1)
y = df_model.target_class

# splitting train and test groups
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=56)

# fitting the model and predicting the probabilities
xgb_model = xgb.XGBClassifier().fit(X_train, y_train)
class_probs = xgb_model.predict_proba(X_test)

The variable class_probs holds the class probabilities for each customer. Let's look at one example.

For this particular customer, the probabilities map as follows:

CN: 32%
CR: 2%
TN: 58.9%
TR: 6.9%

So this customer's uplift score is:

0.32 + 0.069 - 0.02 - 0.589 = -0.22

Applying the same calculation to every customer, using the predicted class probabilities for the full dataset, gives each one an uplift score, which is stored as an uplift_score column in the main dataframe.
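The scoring step can be sketched as follows. This is a minimal illustration under one assumption: the probability columns are ordered CN, CR, TN, TR, matching the 0 to 3 class labels defined earlier. The function names are hypothetical.

```python
def uplift_score(p_cn, p_cr, p_tn, p_tr):
    """Uplift = P(CN) + P(TR) - P(TN) - P(CR)."""
    return p_cn + p_tr - p_tn - p_cr

def score_all(class_probs):
    """Score rows of [P(CN), P(CR), P(TN), P(TR)], one row per customer."""
    return [uplift_score(*row) for row in class_probs]

probs = [
    [0.32, 0.02, 0.589, 0.069],  # the example customer from the text
    [0.40, 0.10, 0.20, 0.30],    # a made-up customer for comparison
]

scores = score_all(probs)
print([round(s, 3) for s in scores])
```

In the article's pipeline, the rows would presumably come from `xgb_model.predict_proba` applied to the full feature set, with the result assigned to `df_data['uplift_score']`. That wiring is an assumption here, since the corresponding code is not shown in the text.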

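Now comes the most critical part: does this model really work? Evaluating the true performance of an uplift model is difficult. We will check how uplift changes across uplift-score quantiles to see whether we can use the model in practice.

Model Evaluation

To evaluate the model, we create two groups and compare them with our benchmark. The two groups are:

High uplift score: customers with an uplift score above the third quartile (75th percentile)
Low uplift score: customers with an uplift score below the median (50th percentile)

We will compare:

Conversion uplift
Revenue uplift per targeted customer, to see whether the model makes the campaign more efficient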
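The two groups defined above can be sketched with pandas quantiles. The helper name `split_by_uplift` and the toy scores are assumptions for illustration; only the 0.75 and 0.5 cutoffs come from the article.

```python
import pandas as pd

def split_by_uplift(df, score_col="uplift_score"):
    """Return (high, low): rows above the 75th percentile and below the median."""
    q75 = df[score_col].quantile(0.75)
    q50 = df[score_col].quantile(0.50)
    high = df[df[score_col] > q75]
    low = df[df[score_col] < q50]
    return high, low

# Toy uplift scores, made up for illustration
toy = pd.DataFrame({"uplift_score": [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4]})

high, low = split_by_uplift(toy)
print(len(high), len(low))
```

Note that the two groups are not complements of each other: customers between the median and the 75th percentile belong to neither group.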

Here is the benchmark for the discount campaign:

Total Targeted Customer Count: 21307
Discount Conversion Uplift: 7.66%
Discount Order Uplift: 1631.89
Discount Revenue Uplift: $40797.35
Revenue Uplift Per Targeted Customer: $1.91
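The benchmark figures are linked by simple arithmetic: conversion uplift times the number of targeted customers gives the order uplift, multiplying by the average order value gives the revenue uplift, and dividing by the customer count gives the per-customer figure. The sketch below follows that chain; the function name is hypothetical, and the $25 order value is the same assumption used in calc_uplift.

```python
AVG_ORDER_VALUE = 25  # fixed order value assumed in calc_uplift

def per_customer_revenue_uplift(conv_uplift, n_targeted,
                                avg_order_value=AVG_ORDER_VALUE):
    """Chain conversion uplift -> extra orders -> extra revenue -> per customer."""
    order_uplift = conv_uplift * n_targeted            # extra orders
    revenue_uplift = order_uplift * avg_order_value    # extra revenue in dollars
    return revenue_uplift / n_targeted

benchmark = per_customer_revenue_uplift(0.0766, 21307)
print(f"{benchmark:.3f}")
```

Because the customer count cancels out, the per-customer revenue uplift is just the conversion uplift times the average order value: 0.0766 × 25 ≈ 1.915, in line with the reported $1.91. With a fixed order value, comparing groups on revenue per customer is therefore equivalent to comparing their conversion uplift.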

We build the first group and look at the numbers:

df_data_lift = df_data.copy()
uplift_q_75 = df_data_lift.uplift_score.quantile(0.75)
df_data_lift = df_data_lift[(df_data_lift.offer != 'Buy One Get One') & (df_data_lift.uplift_score > uplift_q_75)].reset_index(drop=True)
#calculate the uplift
calc_uplift(df_data_lift)

Results:

User Count: 5282
Discount Conversion Uplift: 12.18%
Discount Order Uplift: 643.57
Discount Revenue Uplift: $16089.36
Revenue Uplift Per Targeted Customer: $3.04

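The results are good. Revenue uplift per targeted customer rises from $1.91 to $3.04, an increase of about 59%, and we can easily see that **25% of the target group contributes roughly 40%** of the revenue uplift.

Now let's look at the same figures for the group with low uplift scores: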

df_data_lift = df_data.copy()
uplift_q_5 = df_data_lift.uplift_score.quantile(0.5)
df_data_lift = df_data_lift[(df_data_lift.offer != 'Buy One Get One') & (df_data_lift.uplift_score < uplift_q_5)].reset_index(drop=True)
#calculate the uplift
calc_uplift(df_data_lift)

Results:

User Count: 10745
Discount Conversion Uplift: 5.63%
Discount Order Uplift: 604.62
Discount Revenue Uplift: $15115.52
Revenue Uplift Per Targeted Customer: $1.4

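As expected, revenue uplift per targeted customer drops to **$1.4**. In addition, this group uses **50% of the targeted customers to contribute 37%** of the revenue uplift.

With this model, we can easily make our campaigns more efficient by:

Targeting specific segments based on the uplift score
Trying different offers based on the uplift score

In the next article, I will cover one of the core elements of growth hacking: A/B testing. It will be the final article in this series.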

Original English article: https://towardsdatascience.com/uplift-modeling-e38f96b1ef60
