Customer Churn in Marketing

  • Load the packages
  • Load the data
  • Data Analysis & Preparation
  • Train & Test Sets

Customer churn is when a customer decides to stop using a company's services, content, or products. In customer analytics, retaining existing customers is far cheaper than acquiring new ones, and returning customers typically generate more revenue than new customers. In highly competitive industries a business faces many rivals, so the cost of acquiring new customers is even higher, which makes retaining existing customers all the more important. Customers leave for many reasons; common ones include poor customer service, not finding enough value in the product or service, lack of communication, and lack of loyalty. The first step toward retaining these customers is to monitor the churn rate over time. If the churn rate is consistently high or increasing, it is worth investing resources in improving retention.
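To make "monitoring the churn rate over time" concrete, here is a minimal pandas sketch; the DataFrame events and its month/churned columns are hypothetical placeholders and are not part of the Telco dataset used later in this article.

import pandas as pd

# hypothetical monthly snapshot: one row per customer per month, churned = 1 if the customer left that month
events = pd.DataFrame({
    'month':   ['2020-01', '2020-01', '2020-02', '2020-02', '2020-03', '2020-03'],
    'churned': [0, 1, 0, 0, 1, 1],
})

# churn rate per month = share of customers who left in that month
monthly_churn_rate = events.groupby('month')['churned'].mean()
print(monthly_churn_rate)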

To improve retention, the priority is to understand customers better. We can survey customers who have already churned to learn why they left, and survey existing customers to learn their needs and pain points. For example, we can look at customers' web-activity data to see where they spend the most time, whether the pages they view contain errors, or whether their searches return useful results. We can also review customer-service call logs to see how long customers waited, what they complained about, and how their issues were handled. A deep analysis of these data points can reveal the problems a business faces in retaining its existing customers.

In this article we build a machine learning model that predicts which customers are likely to churn, so we can target and retain these high-risk customers. We will use a neural network. An artificial neural network (ANN) is a machine learning model inspired by how the human brain works. The recent success of ANNs in image recognition, speech recognition, and robotics demonstrates their predictive power and usefulness across many industries. You may have heard the term "deep learning": it refers to ANN models with a large number of layers between the input and output layers.


The figure shows the simple case of an ANN with a single hidden layer. The circles represent artificial neurons, or nodes, which mimic the neurons in the human brain, and the arrows show how signals are passed from one neuron to the next. As the figure suggests, an ANN learns by finding the weights applied to the signals flowing from each input neuron to the neurons of the next layer that best predict the output.
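To make the idea of weighted signals concrete, here is a minimal NumPy sketch of the forward pass through a one-hidden-layer network. It is not part of the original notebook; the layer sizes are toy placeholders and the weights are random rather than learned.

import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 4, 3                    # toy sizes, purely illustrative
W1 = rng.normal(size=(n_features, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))            # hidden -> output weights
b2 = np.zeros(1)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(1, n_features))   # one customer's feature vector
hidden = relu(x @ W1 + b1)             # weighted signals entering the hidden layer
p_churn = sigmoid(hidden @ W2 + b2)    # output neuron: a probability-like score
print(p_churn)

Training replaces the random W1 and W2 with weights that minimize prediction error; that is exactly what Keras will do for us below.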

Below we again use the Kaggle dataset WA_Fn-UseC_-Telco-Customer-Churn.csv, and we build the neural network with Keras.

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
/kaggle/input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv

Load the packages

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import roc_curve, auc
%matplotlib inline

Load the data

df = pd.read_csv('../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.head(3)
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity ... DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 7590-VHVEG Female 0 Yes No 1 No No phone service DSL No ... No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 5575-GNVDE Male 0 No No 34 Yes No DSL Yes ... Yes No No No One year No Mailed check 56.95 1889.5 No
2 3668-QPYBK Male 0 No No 2 Yes No DSL Yes ... No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes

3 rows × 21 columns

df.shape
(7043, 21)

Data Analysis & Preparation

Encoding target var: Churn

df['Churn'] = df['Churn'].apply(lambda x: 1 if x == 'Yes' else 0)
df.Churn.mean()
0.2653698707936959

Clean TotalCharges

df['TotalCharges'] = df['TotalCharges'].replace(' ', np.nan).astype(float)
df = df.dropna()

Create Continuous Vars

df[['tenure', 'MonthlyCharges', 'TotalCharges']].describe()
tenure MonthlyCharges TotalCharges
count 7032.000000 7032.000000 7032.000000
mean 32.421786 64.798208 2283.300441
std 24.545260 30.085974 2266.771362
min 1.000000 18.250000 18.800000
25% 9.000000 35.587500 401.450000
50% 29.000000 70.350000 1397.475000
75% 55.000000 89.862500 3794.737500
max 72.000000 118.750000 8684.800000

Normalize the variables

df['MonthlyCharges'] = np.log(df['MonthlyCharges'])
df['MonthlyCharges'] = (df['MonthlyCharges'] - df['MonthlyCharges'].mean())/df['MonthlyCharges'].std()
df['TotalCharges'] = np.log(df['TotalCharges'])
df['TotalCharges'] = (df['TotalCharges'] - df['TotalCharges'].mean())/df['TotalCharges'].std()
df['tenure'] = (df['tenure'] - df['tenure'].mean())/df['tenure'].std()
df[['tenure', 'MonthlyCharges', 'TotalCharges']].describe()
tenure MonthlyCharges TotalCharges
count 7.032000e+03 7.032000e+03 7.032000e+03
mean -1.028756e-16 4.688495e-14 7.150708e-15
std 1.000000e+00 1.000000e+00 1.000000e+00
min -1.280157e+00 -1.882268e+00 -2.579056e+00
25% -9.542285e-01 -7.583727e-01 -6.080585e-01
50% -1.394072e-01 3.885103e-01 1.950521e-01
75% 9.198605e-01 8.004829e-01 8.382338e-01
max 1.612459e+00 1.269576e+00 1.371323e+00
continuous_vars = list(df.describe().columns)
continuous_vars
['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges', 'Churn']

One-Hot Encoding

for col in list(df.columns):
    print(col, df[col].nunique())
customerID 7032
gender 2
SeniorCitizen 2
Partner 2
Dependents 2
tenure 72
PhoneService 2
MultipleLines 3
InternetService 3
OnlineSecurity 3
OnlineBackup 3
DeviceProtection 3
TechSupport 3
StreamingTV 3
StreamingMovies 3
Contract 3
PaperlessBilling 2
PaymentMethod 4
MonthlyCharges 1584
TotalCharges 6530
Churn 2
df.groupby('gender').count()['customerID'].plot(
    kind='bar', color='skyblue', grid=True, figsize=(8,6), title='Gender'
)
plt.show()

df.groupby('InternetService').count()['customerID'].plot(
    kind='bar', color='skyblue', grid=True, figsize=(8,6), title='Internet Service'
)
plt.show()

df.groupby('PaymentMethod').count()['customerID'].plot(
    kind='bar', color='skyblue', grid=True, figsize=(8,6), title='Payment Method'
)
plt.show()

dummy_cols = []
sample_set = df[['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn']].copy(deep=True)

# one-hot encode each categorical column (fewer than 5 distinct values),
# prefixing every dummy column with the original column name
for col in list(df.columns):
    if col not in ['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn'] and df[col].nunique() < 5:
        dummy_vars = pd.get_dummies(df[col])
        dummy_vars.columns = [col + str(x) for x in dummy_vars.columns]
        sample_set = pd.concat([sample_set, dummy_vars], axis=1)
sample_set.head()
tenure MonthlyCharges TotalCharges Churn genderFemale genderMale SeniorCitizen0 SeniorCitizen1 PartnerNo PartnerYes ... StreamingMoviesYes ContractMonth-to-month ContractOne year ContractTwo year PaperlessBillingNo PaperlessBillingYes PaymentMethodBank transfer (automatic) PaymentMethodCredit card (automatic) PaymentMethodElectronic check PaymentMethodMailed check
0 -1.280157 -1.054244 -2.281382 0 1 0 1 0 0 1 ... 0 1 0 0 0 1 0 0 1 0
1 0.064298 0.032896 0.389269 0 0 1 1 0 1 0 ... 0 0 1 0 1 0 0 0 0 1
2 -1.239416 -0.061298 -1.452520 1 0 1 1 0 1 0 ... 0 1 0 0 0 1 0 0 0 1
3 0.512450 -0.467578 0.372439 0 0 1 1 0 1 0 ... 0 0 1 0 1 0 1 0 0 0
4 -1.239416 0.396862 -1.234860 1 1 0 1 0 1 0 ... 0 1 0 0 0 1 0 0 1 0

5 rows × 47 columns

list(sample_set.columns)
['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn',
 'genderFemale', 'genderMale', 'SeniorCitizen0', 'SeniorCitizen1',
 'PartnerNo', 'PartnerYes', 'DependentsNo', 'DependentsYes',
 'PhoneServiceNo', 'PhoneServiceYes',
 'MultipleLinesNo', 'MultipleLinesNo phone service', 'MultipleLinesYes',
 'InternetServiceDSL', 'InternetServiceFiber optic', 'InternetServiceNo',
 'OnlineSecurityNo', 'OnlineSecurityNo internet service', 'OnlineSecurityYes',
 'OnlineBackupNo', 'OnlineBackupNo internet service', 'OnlineBackupYes',
 'DeviceProtectionNo', 'DeviceProtectionNo internet service', 'DeviceProtectionYes',
 'TechSupportNo', 'TechSupportNo internet service', 'TechSupportYes',
 'StreamingTVNo', 'StreamingTVNo internet service', 'StreamingTVYes',
 'StreamingMoviesNo', 'StreamingMoviesNo internet service', 'StreamingMoviesYes',
 'ContractMonth-to-month', 'ContractOne year', 'ContractTwo year',
 'PaperlessBillingNo', 'PaperlessBillingYes',
 'PaymentMethodBank transfer (automatic)', 'PaymentMethodCredit card (automatic)',
 'PaymentMethodElectronic check', 'PaymentMethodMailed check']

Train & Test Sets

target_var = 'Churn'
features = [x for x in list(sample_set.columns) if x != target_var]
model = Sequential()
model.add(Dense(16, input_dim=len(features), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

The input layer takes input_dim equal to the number of features; the first hidden layer has 16 neurons with ReLU activation, the second hidden layer has 8 neurons with ReLU activation, and the single output neuron uses a sigmoid activation so the model outputs a churn probability between 0 and 1.
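If you want to double-check the architecture, Keras can print a layer-by-layer summary. This call is not in the original notebook, and the exact parameter counts it reports depend on how many features end up in sample_set.

# optional: inspect the network layer by layer
model.summary()
# for each Dense layer, trainable parameters = (inputs * units) + units bias terms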

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
X_train, X_test, y_train, y_test = train_test_split(sample_set[features], sample_set[target_var], test_size=0.3
)
model.fit(X_train, y_train, epochs=50, batch_size=100)
Epoch 1/50
4922/4922 [==============================] - 0s 73us/step - loss: 0.6871 - accuracy: 0.5638
Epoch 2/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.5409 - accuracy: 0.7314
Epoch 3/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.5034 - accuracy: 0.7322
Epoch 4/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4717 - accuracy: 0.7452
Epoch 5/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4404 - accuracy: 0.7926
Epoch 6/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4225 - accuracy: 0.8037
Epoch 7/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4150 - accuracy: 0.8066
Epoch 8/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4113 - accuracy: 0.8070
Epoch 9/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.4083 - accuracy: 0.8098
Epoch 10/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4063 - accuracy: 0.8090
Epoch 11/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4052 - accuracy: 0.8111
Epoch 12/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4037 - accuracy: 0.8090
Epoch 13/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4030 - accuracy: 0.8119
Epoch 14/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4021 - accuracy: 0.8127
Epoch 15/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4014 - accuracy: 0.8108
Epoch 16/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4009 - accuracy: 0.8104
Epoch 17/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4003 - accuracy: 0.8125
Epoch 18/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4002 - accuracy: 0.8147
Epoch 19/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3987 - accuracy: 0.8133
Epoch 20/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3982 - accuracy: 0.8139
Epoch 21/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3979 - accuracy: 0.8155
Epoch 22/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3976 - accuracy: 0.8137
Epoch 23/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3974 - accuracy: 0.8139
Epoch 24/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3971 - accuracy: 0.8129
Epoch 25/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3969 - accuracy: 0.8143
Epoch 26/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3970 - accuracy: 0.8135
Epoch 27/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3963 - accuracy: 0.8123
Epoch 28/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3959 - accuracy: 0.8141
Epoch 29/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3952 - accuracy: 0.8149
Epoch 30/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3948 - accuracy: 0.8153
Epoch 31/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3954 - accuracy: 0.8153
Epoch 32/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3948 - accuracy: 0.8163
Epoch 33/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3944 - accuracy: 0.8159
Epoch 34/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3940 - accuracy: 0.8169
Epoch 35/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3941 - accuracy: 0.8178
Epoch 36/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3938 - accuracy: 0.8161
Epoch 37/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3936 - accuracy: 0.8151
Epoch 38/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3929 - accuracy: 0.8147
Epoch 39/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3927 - accuracy: 0.8169
Epoch 40/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3930 - accuracy: 0.8155
Epoch 41/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3922 - accuracy: 0.8169
Epoch 42/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3925 - accuracy: 0.8178
Epoch 43/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3921 - accuracy: 0.8155
Epoch 44/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3915 - accuracy: 0.8182
Epoch 45/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3911 - accuracy: 0.8163
Epoch 46/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3912 - accuracy: 0.8159
Epoch 47/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3909 - accuracy: 0.8178
Epoch 48/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3909 - accuracy: 0.8169
Epoch 49/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3910 - accuracy: 0.8174
Epoch 50/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3901 - accuracy: 0.8190
<keras.callbacks.callbacks.History at 0x7f9437d2e990>

Accuracy, Precision, Recall

in_sample_preds = [round(x[0]) for x in model.predict(X_train)]
out_sample_preds = [round(x[0]) for x in model.predict(X_test)]
print('In-Sample Accuracy: %0.4f' % accuracy_score(y_train, in_sample_preds))
print('Out-of-Sample Accuracy: %0.4f' % accuracy_score(y_test, out_sample_preds))
print('\n')
print('In-Sample Precision: %0.4f' % precision_score(y_train, in_sample_preds))
print('Out-of-Sample Precision: %0.4f' % precision_score(y_test, out_sample_preds))
print('\n')
print('In-Sample Recall: %0.4f' % recall_score(y_train, in_sample_preds))
print('Out-of-Sample Recall: %0.4f' % recall_score(y_test, out_sample_preds))
In-Sample Accuracy: 0.8171
Out-of-Sample Accuracy: 0.7991

In-Sample Precision: 0.6946
Out-of-Sample Precision: 0.6440

In-Sample Recall: 0.5660
Out-of-Sample Recall: 0.5154

ROC & AUC

in_sample_preds = [x[0] for x in model.predict(X_train)]
out_sample_preds = [x[0] for x in model.predict(X_test)]
in_sample_fpr, in_sample_tpr, in_sample_thresholds = roc_curve(y_train, in_sample_preds)
out_sample_fpr, out_sample_tpr, out_sample_thresholds = roc_curve(y_test, out_sample_preds)
in_sample_roc_auc = auc(in_sample_fpr, in_sample_tpr)
out_sample_roc_auc = auc(out_sample_fpr, out_sample_tpr)
print('In-Sample AUC: %0.4f' % in_sample_roc_auc)
print('Out-Sample AUC: %0.4f' % out_sample_roc_auc)
In-Sample AUC: 0.8691
Out-Sample AUC: 0.8314
plt.figure(figsize=(10,7))
plt.plot(
    out_sample_fpr, out_sample_tpr, color='darkorange',
    label='Out-Sample ROC curve (area = %0.4f)' % out_sample_roc_auc
)
plt.plot(
    in_sample_fpr, in_sample_tpr, color='navy',
    label='In-Sample ROC curve (area = %0.4f)' % in_sample_roc_auc
)
plt.plot([0, 1], [0, 1], color='gray', lw=1, linestyle='--')
plt.grid()
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()

