Customer Churn in Marketing

  • Load the packages
  • Load the data
  • Data Analysis & Preparation
  • Train & Test Sets

Customer churn is when a customer decides to stop using a company's services, content, or products. In customer analytics, retaining existing customers is far cheaper than acquiring new ones, and returning customers typically generate more revenue than new customers. In highly competitive industries a business faces many rivals, so the cost of acquiring new customers is even higher, which makes retaining existing customers all the more important. Customers leave for many reasons; common ones include poor customer service, not finding enough value in the product or service, lack of communication, and lack of loyalty. The first step toward retaining these customers is to monitor the churn rate over time. If the churn rate is consistently high or increasing, it is worth investing resources in improving retention.
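To make "monitoring the churn rate over time" concrete, here is a minimal pandas sketch; the DataFrame events and its month/churned columns are hypothetical placeholders and are not part of the Telco dataset used later in this article.

import pandas as pd

# hypothetical monthly snapshot: one row per customer per month, churned = 1 if the customer left that month
events = pd.DataFrame({
    'month':   ['2020-01', '2020-01', '2020-02', '2020-02', '2020-03', '2020-03'],
    'churned': [0, 1, 0, 0, 1, 1],
})

# churn rate per month = share of customers who left in that month
monthly_churn_rate = events.groupby('month')['churned'].mean()
print(monthly_churn_rate)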

To improve retention, the priority is to understand customers better. We can survey customers who have already churned to learn why they left, and survey existing customers to learn their needs and pain points. For example, we can look at customers' web-activity data to see where they spend the most time, whether the pages they view contain errors, or whether their searches return useful results. We can also review customer-service call logs to see how long customers waited, what they complained about, and how their issues were handled. A deep analysis of these data points can reveal the problems a business faces in retaining its existing customers.

In this article we build a machine learning model that predicts which customers are likely to churn, so we can target and retain these high-risk customers. We will use a neural network. An artificial neural network (ANN) is a machine learning model inspired by how the human brain works. The recent success of ANNs in image recognition, speech recognition, and robotics demonstrates their predictive power and usefulness across many industries. You may have heard the term "deep learning": it refers to ANN models with a large number of layers between the input and output layers.


The figure shows the simple case of an ANN with a single hidden layer. The circles represent artificial neurons, or nodes, which mimic the neurons in the human brain, and the arrows show how signals are passed from one neuron to the next. As the figure suggests, an ANN learns by finding the weights applied to the signals flowing from each input neuron to the neurons of the next layer that best predict the output.
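To make the idea of weighted signals concrete, here is a minimal NumPy sketch of the forward pass through a one-hidden-layer network. It is not part of the original notebook; the layer sizes are toy placeholders and the weights are random rather than learned.

import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 4, 3                    # toy sizes, purely illustrative
W1 = rng.normal(size=(n_features, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))            # hidden -> output weights
b2 = np.zeros(1)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(1, n_features))   # one customer's feature vector
hidden = relu(x @ W1 + b1)             # weighted signals entering the hidden layer
p_churn = sigmoid(hidden @ W2 + b2)    # output neuron: a probability-like score
print(p_churn)

Training replaces the random W1 and W2 with weights that minimize prediction error; that is exactly what Keras will do for us below.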

Below we again use the Kaggle dataset WA_Fn-UseC_-Telco-Customer-Churn.csv, and we build the neural network with Keras.

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
/kaggle/input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv

Load the packages

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import roc_curve, auc
%matplotlib inline

Load the data

df = pd.read_csv('../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.head(3)
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity ... DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 7590-VHVEG Female 0 Yes No 1 No No phone service DSL No ... No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 5575-GNVDE Male 0 No No 34 Yes No DSL Yes ... Yes No No No One year No Mailed check 56.95 1889.5 No
2 3668-QPYBK Male 0 No No 2 Yes No DSL Yes ... No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes

3 rows × 21 columns

df.shape
(7043, 21)

Data Analysis & Preparation

Encoding target var: Churn

df['Churn'] = df['Churn'].apply(lambda x: 1 if x == 'Yes' else 0)
df.Churn.mean()
0.2653698707936959

Clean TotalCharges

df['TotalCharges'] = df['TotalCharges'].replace(' ', np.nan).astype(float)
df = df.dropna()

Create Continuous Vars

df[['tenure', 'MonthlyCharges', 'TotalCharges']].describe()
tenure MonthlyCharges TotalCharges
count 7032.000000 7032.000000 7032.000000
mean 32.421786 64.798208 2283.300441
std 24.545260 30.085974 2266.771362
min 1.000000 18.250000 18.800000
25% 9.000000 35.587500 401.450000
50% 29.000000 70.350000 1397.475000
75% 55.000000 89.862500 3794.737500
max 72.000000 118.750000 8684.800000

Normalize the variables

df['MonthlyCharges'] = np.log(df['MonthlyCharges'])
df['MonthlyCharges'] = (df['MonthlyCharges'] - df['MonthlyCharges'].mean())/df['MonthlyCharges'].std()
df['TotalCharges'] = np.log(df['TotalCharges'])
df['TotalCharges'] = (df['TotalCharges'] - df['TotalCharges'].mean())/df['TotalCharges'].std()
df['tenure'] = (df['tenure'] - df['tenure'].mean())/df['tenure'].std()
df[['tenure', 'MonthlyCharges', 'TotalCharges']].describe()
tenure MonthlyCharges TotalCharges
count 7.032000e+03 7.032000e+03 7.032000e+03
mean -1.028756e-16 4.688495e-14 7.150708e-15
std 1.000000e+00 1.000000e+00 1.000000e+00
min -1.280157e+00 -1.882268e+00 -2.579056e+00
25% -9.542285e-01 -7.583727e-01 -6.080585e-01
50% -1.394072e-01 3.885103e-01 1.950521e-01
75% 9.198605e-01 8.004829e-01 8.382338e-01
max 1.612459e+00 1.269576e+00 1.371323e+00
continuous_vars = list(df.describe().columns)
continuous_vars
['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges', 'Churn']

One-Hot Encoding

for col in list(df.columns):
    print(col, df[col].nunique())
customerID 7032
gender 2
SeniorCitizen 2
Partner 2
Dependents 2
tenure 72
PhoneService 2
MultipleLines 3
InternetService 3
OnlineSecurity 3
OnlineBackup 3
DeviceProtection 3
TechSupport 3
StreamingTV 3
StreamingMovies 3
Contract 3
PaperlessBilling 2
PaymentMethod 4
MonthlyCharges 1584
TotalCharges 6530
Churn 2
df.groupby('gender').count()['customerID'].plot(
    kind='bar', color='skyblue', grid=True, figsize=(8,6), title='Gender'
)
plt.show()

df.groupby('InternetService').count()['customerID'].plot(
    kind='bar', color='skyblue', grid=True, figsize=(8,6), title='Internet Service'
)
plt.show()

df.groupby('PaymentMethod').count()['customerID'].plot(
    kind='bar', color='skyblue', grid=True, figsize=(8,6), title='Payment Method'
)
plt.show()

dummy_cols = []
sample_set = df[['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn']].copy(deep=True)

# one-hot encode each categorical column (fewer than 5 distinct values),
# prefixing every dummy column with the original column name
for col in list(df.columns):
    if col not in ['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn'] and df[col].nunique() < 5:
        dummy_vars = pd.get_dummies(df[col])
        dummy_vars.columns = [col + str(x) for x in dummy_vars.columns]
        sample_set = pd.concat([sample_set, dummy_vars], axis=1)
sample_set.head()
tenure MonthlyCharges TotalCharges Churn genderFemale genderMale SeniorCitizen0 SeniorCitizen1 PartnerNo PartnerYes ... StreamingMoviesYes ContractMonth-to-month ContractOne year ContractTwo year PaperlessBillingNo PaperlessBillingYes PaymentMethodBank transfer (automatic) PaymentMethodCredit card (automatic) PaymentMethodElectronic check PaymentMethodMailed check
0 -1.280157 -1.054244 -2.281382 0 1 0 1 0 0 1 ... 0 1 0 0 0 1 0 0 1 0
1 0.064298 0.032896 0.389269 0 0 1 1 0 1 0 ... 0 0 1 0 1 0 0 0 0 1
2 -1.239416 -0.061298 -1.452520 1 0 1 1 0 1 0 ... 0 1 0 0 0 1 0 0 0 1
3 0.512450 -0.467578 0.372439 0 0 1 1 0 1 0 ... 0 0 1 0 1 0 1 0 0 0
4 -1.239416 0.396862 -1.234860 1 1 0 1 0 1 0 ... 0 1 0 0 0 1 0 0 1 0

5 rows × 47 columns

list(sample_set.columns)
['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn',
 'genderFemale', 'genderMale', 'SeniorCitizen0', 'SeniorCitizen1',
 'PartnerNo', 'PartnerYes', 'DependentsNo', 'DependentsYes',
 'PhoneServiceNo', 'PhoneServiceYes',
 'MultipleLinesNo', 'MultipleLinesNo phone service', 'MultipleLinesYes',
 'InternetServiceDSL', 'InternetServiceFiber optic', 'InternetServiceNo',
 'OnlineSecurityNo', 'OnlineSecurityNo internet service', 'OnlineSecurityYes',
 'OnlineBackupNo', 'OnlineBackupNo internet service', 'OnlineBackupYes',
 'DeviceProtectionNo', 'DeviceProtectionNo internet service', 'DeviceProtectionYes',
 'TechSupportNo', 'TechSupportNo internet service', 'TechSupportYes',
 'StreamingTVNo', 'StreamingTVNo internet service', 'StreamingTVYes',
 'StreamingMoviesNo', 'StreamingMoviesNo internet service', 'StreamingMoviesYes',
 'ContractMonth-to-month', 'ContractOne year', 'ContractTwo year',
 'PaperlessBillingNo', 'PaperlessBillingYes',
 'PaymentMethodBank transfer (automatic)', 'PaymentMethodCredit card (automatic)',
 'PaymentMethodElectronic check', 'PaymentMethodMailed check']

Train & Test Sets

target_var = 'Churn'
features = [x for x in list(sample_set.columns) if x != target_var]
model = Sequential()
model.add(Dense(16, input_dim=len(features), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

The input layer takes input_dim equal to the number of features; the first hidden layer has 16 neurons with ReLU activation, the second hidden layer has 8 neurons with ReLU activation, and the single output neuron uses a sigmoid activation so the model outputs a churn probability between 0 and 1.
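If you want to double-check the architecture, Keras can print a layer-by-layer summary. This call is not in the original notebook, and the exact parameter counts it reports depend on how many features end up in sample_set.

# optional: inspect the network layer by layer
model.summary()
# for each Dense layer, trainable parameters = (inputs * units) + units bias terms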

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
X_train, X_test, y_train, y_test = train_test_split(sample_set[features], sample_set[target_var], test_size=0.3
)
model.fit(X_train, y_train, epochs=50, batch_size=100)
Epoch 1/50
4922/4922 [==============================] - 0s 73us/step - loss: 0.6871 - accuracy: 0.5638
Epoch 2/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.5409 - accuracy: 0.7314
Epoch 3/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.5034 - accuracy: 0.7322
Epoch 4/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4717 - accuracy: 0.7452
Epoch 5/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4404 - accuracy: 0.7926
Epoch 6/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4225 - accuracy: 0.8037
Epoch 7/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4150 - accuracy: 0.8066
Epoch 8/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4113 - accuracy: 0.8070
Epoch 9/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.4083 - accuracy: 0.8098
Epoch 10/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4063 - accuracy: 0.8090
Epoch 11/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4052 - accuracy: 0.8111
Epoch 12/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4037 - accuracy: 0.8090
Epoch 13/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4030 - accuracy: 0.8119
Epoch 14/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4021 - accuracy: 0.8127
Epoch 15/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4014 - accuracy: 0.8108
Epoch 16/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4009 - accuracy: 0.8104
Epoch 17/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4003 - accuracy: 0.8125
Epoch 18/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.4002 - accuracy: 0.8147
Epoch 19/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3987 - accuracy: 0.8133
Epoch 20/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3982 - accuracy: 0.8139
Epoch 21/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3979 - accuracy: 0.8155
Epoch 22/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3976 - accuracy: 0.8137
Epoch 23/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3974 - accuracy: 0.8139
Epoch 24/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3971 - accuracy: 0.8129
Epoch 25/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3969 - accuracy: 0.8143
Epoch 26/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3970 - accuracy: 0.8135
Epoch 27/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3963 - accuracy: 0.8123
Epoch 28/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3959 - accuracy: 0.8141
Epoch 29/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3952 - accuracy: 0.8149
Epoch 30/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3948 - accuracy: 0.8153
Epoch 31/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3954 - accuracy: 0.8153
Epoch 32/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3948 - accuracy: 0.8163
Epoch 33/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3944 - accuracy: 0.8159
Epoch 34/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3940 - accuracy: 0.8169
Epoch 35/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3941 - accuracy: 0.8178
Epoch 36/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3938 - accuracy: 0.8161
Epoch 37/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3936 - accuracy: 0.8151
Epoch 38/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3929 - accuracy: 0.8147
Epoch 39/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3927 - accuracy: 0.8169
Epoch 40/50
4922/4922 [==============================] - 0s 14us/step - loss: 0.3930 - accuracy: 0.8155
Epoch 41/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3922 - accuracy: 0.8169
Epoch 42/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3925 - accuracy: 0.8178
Epoch 43/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3921 - accuracy: 0.8155
Epoch 44/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3915 - accuracy: 0.8182
Epoch 45/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3911 - accuracy: 0.8163
Epoch 46/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3912 - accuracy: 0.8159
Epoch 47/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3909 - accuracy: 0.8178
Epoch 48/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3909 - accuracy: 0.8169
Epoch 49/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3910 - accuracy: 0.8174
Epoch 50/50
4922/4922 [==============================] - 0s 13us/step - loss: 0.3901 - accuracy: 0.8190
<keras.callbacks.callbacks.History at 0x7f9437d2e990>

Accuracy, Precision, Recall

in_sample_preds = [round(x[0]) for x in model.predict(X_train)]
out_sample_preds = [round(x[0]) for x in model.predict(X_test)]
print('In-Sample Accuracy: %0.4f' % accuracy_score(y_train, in_sample_preds))
print('Out-of-Sample Accuracy: %0.4f' % accuracy_score(y_test, out_sample_preds))
print('\n')
print('In-Sample Precision: %0.4f' % precision_score(y_train, in_sample_preds))
print('Out-of-Sample Precision: %0.4f' % precision_score(y_test, out_sample_preds))
print('\n')
print('In-Sample Recall: %0.4f' % recall_score(y_train, in_sample_preds))
print('Out-of-Sample Recall: %0.4f' % recall_score(y_test, out_sample_preds))
In-Sample Accuracy: 0.8171
Out-of-Sample Accuracy: 0.7991

In-Sample Precision: 0.6946
Out-of-Sample Precision: 0.6440

In-Sample Recall: 0.5660
Out-of-Sample Recall: 0.5154

ROC & AUC

in_sample_preds = [x[0] for x in model.predict(X_train)]
out_sample_preds = [x[0] for x in model.predict(X_test)]
in_sample_fpr, in_sample_tpr, in_sample_thresholds = roc_curve(y_train, in_sample_preds)
out_sample_fpr, out_sample_tpr, out_sample_thresholds = roc_curve(y_test, out_sample_preds)
in_sample_roc_auc = auc(in_sample_fpr, in_sample_tpr)
out_sample_roc_auc = auc(out_sample_fpr, out_sample_tpr)
print('In-Sample AUC: %0.4f' % in_sample_roc_auc)
print('Out-Sample AUC: %0.4f' % out_sample_roc_auc)
In-Sample AUC: 0.8691
Out-Sample AUC: 0.8314
plt.figure(figsize=(10,7))
plt.plot(
    out_sample_fpr, out_sample_tpr, color='darkorange',
    label='Out-Sample ROC curve (area = %0.4f)' % out_sample_roc_auc
)
plt.plot(
    in_sample_fpr, in_sample_tpr, color='navy',
    label='In-Sample ROC curve (area = %0.4f)' % in_sample_roc_auc
)
plt.plot([0, 1], [0, 1], color='gray', lw=1, linestyle='--')
plt.grid()
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()

