How to use Data Science to better understand your customers

by Jerin Paul

How important are customers to your business? That is a rhetorical question, of course. Most businesses thrive only because of their customers, so it is imperative that you understand them well before serving them. Knowing your customers helps you provide tailored services, which in turn increases customer engagement and sales.

Do you know your customers? The question is admittedly vague. But if you cannot answer it with some concrete, qualitative description of your customers, you need to get to work now. I am sure that all business owners have an image of their ideal customer in their minds, however hazy it may be. Often this image is built from intuition and may not be supported by any empirical evidence.

Data never lies. It is nothing more than a collection of facts and figures, and at times it can hold up a mirror to us. This article explains how to use the “magic” of data science to gain a coherent understanding of your customers. Specifically, we will learn how to apply a clustering algorithm to a mall customer dataset. We will then draw inferences from the output to better understand the customers who frequent the mall. Thank you for bearing with such a lengthy prelude; as a reward for your patience, you get the project source code.

What is customer bucketing?

Customer segmentation or customer bucketing is the practice of dividing a company’s customers into groups (a.k.a. buckets) that reflect similarity among customers in each group. The goal of segmenting customers is to decide how to relate to customers in each segment in order to maximize the value of each customer to the business.

Bucketing customers enables you to cater to each customer group in a way that can maximise your sales. For marketers, segmenting your target customers allows you to shape your communications in a way that causes maximum impact.

In this project, we will use cluster analysis to segment customers into clusters based on their annual income and spending score. For this, we will use K-means, one of the most widely used clustering algorithms. K-means clustering is an unsupervised learning algorithm that finds groups in data. The number of groups is represented by the letter K.
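
To make the idea concrete, here is a minimal sketch of K-means on a handful of points. The six points below are made-up values chosen for illustration, not rows from the mall dataset, and the variable names are placeholders. With K = 2, the algorithm should separate the two visibly distinct groups.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six made-up (annual income, spending score) points forming two obvious groups.
toy_points = np.array([
    [15, 80], [16, 85], [17, 78],   # low income, high spending
    [90, 10], [95, 15], [92, 12],   # high income, low spending
])

# K = 2: ask K-means for two groups.
toy_model = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=23)
toy_labels = toy_model.fit_predict(toy_points)

print(toy_labels)                  # cluster index assigned to each point
print(toy_model.cluster_centers_)  # one centroid per cluster
```

The cluster indices themselves are arbitrary; what matters is which points end up sharing an index.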

Let’s get started.

Please feel free to follow along. The dataset can be downloaded here.

Peeking at the Data.

The mall customer dataset is relatively small: it contains only 200 rows and 5 columns. The five column titles are CustomerID, Genre, Age, Annual Income (k$), and Spending Score (1–100).

We will get started by importing the necessary libraries.

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
plt.rc("font", size=14)

Now, we will import the dataset.

data_path = "Mall_Customers.csv"
df = pd.read_csv(data_path)
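
Once the file is loaded, a quick look at its shape and first rows helps confirm that it was read as expected. The sketch below is only an illustration: it reads a small in-memory stand-in with the same header as the dataset, and the three rows in it are made up. With the real file, the same calls would be applied to df.

```python
import io
import pandas as pd

# Stand-in for Mall_Customers.csv: same header, three made-up rows.
sample_csv = io.StringIO(
    "CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,19,15,39\n"
    "2,Female,21,15,81\n"
    "3,Female,20,16,6\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)         # (rows, columns)
print(sample_df.head())        # first few rows
print(sample_df.isna().sum())  # missing values per column
```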

Maybe it’s just me, but I find a few column headers unsettling for some reason. Let’s dive in and change those.

df.rename(columns={'Genre':'Gender','Annual Income (k$)':'Annual_Income','Spending Score (1-100)':'Spending_Score'}, inplace=True)

In this project, we will be clustering the customers using their annual income and their spending score (between 1 and 100). Therefore we will only be using those two columns.

X = df.iloc[:, [3, 4]].values
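
Selecting columns by position works as long as the column order never changes. As an alternative sketch (not the article's original code), the same two columns can be selected by name, assuming the columns have been renamed to Annual_Income and Spending_Score as above. The small frame below is a made-up stand-in for df.

```python
import pandas as pd

# Tiny stand-in frame with the renamed columns; the values are made up.
demo_df = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 21, 20],
    "Annual_Income": [15, 15, 16],
    "Spending_Score": [39, 81, 6],
})

# Selecting by name keeps working even if the column order changes.
X_by_name = demo_df[["Annual_Income", "Spending_Score"]].values
X_by_position = demo_df.iloc[:, [3, 4]].values

print(X_by_name.shape)                     # (rows, 2)
print((X_by_name == X_by_position).all())  # both selections agree here
```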

Now that the data is ready, it is time to start clustering. Before we run the clustering algorithm, we must decide how many clusters to divide our customers into. There are a few different methods for determining the ideal number of clusters for a dataset; here, we will use the elbow method.

Elbow method

The elbow method is one way to choose the number of clusters. It involves running the K-means clustering algorithm on the data for different values of K and calculating the Sum of Squared Errors (S.S.E.) for each value of K.

These values are then plotted on a graph, where we can see that S.S.E. tends to decrease as K increases. S.S.E. becomes 0 when K is equal to the number of data points, because then each data point is its own cluster. Our aim is to find a small value of K that still has a low S.S.E.
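
The following sketch illustrates what the S.S.E. measures, using made-up random points rather than the mall data. It sums the squared distance from each point to the centroid of its own cluster and compares the result with the inertia_ attribute that scikit-learn's KMeans exposes. It also checks the edge case described above, where K equals the number of data points.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points used only for illustration.
rng = np.random.default_rng(23)
points = rng.normal(size=(30, 2))

model = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=23).fit(points)

# S.S.E.: squared distance from each point to the centroid of its own cluster, summed.
own_centroids = model.cluster_centers_[model.labels_]
sse_by_hand = ((points - own_centroids) ** 2).sum()

print(np.isclose(sse_by_hand, model.inertia_))

# With one cluster per data point, every point sits on its own centroid.
one_each = KMeans(n_clusters=len(points), n_init=1, random_state=23).fit(points)
print(np.isclose(one_each.inertia_, 0.0))
```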

In this experiment, we will run K-means for values of K from 1 to 9 and store the S.S.E. for each in a list called distortions.

distortions = []
K = range(1, 10)
for k in K:
    # Fit K-means with k clusters and record its S.S.E. (inertia_)
    kmeansModel = KMeans(n_clusters=k, init='k-means++', random_state=23)
    kmeansModel.fit(X)
    distortions.append(kmeansModel.inertia_)

# Plot S.S.E. against the number of clusters
plt.plot(K, distortions)
plt.title("The Elbow Method")
plt.xlabel("Number of Clusters")
plt.ylabel("S.S.E.")
plt.show()

Now, let us have a look at the graph.

In this graph, you can observe that the S.S.E. drops steeply with each increase in K up to K = 5. Beyond that point the curve flattens out, and adding more clusters reduces the S.S.E. only slightly. So, 5 seems to be an optimum value for K, and this means that we will be dividing the customers into 5 clusters.
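
Reading the elbow off a plot is a judgment call. One common heuristic, shown here as a sketch and not as the method used in this article, is to pick the point on the curve that lies farthest below the straight line joining its first and last points. The S.S.E. values below are made-up numbers shaped like a typical elbow curve, not the values computed from the mall data.

```python
import numpy as np

# Made-up S.S.E. values for K = 1..9, shaped like a typical elbow curve.
ks = np.arange(1, 10)
sse = np.array([1000.0, 700.0, 450.0, 250.0, 100.0, 90.0, 82.0, 75.0, 70.0])

# Scale both axes to [0, 1] so neither dominates the comparison.
x = (ks - ks[0]) / (ks[-1] - ks[0])
y = (sse - sse[-1]) / (sse[0] - sse[-1])

# The chord from the first to the last point is y = 1 - x after scaling.
# The elbow is taken as the point lying farthest below that chord.
gap = (1 - x) - y
elbow_k = int(ks[np.argmax(gap)])

print(elbow_k)
```

With real data, distortions would take the place of the made-up sse array. The heuristic can disagree with a visual reading when the curve has no clear bend, so it is best treated as a second opinion.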

Now that we have figured out the number of clusters we can go ahead and create these clusters.

kmeansModel = KMeans(n_clusters = 5, init = 'k-means++', random_state = 23)
Y = kmeansModel.fit_predict(X)

Since the dataset is small, all these steps finish almost instantly. Once the clusters are created, we can plot them on a graph. The points of each cluster are drawn with a different marker, and the centroid of each cluster is marked with a solid red dot.
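
The plotting code is not shown in this article, so the following is only a sketch of how such a graph could be drawn with matplotlib. The helper plot_clusters, the marker choices, the output filename, and the random stand-in data are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans


def plot_clusters(X, labels, centers):
    """Scatter each cluster with its own marker and overlay the centroids in red."""
    markers = ["o", "s", "^", "D", "v"]
    fig, ax = plt.subplots()
    for cluster_id in np.unique(labels):
        members = X[labels == cluster_id]
        ax.scatter(members[:, 0], members[:, 1],
                   marker=markers[cluster_id % len(markers)],
                   label=f"Cluster {cluster_id + 1}")
    ax.scatter(centers[:, 0], centers[:, 1], c="red", s=120, label="Centroids")
    ax.set_xlabel("Annual Income (k$)")
    ax.set_ylabel("Spending Score (1-100)")
    ax.legend()
    return ax


# Made-up stand-in data; with the real dataset, pass X, Y and kmeansModel.cluster_centers_.
rng = np.random.default_rng(23)
demo_X = rng.uniform(low=[10, 1], high=[140, 100], size=(60, 2))
demo_model = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=23)
demo_Y = demo_model.fit_predict(demo_X)

ax = plot_clusters(demo_X, demo_Y, demo_model.cluster_centers_)
ax.figure.savefig("clusters.png")  # hypothetical output filename
```

In a notebook, the matplotlib.use("Agg") line can be dropped and plt.show() used instead of saving to a file.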

Just looking at the graph tells us about the five different types of customers that frequent the mall. If we were to name them, then they could be named as follows:

i. Low income, high spenders (Red).
ii. Low income, low spenders (Blue).
iii. Average income, average expenditure (Orange).
iv. High income, high spenders (Green).
v. High income, low spenders (Purple).

Members of each group have more in common with each other than with members of other groups, so each group is fairly homogeneous. People in the same cluster may have similar needs and desires. With that in mind, marketing and sales activities can be designed around these needs and desires to attract more such customers. For example, a weekly discount sale could cater to the low-income groups, while reward points for purchases could cater to the high spenders and turn them into regular customers. The possibilities are limited only by our imagination.
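
Before planning activities around the clusters, it can help to summarise each one numerically. The sketch below assumes the cluster labels have been stored in a column named Cluster (a name chosen here for illustration) and uses made-up rows instead of the mall data.

```python
import pandas as pd

# Made-up rows standing in for the clustered customers; "Cluster" plays the role of Y.
profile_df = pd.DataFrame({
    "Annual_Income":  [15, 18, 20, 19, 85, 90, 88, 92],
    "Spending_Score": [80, 85, 10, 14, 90, 82, 12, 15],
    "Cluster":        [0, 0, 1, 1, 2, 2, 3, 3],
})

# Average income and spending per cluster, plus the number of customers in each.
summary = profile_df.groupby("Cluster").agg(
    mean_income=("Annual_Income", "mean"),
    mean_spending=("Spending_Score", "mean"),
    customers=("Annual_Income", "size"),
)
print(summary)
```

With the real data, one option would be to assign df["Cluster"] = Y and run the same groupby on df.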

Conclusion

Understanding a business’s customer base is of utmost importance. One way to gain deeper insight into customer behaviour is to segment customers into different buckets based on that behaviour (income and expenditure in this experiment). Similar people tend to behave similarly, and this is the crux of customer segmentation. Planning sales and marketing activities around these buckets therefore promises a higher return on investment and a more enjoyable customer experience.

Translated from: https://www.freecodecamp.org/news/using-data-science-to-better-understand-your-customers-part-1-of-2-398d11049785/
