
Recency, Frequency, & Monetary (RFM) analysis is a customer segmentation technique, and one of the conventional ones that has been in use for a long time.


  1. Recency refers to how recently the customer made their last transaction with our product.


  2. Frequency refers to how often the customer makes transactions with our product.

  3. Monetary Value refers to how much a customer spends on our product.


The RFM method is straightforward: we only have to transform our data (usually transactional data) into a data frame consisting of three variables, Recent Transaction, Transaction Frequency, and Transaction Amount (Monetary Value).

Transactional data is data that records every transaction made by customers. It typically includes the transaction time, the transaction location, the amount the customer spent, the merchant where the transaction took place, and any other detail that can be recorded at the moment the transaction is made.

Let's look at the transactional dataset we will use as our case study: 2016 credit card transaction data covering every customer. The dataset has 24 features recorded for every transaction our customers made. Even though the dataset has many features, we will not use all of them; we only need the few that can be transformed into Recency, Frequency, and Monetary Values.

Link to dataset:

Fig-1. Transactional Data Features

Going back to our description of the RFM features, we only need to keep customerId, transactionDate, and transactionAmount to create the Recency, Frequency, and Transaction Amount features in a new data frame grouped by customerId.

For the Recency feature, we subtract each customer's latest transaction date (the maximum value of transactionDate) from the current date. Since our dataset only contains 2016 transactional data, we set 1 January 2017 as our current date.

For the Frequency feature, we count how many transactions each customer made, using dplyr's n() function in R.


For the Transaction Amount feature, we sum transactionAmount for every customer.
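
The three steps above can be sketched in base R as follows. This is a minimal sketch, not the article's own code: it assumes a data frame named transactions with the columns customerId, transactionDate (class Date), and transactionAmount, and the six toy rows are made up for illustration. It uses tapply() with length() in place of dplyr's n(); both simply count rows per customer.

```r
# Hypothetical toy transactions (made up for illustration); the real dataset
# has 24 features, but only these three columns are needed for RFM.
transactions <- data.frame(
  customerId        = c("A", "A", "A", "B", "B", "C"),
  transactionDate   = as.Date(c("2016-03-01", "2016-11-20", "2016-12-30",
                                "2016-06-15", "2016-07-01", "2016-01-10")),
  transactionAmount = c(120, 35.5, 60, 500, 250, 15),
  stringsAsFactors  = FALSE
)

# "Current" date: 1 January 2017, right after the 2016 data ends
current_date <- as.Date("2017-01-01")

# Days between each transaction and the current date
days_since <- as.numeric(current_date - transactions$transactionDate)

# Recency: current date minus the latest transaction date, i.e. the
# smallest number of days since any of the customer's transactions
recency   <- tapply(days_since, transactions$customerId, min)
# Frequency: number of transactions per customer (what dplyr's n() counts)
frequency <- tapply(transactions$transactionAmount, transactions$customerId, length)
# Monetary: total amount spent per customer
monetary  <- tapply(transactions$transactionAmount, transactions$customerId, sum)

# One row per customer; all three tapply() results share the same sorted ids
rfm <- data.frame(
  customerId = names(recency),
  recency    = as.numeric(recency),
  frequency  = as.numeric(frequency),
  monetary   = as.numeric(monetary),
  stringsAsFactors = FALSE
)
print(rfm)
```

Taking the smallest number of days since any transaction is equivalent to subtracting the latest transactionDate from 1 January 2017, and it avoids carrying Date objects through the grouping step.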


Fig-2. First six rows of RFM Dataset

Now we have the three main features for RFM segmentation. As in any other data analysis, the first step is to explore the dataset; here we check the distribution of each feature with a histogram, using the hist() function in R.
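
A minimal sketch of this exploration step, assuming the RFM table is a data frame named rfm with columns recency, frequency, and monetary. Because the real dataset is not reproduced here, the values are simulated (hypothetical) right-skewed data, so the histograms only illustrate the kind of shape being checked for.

```r
set.seed(42)
# Hypothetical stand-in for the RFM table: simulated right-skewed values
n_customers <- 500
rfm <- data.frame(
  recency   = round(rexp(n_customers, rate = 1 / 30)),               # days
  frequency = 1 + rnbinom(n_customers, size = 1, mu = 10),           # transactions
  monetary  = round(rlnorm(n_customers, meanlog = 5, sdlog = 1), 2)  # amount
)

# One histogram per RFM feature, side by side
old_par <- par(mfrow = c(1, 3))
for (feature in c("recency", "frequency", "monetary")) {
  hist(rfm[[feature]], main = paste("Histogram of", feature), xlab = feature)
}
par(old_par)  # restore the previous plotting layout
```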

Fig-3. Histogram of RFM Features Data

Our RFM data is heavily right-skewed, which is a serious problem for K-Means clustering: the method assigns each point to the cluster whose centroid is nearest, so a long tail of extreme values can dominate the distance calculations. A log transformation can handle this kind of skewed data, and because the data contains zeros (for which log(0) is undefined), we use log(n + 1) instead of the ordinary log transformation.
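
The difference between the ordinary log and log(n + 1) can be checked directly in R; base R's log1p() computes log(1 + x). The small rfm data frame below is hypothetical, with zeros deliberately included.

```r
# Why log(n + 1)? log(0) is -Inf, which would break distance calculations,
# whereas log1p(0) = log(1 + 0) = 0.
log(0)     # -Inf
log1p(0)   # 0

# Hypothetical RFM values containing zeros
rfm <- data.frame(
  customerId = c("A", "B", "C", "D"),
  recency    = c(0, 3, 45, 300),
  frequency  = c(120, 15, 2, 1),
  monetary   = c(25000, 1800, 90, 0),
  stringsAsFactors = FALSE
)

features <- c("recency", "frequency", "monetary")
rfm_log <- rfm
# Apply log(1 + x) to each RFM column; customerId is left untouched
rfm_log[features] <- lapply(rfm[features], log1p)
print(rfm_log)
```

Because log1p() is monotonic, it changes the spacing between customers but not their order on any feature.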

Fig-4. Histogram of RFM Features Data — Logarithmic Scale

The logarithmic transformation removes much of the skewness in our RFM dataset, which gives the K-Means method better-behaved data from which to find clusters.


K-Means Clustering

K-Means clustering is an unsupervised learning method that partitions unlabeled data into groups (clusters) based on their similarity.
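
For reference, the quantity K-Means minimises for a fixed number of clusters K can be written as below. The notation is ours, not the article's: x_i is one customer's (log-transformed) RFM vector, C_k is the set of customers in cluster k, and mu_k is that cluster's centroid.

```latex
$$
W(K) \;=\; \sum_{k=1}^{K} \sum_{i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2,
\qquad
\boldsymbol{\mu}_k \;=\; \frac{1}{|C_k|} \sum_{i \in C_k} \mathbf{x}_i
$$
```

The double sum W(K) is the within-cluster sum of squares used by the Elbow Method, and R's kmeans() reports it as tot.withinss. Because it is built from squared Euclidean distances, a feature with a long right tail can dominate it, which is why the skewed data needed transforming first.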


In R, K-Means clustering can be done quickly with the kmeans() function. However, we have to choose the number of clusters before fitting the K-Means model. There are many ways to choose it: we can use business knowledge and set the number directly, or we can use a quantitative criterion based on how similar the points within each cluster are.

In this example, we use the within-cluster sum of squares, which measures the variability of the observations within each cluster. We calculate the total within-cluster sum of squares for every number of clusters from 1 to 10 and choose the number after which adding another cluster no longer reduces the value significantly. This approach is often called the Elbow Method.
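
The Elbow Method can be sketched as a loop over k = 1 to 10 that records kmeans()'s tot.withinss for each k. This assumes a numeric matrix named rfm_log of log-transformed features; here it is hypothetical data with three artificial groups, so its elbow falls at k = 3 and says nothing about the article's data. The nstart = 10 setting is our own choice to make each fit less sensitive to random starting centers.

```r
set.seed(123)
# Hypothetical log-transformed RFM data: three artificial, well-separated
# groups of 50 customers each (not the article's data)
make_group <- function(center) matrix(rnorm(150, mean = center, sd = 0.3), ncol = 3)
rfm_log <- rbind(make_group(1), make_group(3), make_group(5))
colnames(rfm_log) <- c("recency", "frequency", "monetary")

# Total within-cluster sum of squares for k = 1..10
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  # nstart = 10 runs k-means from 10 random starts and keeps the best fit
  kmeans(rfm_log, centers = k, nstart = 10)$tot.withinss
})
print(round(wss, 1))

# Elbow plot: look for the k after which the curve flattens out
plot(k_values, wss, type = "b", pch = 19,
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```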

Fig-5. Dataset Elbow Method Visualization (N = 4)

Based on the elbow method, we choose four clusters. With the kmeans() function in R, we only need to pass the number of clusters to the centers parameter and then add the resulting cluster assignments to our dataset.
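
A minimal sketch of fitting the four-cluster model and storing the result, assuming a data frame rfm with customerId, recency, frequency, and monetary columns (simulated, hypothetical values here). The model is fitted on the log(n + 1)-transformed features, and km$cluster holds each row's cluster number in the same order as the input rows, so it can be attached directly as a segment column. nstart = 25 is an assumption, not a value from the article.

```r
set.seed(2016)
# Hypothetical RFM table standing in for the real one (200 simulated customers)
n_customers <- 200
rfm <- data.frame(
  customerId = sprintf("C%03d", seq_len(n_customers)),
  recency    = round(rexp(n_customers, rate = 1 / 20)),
  frequency  = 1 + rnbinom(n_customers, size = 1, mu = 10),
  monetary   = round(rlnorm(n_customers, meanlog = 6, sdlog = 1), 2),
  stringsAsFactors = FALSE
)

# Cluster on the log(n + 1)-transformed features
features <- c("recency", "frequency", "monetary")
rfm_log  <- as.data.frame(lapply(rfm[features], log1p))

# centers = 4 comes from the elbow plot; nstart = 25 is our own choice
km <- kmeans(rfm_log, centers = 4, nstart = 25)

# Attach each customer's cluster number to the original (untransformed) table
rfm$segment <- km$cluster
table(rfm$segment)
```

Cluster numbers are arbitrary labels: a different seed or starting point can permute them, so names such as Gold or Inactive should be attached after inspecting each segment's profile rather than by number alone.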

Fig-6. Dataset after Segment Addition

We have now assigned every customer ID to a group in the segment feature. Next, we check the basic RFM profile of every segment by averaging the RFM features within each segment number.
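
The per-segment summary can be sketched with base R's aggregate(). The table below is hypothetical (six made-up customers with a segment column already attached), and the averages are taken on the raw RFM values, which are easier to interpret than the log-transformed ones.

```r
# Hypothetical clustered table: raw RFM values plus a segment column
rfm <- data.frame(
  customerId = c("A", "B", "C", "D", "E", "F"),
  recency    = c(2, 5, 40, 35, 200, 3),
  frequency  = c(90, 120, 10, 12, 1, 80),
  monetary   = c(9000, 15000, 400, 600, 20, 8000),
  segment    = c(1, 1, 2, 2, 3, 1)
)

# Average recency, frequency, and monetary value per segment
segment_profile <- aggregate(cbind(recency, frequency, monetary) ~ segment,
                             data = rfm, FUN = mean)
# Number of customers per segment (table() is sorted by segment, like aggregate())
segment_profile$customers <- as.vector(table(rfm$segment))
print(segment_profile)
```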

Fig-7. RFM Summary per Segment

We have four groups; let's look at each of them in detail:


  1. Segment-1 (Silver): Mid-tier customers with the second-highest transaction frequency and spending amount.


  2. Segment-2 (Gold): The most valuable customers, with the highest spending amount and the most transactions.


  3. Segment-3 (Bronze): Regular customers with low transaction frequency and low spending amount. However, this segment has the largest number of customers.

  4. Segment-4 (Inactive): Inactive or less-active customers whose latest transaction was more than a month ago. This segment has the lowest number of customers, transaction frequency, and transaction amount.

We now have four customer groups, each with a detailed RFM profile. This information can be used to design marketing strategies targeted at customers who share similar behaviour. Recency, Frequency, and Monetary Value segmentation is simple, but it is useful for getting to know your customers better and for building an efficient, well-targeted marketing strategy.





