Credit Card Fraud Detection: Detecting Fraud with Unsupervised Learning

Fraud on e-commerce platforms makes the news regularly. In recent years, attention has shifted to fraud rings, also called fraud networks, which are likely to cause the most damage to customers and, in turn, to the platform itself.

Fraud investigation teams rely heavily on Supervised Machine Learning models to catch fraudulent users. One major challenge with these models is that training requires examples of every likely fraud pattern in sufficient quantity. Data on fraudulent patterns is not only scarce but also changes continuously, which makes such models difficult to train.

We propose to leverage Outlier Detection models to isolate fraud rings.

The article will cover:

Fraud Rings and why they’re dangerous
Limits of Supervised Machine Learning
Anomaly detection applied to fraud detection
Building and expanding on top of anomaly detection

Fraud Rings: a Modern Plague

E-commerce fraud rings are organized crime groups specialized in defrauding people. Fraud rings can consist of tens, hundreds or even thousands of people. Most of the rings are devoted to specific websites. One can say they have their own preferred “playground”.

To maximise their chances of success, fraud rings need to target as many users as possible on a given platform. They need to attack “en masse”. The formula is simple: create as many accounts as possible, make them look legitimate, and start attacking other users. The more content they generate, the better their chances of grabbing the attention of unsuspecting users. To be efficient, they often develop software to automate some or all of their actions.

As an example, say a fraud ring hires 3 fraudsters, who create a pool of 15 fake accounts on OLX. The accounts can lie dormant for hours, days or weeks before they are activated. Once activated, the fake accounts can be used to commit a specific set of malicious acts, chosen from the techniques the fraudsters have mastered. They can use stolen credit cards to buy items and never pay the seller. They can try to sell illicit items, such as guns or drugs. They can send spam messages to lure users to external websites. They can send phishing messages to make others reveal personal information, such as passwords, phone numbers and credit card numbers. The fraudsters refine these acts with each iteration, which makes them very difficult to detect.

Figure: The 4Ps of Fraud Rings: How do they operate? Authors: Hagop Boghazdeklian, Yuen King Ho

Fraud rings know that, when building anti-fraud measures, websites too often focus on one account at a time. As a result, websites can completely miss the networks of malicious accounts that may exist among their customer base. Fraudsters then exploit this blind spot and attack the platform at scale.

Fraud rings are among the threats likely to cause the greatest harm to the business. If they succeed in defrauding customers, the website faces serious consequences. Angry and dissatisfied customers are likely to churn, which ultimately causes monetary loss. Malicious attacks may also result in data breaches that require us to notify the respective authorities. That’s why it’s important to leverage the links and connections between accounts when building ML-based fraud detection systems, to prevent waves of attacks at scale.

By using textual and tabular attributes, such as phone, message or email, it’s easy to connect users to each other. We can then start using Graph Analysis.

Figure: Graph linking users into a Network. Authors: Hagop Boghazdeklian, Yuen King Ho
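
As a rough illustration of how shared attributes can connect users into networks, the sketch below groups accounts with a union-find structure. The user ids, attribute names and values are made up for the example, and treating any exact match on a phone or email as a link is an assumption, not a description of the OLX pipeline.

```python
from collections import defaultdict


def build_networks(users):
    """Group user ids that share at least one exact (attribute, value) pair."""
    parent = {user_id: user_id for user_id in users}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Index users by every attribute value they expose.
    owners = defaultdict(list)
    for user_id, attrs in users.items():
        for key, value in attrs.items():
            owners[(key, value)].append(user_id)

    # Users that share a value are merged into the same set.
    for shared in owners.values():
        for other in shared[1:]:
            parent[find(other)] = find(shared[0])

    networks = defaultdict(set)
    for user_id in users:
        networks[find(user_id)].add(user_id)
    return sorted(networks.values(), key=len, reverse=True)


# Hypothetical accounts: u1 and u2 share a phone, u2 and u3 share an email.
users = {
    "u1": {"phone": "555-0101", "email": "a@example.com"},
    "u2": {"phone": "555-0101", "email": "b@example.com"},
    "u3": {"phone": "555-0199", "email": "b@example.com"},
    "u4": {"phone": "555-0142", "email": "d@example.com"},
}
networks = build_networks(users)
print(networks)
```

Here u1, u2 and u3 end up in one network even though u1 and u3 share nothing directly, while u4 stays on its own. Fuzzy matching on similar messages would need a similarity measure that this sketch leaves out.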

Limits of Supervised Machine Learning

Supervised Machine Learning models are broadly employed in detecting fraud on e-commerce platforms. They are proven to perform well for most common cases, such as Credit Card Fraud, Identity Theft or Spam. When trained with historical labelled data, in sufficient amounts, they are able to generalize well and detect learned fraud cases in newly observed data.

Getting labelled data is tough

Still, gathering labelled data on fraudulent patterns is a complicated and time-consuming process. Even for the most sophisticated fraud investigation teams, it can take weeks to get labelled data for all ongoing fraud patterns. Simply put, labelled data is scarce.

Fraudulent patterns change all the time

Constantly changing behaviour makes the situation worse. Fraudsters, and fraud networks in particular, continuously change their modus operandi to bypass the measures in place. It is not surprising to see a fraudulent pattern disappear or a new one appear from one hour to the next.

The scarcity of labelled data and rapidly evolving behaviours make the learning process difficult and limit the effectiveness of Supervised Machine Learning models, especially for uncovering new patterns. By the time labelled data is available, it is often too late: the damage is already done.

Anomaly Detection

Let’s flip the problem around. Instead of modelling fraudulent behaviour, we can focus on modelling the behaviour of good users. Let’s hypothesize that the networking behaviour of good users remains stable over time and represents the vast majority of observations. Under this hypothesis, we propose to use Unsupervised Anomaly Detection models to isolate the fraudulent networks.

Unsupervised Anomaly Detection, also known as Outlier Detection, consists of detecting abnormal or unusual observations. This family of models decides whether a new observation drastically deviates from the fitted norm. During training, Outlier Detection estimators try to fit the regions where the training data is the most concentrated.
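
To make the idea of a "fitted norm" concrete, here is a minimal one-dimensional sketch. It swaps in a simple robust statistic (the median and the median absolute deviation, combined into a modified z-score) rather than any of the models discussed in this article. The training values and the 3.5 cut-off are assumptions chosen for illustration.

```python
import statistics


def fit_norm(values):
    """Estimate where the training data concentrates: its median and MAD."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return center, mad


def outlier_score(value, center, mad):
    """Modified z-score: distance from the median in rescaled MAD units."""
    if mad == 0:
        return 0.0 if value == center else float("inf")
    return 0.6745 * abs(value - center) / mad


# Hypothetical training data: messages sent per day by ten ordinary accounts.
messages_per_day = [3, 5, 4, 6, 5, 4, 7, 5, 6, 4]
center, mad = fit_norm(messages_per_day)

THRESHOLD = 3.5  # assumed cut-off; tune it on real data


def is_outlier(value):
    """Decide whether a new observation deviates drastically from the norm."""
    return outlier_score(value, center, mad) > THRESHOLD


print(is_outlier(6))   # close to the bulk of the training data
print(is_outlier(60))  # far away from it
```

The fit step only looks at where most observations lie, so no fraud labels are needed. A new account sending 60 messages a day is flagged simply because it sits far from that region.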

How does it apply to fraud rings?

In our case, we want to learn the “normal” behaviour of good users (inliers), based on the networking data of millions of accounts, to better isolate the “abnormal” users (outliers). Some fraudulent users are likely to be among the outliers.

In theory, good users should not be part of big networks, or of any network at all. Yet users do end up sharing links with each other over time, because some of the attributes used to link them have higher collision rates than others. The outlier detection algorithm can account for that: even when links are found between good users, the connection between them should be weak, meaning the number of exact or similar attributes that the users in the network share should be small.
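
One way to see which attributes collide often is to measure, per attribute, the share of users whose value also appears on another account. The sketch below does this on made-up data. The attribute names and the definition of "collision rate" used here are assumptions for illustration.

```python
from collections import Counter


def collision_rates(users):
    """Per attribute, the share of users whose value appears on another account."""
    rates = {}
    attributes = {key for attrs in users.values() for key in attrs}
    for key in sorted(attributes):
        values = [attrs[key] for attrs in users.values() if key in attrs]
        counts = Counter(values)
        colliding = sum(1 for value in values if counts[value] > 1)
        rates[key] = colliding / len(values)
    return rates


# Hypothetical good users: phones are unique, but a common message repeats.
users = {
    "u1": {"phone": "555-0101", "message": "Is this still available?"},
    "u2": {"phone": "555-0102", "message": "Is this still available?"},
    "u3": {"phone": "555-0103", "message": "Is this still available?"},
    "u4": {"phone": "555-0104", "message": "Can you ship to Lisbon?"},
}
rates = collision_rates(users)
print(rates)
```

In this toy data the message attribute collides for three users out of four while phone numbers never collide, so a link through a shared message says much less about two accounts than a link through a shared phone.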

Fraudulent users, by contrast, tend to share a larger number of similar attributes. In Graph Theory terms, fraud rings are likely to have the largest number of nodes and the highest connection densities. The bigger the network size and connection density, the higher the outlier score should be.
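
The two graph quantities mentioned here can be computed directly. The sketch below counts nodes and edges for one network and derives the density as the number of edges divided by the number of possible pairs. Drawing an edge whenever two users share at least one exact attribute value is an assumption, and the accounts are invented.

```python
from itertools import combinations


def network_features(members, attributes):
    """Return (node count, edge count, density) for one network of users."""
    edges = 0
    for a, b in combinations(sorted(members), 2):
        shared = set(attributes[a].items()) & set(attributes[b].items())
        if shared:
            edges += 1
    n = len(members)
    possible = n * (n - 1) / 2  # number of distinct user pairs
    density = edges / possible if possible else 0.0
    return n, edges, density


# Hypothetical accounts: r1..r4 all reuse one phone, g1..g3 are loosely linked.
attributes = {
    "r1": {"phone": "555-0101"},
    "r2": {"phone": "555-0101"},
    "r3": {"phone": "555-0101"},
    "r4": {"phone": "555-0101"},
    "g1": {"phone": "555-0201", "email": "x@example.com"},
    "g2": {"phone": "555-0201", "email": "y@example.com"},
    "g3": {"phone": "555-0203", "email": "y@example.com"},
}
ring = network_features({"r1", "r2", "r3", "r4"}, attributes)
loose = network_features({"g1", "g2", "g3"}, attributes)
print(ring, loose)
```

The ring is fully connected (every pair shares the phone), while the loose network has only two of its three possible edges. Size and density can then serve as input features for an outlier detection model.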

Let’s compare a normal user to a network of suspicious users, with a theoretical algorithm scoring them. A user without any connection to others should be considered normal, so the algorithm is expected to output a low score. Users who share phones, emails and similar sent messages should be considered suspicious, so the model is expected to assign them a higher score. The users and networks with high scores should then be sent to fraud investigators.

Figure: Comparing a normal user to suspicious network users. Authors: Hagop Boghazdeklian & Yuen King Ho

Anomaly Detection Models

Many existing anomaly detection algorithms can be applied to tabular data. Among the best-performing and best-known are:

Isolation Forest
Variational Auto-Encoder
Variational Auto-Encoding Gaussian Mixture Model

Comparing these models takes some time and will be the subject of a follow-up article on the OLX blog.
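
For readers unfamiliar with Isolation Forest, here is a deliberately small pure-Python sketch of the idea: random splits isolate unusual points in fewer steps, and a short average path length maps to a score close to 1. It is a simplified teaching version, not the production model; libraries such as scikit-learn ship a full `IsolationForest` estimator. The feature pairs (network size, connection density) and all data below are synthetic assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(row[feature] for row in rows)
    high = max(row[feature] for row in rows)
    if low == high:  # simplification: stop when the chosen feature is constant
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    left = [row for row in rows if row[feature] < split]
    right = [row for row in rows if row[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node):
    """Number of splits needed to reach a leaf, plus a correction for leaf size."""
    depth = 0
    while "feature" in node:
        go_left = row[node["feature"]] < node["split"]
        node = node["left"] if go_left else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def fit_forest(rows, n_trees=100, seed=0):
    rng = random.Random(seed)
    sample_size = min(256, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(row, trees, sample_size):
    """Score in (0, 1]; values closer to 1 mean the row is easier to isolate."""
    mean_path = sum(path_length(row, tree) for tree in trees) / len(trees)
    return 2.0 ** (-mean_path / average_path_length(sample_size))


# Synthetic data: 200 small, sparse networks and one large, dense ring.
data_rng = random.Random(42)
normal_networks = [
    (data_rng.randint(1, 3), data_rng.uniform(0.0, 0.2)) for _ in range(200)
]
suspicious_ring = (40, 0.9)

trees, sample_size = fit_forest(normal_networks + [suspicious_ring])
ring_score = anomaly_score(suspicious_ring, trees, sample_size)
lone_user_score = anomaly_score((1, 0.0), trees, sample_size)
print(ring_score, lone_user_score)
```

The large, dense ring is separated from the rest after very few random splits, so it receives a clearly higher score than a lone user with no connections.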

Building on top of Anomaly Detection

Using Anomaly Detection algorithms on real-time data can help fraud investigation teams quickly uncover newly emerging fraudulent patterns. In certain cases, the precision can be high enough to go beyond moderation and make automated decisions, such as banning accounts or applying friction. At OLX, precision exceeded 95% on Spam detection use cases, which allowed us to discover hundreds of users sending malicious messages every day.
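
A minimal sketch of how precision could be measured and used to decide between moderation and automated action follows. The function names, thresholds and account ids are hypothetical, and the routing rule is only one possible policy, not the one used at OLX.

```python
def precision(flagged, confirmed_fraud):
    """Share of flagged accounts that moderators confirmed as fraudulent."""
    flagged = set(flagged)
    if not flagged:
        return 0.0
    return len(flagged & set(confirmed_fraud)) / len(flagged)


def route(score, auto_threshold, review_threshold):
    """Map an outlier score to an action; both thresholds are assumptions."""
    if score >= auto_threshold:
        return "automated_action"  # e.g. ban or apply friction
    if score >= review_threshold:
        return "manual_review"     # send to fraud investigators
    return "no_action"


# Hypothetical moderation outcome: 19 of 20 flagged accounts were fraudulent.
flagged = ["acc%02d" % i for i in range(20)]
confirmed = flagged[:19]
measured_precision = precision(flagged, confirmed)

actions = [route(score, 0.9, 0.7) for score in (0.92, 0.75, 0.30)]
print(measured_precision, actions)
```

In practice the automated threshold would only be enabled once precision measured on moderator-reviewed samples stays above the level the business is comfortable with.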

Even so, not all anomalies mean bad users:

Some outliers simply behave in an abnormal manner. They do not have any intention of causing harm.
Certain behaviours among the outliers are authorised by company policies or business logic.

In a business context, it may be necessary to complement the outlier detection models with supervised ones. The quickly uncovered patterns can be labelled by moderators in accordance with the policies, and then used as training data for supervised learning models.

Conclusion

Unsupervised Machine Learning allows fraud investigation teams to uncover fraud networks at any point, from the moment they register on the platform to when they wake up from their incubation to attack at scale.
Uncovering these networks at an early stage leaves a significant amount of time to act and avoid major harm to customers.
Precision can reach levels high enough (>90%) to allow automated decisions in some cases.
It’s not surprising that the fraud investigation industry builds solutions around Unsupervised Machine Learning.

[1] Novelty and Outlier Detection, https://scikit-learn.org/stable/modules/outlier_detection.html

[2] Credit Card Fraud Detection in e-Commerce: An Outlier Detection Approach, from eBay, https://arxiv.org/abs/1811.02196

The work was possible thanks to Yuen King Ho, Ara Hayrabedian, Nodari Lipartiya, Jaroslaw Szymczak and Alexey Grigorev

Original article: https://tech.olx.com/detecting-fraud-rings-with-unsupervised-learning-554bedf29dbf
