Preventing Credit Card Fraud Based on Big Data Analysis Technology


Abstract

In recent years, credit cards have become the most widely used non-cash payment instrument among the general public in China. Along with the convenience they bring, however, cases of card fraud are also increasing. Fortunately, as information technology has developed, data storage in the financial industry has become almost entirely digital, and the large volume of accumulated data provides a natural foundation for applying big data techniques. As these applications have been explored further, data analysis has been used to carry out audit functions and to "let the data speak", which has greatly improved the quality and efficiency of audits.

This article analyzes bank transaction data to judge whether a credit card transaction was made by someone other than the cardholder, that is, whether the card was used fraudulently. We train a binary classifier, logistic regression, on a large volume of data and use the resulting model to predict whether a transaction is fraudulent, so that such fraud can be prevented. Before training, we standardize some of the attributes to reduce differences in scale among them.

Keywords: data analysis, binary classification, logistic regression, standardization

Author: 一只飞翔的章鱼
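The standardization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual z-score definition (subtract the mean, divide by the standard deviation). The abstract does not say which attributes are standardized or which library is used, so the `standardize` helper and the sample amounts below are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; map every value to zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical transaction amounts, not taken from the real data set.
amounts = [2.5, 10.0, 149.62, 3000.0, 45.0]
amounts_std = standardize(amounts)

print(amounts_std)
```

After this transformation the column has zero mean and unit standard deviation, so an attribute with a very wide range (such as a transaction amount) no longer dominates attributes measured on a smaller scale.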

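Symbol	Meaning
data	All of the data loaded from creditcard.csv
c, y	The class label, i.e. whether the transaction is fraudulent (1 = yes, 0 = no)
cond_0	Positions of the non-fraudulent records in the data set
cond_1	Positions of the fraudulent records in the data set
data2	The data after non-essential attributes have been removed from data
X	The independent variables, i.e. the features
X_train	Training-set data for the features
X_test	Test-set data for the features
y_train	Training-set data for the class label
y_test	Test-set data for the class label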
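To show how the symbols in the table might relate to one another, here is a rough sketch on synthetic data. It is not the article's code. The column names `Time`, `V1`, and `V2`, the choice of `Time` as the non-essential attribute, the reading of `cond_0` and `cond_1` as row positions, the 70/30 split, and the hand-written gradient-descent logistic regression are all assumptions made for illustration.

```python
import math
import random

random.seed(0)


def make_row(i):
    """Build one synthetic record; every tenth row is labeled as fraud."""
    is_fraud = 1 if i % 10 == 0 else 0
    center = 2.0 if is_fraud else -2.0
    return {"Time": float(i), "V1": random.gauss(center, 1.0),
            "V2": random.gauss(center, 1.0), "Class": is_fraud}


# data: synthetic stand-in for the contents of creditcard.csv.
data = [make_row(i) for i in range(200)]

# cond_0 / cond_1: row positions of normal and fraudulent records.
cond_0 = [i for i, row in enumerate(data) if row["Class"] == 0]
cond_1 = [i for i, row in enumerate(data) if row["Class"] == 1]

# data2: drop an attribute assumed to be non-essential (here "Time").
data2 = [{k: v for k, v in row.items() if k != "Time"} for row in data]

# X holds the features, y holds the class label.
X = [[row["V1"], row["V2"]] for row in data2]
y = [row["Class"] for row in data2]

# Shuffle the row positions, then hold out 30% as the test set.
order = list(range(len(X)))
random.shuffle(order)
cut = int(0.7 * len(order))
X_train = [X[i] for i in order[:cut]]
y_train = [y[i] for i in order[:cut]]
X_test = [X[i] for i in order[cut:]]
y_test = [y[i] for i in order[cut:]]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Label a row as fraud (1) when its predicted probability is >= 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5
            else 0 for xi in X]


w, b = fit_logistic(X_train, y_train)
y_pred = predict(X_test, w, b)
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print("test accuracy:", accuracy)
```

Because fraud cases are rare relative to normal transactions, accuracy alone can look good even for a model that never predicts fraud, so a per-class measure such as recall on the records in `cond_1` is usually worth checking as well.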
