@ From: the data is also available here
https://www.kaggle.com/morui1/credit-card-dataset-svm-classification/edit

@ Authors:
LE BORGNE Pierre-Alexis :https://www.kaggle.com/pierra
GUILLAUME Florian : https://www.kaggle.com/florianguillaume

Foreword

A few links referenced in this article:
Confusion matrix
Pearson correlation coefficient
Heatmap
Principal component analysis (PCA)

Begin

Data description

creditcard.csv
Rows: 284807
Columns: 31
Columns: Time, V1–V28, Amount, Class
Class == 0: non-fraud
Class == 1: fraud

# Importing libraries
import pandas as pd
import numpy as np

# Scikit-learn library: for the SVM
from sklearn import preprocessing              # for preprocessing
from sklearn.metrics import confusion_matrix   # to compute the confusion matrix and evaluate the classification
from sklearn import svm

import itertools

# Matplotlib library to plot the charts
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

# Library for statistical data visualization
import seaborn

%matplotlib inline

Data Loading

data = pd.read_csv('../inputs/creditcard.csv')  # Reading file
df = pd.DataFrame(data)
df.shape
(284807, 31)

Data Visualization

df.describe()
# Summary statistics for each column (count, mean, standard deviation, minimum,
#              1st quartile, median, 3rd quartile and maximum)
(Output: 8 rows × 31 columns of summary statistics — count, mean, std, min, 25%, 50%, 75% and max for Time, V1–V28, Amount and Class. Among other things, it shows 284807 transactions, a mean Amount of about 88.35, a maximum Amount of 25691.16 and a Class mean of about 0.0017.)

df_fraud = df[df['Class'] == 1]  # Select the fraud transactions
plt.figure(figsize=(15, 10))
# Display fraud amounts over time
plt.scatter(df_fraud['Time'], df_fraud['Amount'])
plt.title('Scatter plot of fraud amounts', fontsize=25)
plt.xlabel('Time', fontsize=20)
plt.ylabel('Amount', fontsize=20)
plt.xlim([0, 175000])
plt.ylim([0, 2500])
plt.show()


We notice, first of all, that time does not affect the frequency of frauds. Moreover, the majority of frauds involve small amounts.

# Select the frauds with an amount over 1000
nb_big_fraud = df_fraud[df_fraud['Amount'] > 1000].shape[0]
print('There are only ' + str(nb_big_fraud) + ' frauds where the amount was bigger than 1000 over ' + str(df_fraud.shape[0]) + ' frauds.')

There are only 9 frauds where the amount was bigger than 1000 over 492 frauds.

Unbalanced data

number_fraud = len(data[data.Class == 1])
number_no_fraud = len(data[data.Class == 0])
print('There are only ' + str(number_fraud) + ' frauds in the original dataset, even though there are ' + str(number_no_fraud) + ' non-frauds in the dataset.')

There are only 492 frauds in the original dataset, even though there are 284315 non-frauds in the dataset.

This dataset is unbalanced, which means that using it as-is may lead to undesirable behaviour from a supervised classifier. To put it simply, if a classifier were trained on this dataset while trying to achieve the best possible accuracy, it would most likely label every transaction as non-fraud.

print("The accuracy of the classifier then would be: " + str((number_no_fraud-number_fraud)/number_no_fraud) + " which is the number of good classification over the number of tuple to cassify.")print('分类器的准确度将是:'+ str((number_no_fraud-number_fraud)/number_no_fraud) +' ,这是良好分类数与要分类的元组数之比。')
The accuracy of the classifier then would be: 0.998269524998681 which is the number of good classification over the number of tuple to cassify.
分类器的准确度将是:0.998269524998681 ,这是良好分类数与要分类的元组数之比。
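Strictly speaking, the accuracy of an "always predict non-fraud" classifier is the number of non-fraud transactions divided by the total number of transactions; the formula above divides by number_no_fraud instead, which gives a nearly identical figure. A minimal sketch of this majority-class baseline, assuming number_fraud and number_no_fraud from the cell above:

# Majority-class baseline: predict "non-fraud" for every single transaction.
# Its accuracy is simply the share of non-fraud transactions in the dataset.
baseline_accuracy = number_no_fraud / (number_no_fraud + number_fraud)
print('Majority-class baseline accuracy: ' + str(baseline_accuracy))  # about 0.99827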

To address this problem we could use either the oversampling principle or the undersampling principle. The undersampling principle should only be used if we can be sure that the selected few tuples (in this case, non-fraud transactions) are representative of all the non-fraud transactions in the dataset.

Correlation of features

# Compute the pairwise correlation coefficients of the columns,
# with the default method: Pearson, the standard correlation coefficient
df_corr = df.corr()
df_corr.head()
(Output: the first 5 rows of the 31-column correlation matrix. The V1–V28 features are essentially uncorrelated with one another — off-diagonal coefficients on the order of 1e-15 — while Time, Amount and Class show small but non-zero correlations with some of the features.)

# Plot the heatmap
plt.figure(figsize=(15, 10))
seaborn.set(font_scale=2, style='white')
seaborn.heatmap(df_corr, cmap='YlGnBu')  # Display the heatmap
plt.title('Heatmap correlation', fontsize=25)
plt.show()

As we can notice, most of the features are not correlated with each other. This corroborates the fact that a PCA was previously performed on the data.

What can generally be done on a massive dataset is a dimension reduction. By picking the most important dimensions, it is often possible to explain most of the problem, saving a considerable amount of computation time while keeping the loss of accuracy small.

However, in this case, since a PCA has already been performed, a further dimension reduction being effective would mean that the original PCA was not computed in the most effective way. Put another way, no further dimension reduction should be needed on a dataset on which a PCA was computed correctly.
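To illustrate the point, one way to judge how many components are worth keeping is to look at the explained variance of each principal component. The following is a minimal sketch, not part of the original kernel, that re-runs a PCA on the anonymised V1–V28 columns (assuming df from above):

# Re-run a PCA on the already-transformed features and inspect the explained variance,
# to see how much of the total variance the first few components capture.
from sklearn.decomposition import PCA

pca = PCA()
pca.fit(df[['V' + str(i) for i in range(1, 29)]])
print(pca.explained_variance_ratio_.cumsum())  # cumulative share of variance per component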

# Retrieve the correlation coefficient of each feature with the Class feature
rank = df_corr['Class']
df_rank = pd.DataFrame(rank)
# print(len(df_rank))
# Rank the absolute values of the coefficients in descending order
df_rank = np.abs(df_rank).sort_values(by='Class', ascending=False)
# Remove missing values (NaN)
df_rank.dropna(inplace=True)
# print(len(df_rank))

Data Selection

One way to do oversampling is to replicate the under-represented class tuples until we attain a correct proportion between the classes.

However, as we have neither infinite time nor patience, we are going to run the classifier on undersampled training data (for those using the undersampling principle: if the results are really bad, just rerun the training dataset definition to draw a different random sample).
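For reference, here is a minimal sketch of the oversampling alternative mentioned above: replicating the fraud rows with replacement until both classes are the same size. This is not part of the original kernel, and the variable names (fraud, non_fraud, df_balanced) are only illustrative; it assumes df from above:

# Oversampling by replication: draw fraud rows with replacement until the classes are balanced
fraud = df[df['Class'] == 1]
non_fraud = df[df['Class'] == 0]
fraud_oversampled = fraud.sample(len(non_fraud), replace=True, random_state=0)
df_balanced = pd.concat([non_fraud, fraud_oversampled]).sample(frac=1)  # shuffle
print(df_balanced['Class'].value_counts())  # both classes now have 284315 rows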

UNDERSAMPLING

# We separate our data into two groups: a training dataset and a test dataset
# First we build our training dataset by cutting the original dataset in two
df_train_all = df[0:150000]
# We separate the frauds from the non-frauds
df_train_1 = df_train_all[df_train_all['Class'] == 1]
df_train_0 = df_train_all[df_train_all['Class'] == 0]
print('In this dataset, we have ' + str(len(df_train_1)) + ' frauds so we need to take a similar number of non-fraud')

df_sample = df_train_0.sample(300)
df_train = df_train_1.append(df_sample)  # We gather the frauds with the sampled non-frauds
df_train = df_train.sample(frac=1)       # Then we shuffle our dataset

In this dataset, we have 293 frauds so we need to take a similar number of non-fraud

# We drop the feature Time (useless) and the label Class
X_train = df_train.drop(['Time', 'Class'], axis=1)
# We create our label
y_train = df_train['Class']

X_train = np.asarray(X_train)
y_train = np.asarray(y_train)

# We keep the whole remaining data as a test set, to see whether the model learns correctly
df_test_all = df[150000:]

X_test_all = df_test_all.drop(['Time', 'Class'], axis=1)
y_test_all = df_test_all['Class']
X_test_all = np.asarray(X_test_all)
y_test_all = np.asarray(y_test_all)

Then we define training and testing sets after applying a dimension reduction, to illustrate the fact that nothing will be gained, because a PCA was previously computed.

# We take the ten top-ranked features (index 0 is 'Class' itself, so we start at 1)
print(df_rank.index[1:11])
X_train_rank = df_train[df_rank.index[1:11]]
X_train_rank = np.array(X_train_rank)

Index(['V17', 'V14', 'V12', 'V10', 'V16', 'V3', 'V7', 'V11', 'V4', 'V18'], dtype='object')

# The same with the whole test dataset, to see whether the model learns correctly
X_test_all_rank = df_test_all[df_rank.index[1:11]]
X_test_all_rank = np.asarray(X_test_all_rank)
y_test_all = np.asarray(y_test_all)

Confusion Matrix

# Binary labels: Class = 1 (fraud) and Class = 0 (non-fraud)
class_names = np.array(['0', '1'])

# Function to plot the confusion matrix
def plot_confusion_matrix(cm, classes, title='Confusion Matrix', cmap=plt.cm.Blues):
    plt.figure(figsize=(8, 8))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    fmt = 'd'
    thresh = cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

Model Selection (model on the original features)

So now, we'll use an SVM classifier from the scikit-learn library.

# We set up an SVM classifier with a linear kernel
# (the default SVM classifier would use the RBF, Radial Basis Function, kernel)
classifier = svm.SVC(kernel='linear')
# Then we train our model with our balanced training data
classifier.fit(X_train, y_train)

SVC(kernel='linear')

# And finally, we predict on our test data
prediction_SVM_all = classifier.predict(X_test_all)
cm = confusion_matrix(y_test_all, prediction_SVM_all)
plot_confusion_matrix(cm, class_names)

accuracy = (cm[0][0] + cm[1][1]) / (sum(cm[0]) + sum(cm[1]))
print_B = 4 * cm[1][1] / (cm[1][0] + cm[1][1])
print('Our criterion gives a result of ' + str((accuracy + print_B) / 5))

Our criterion gives a result of 0.9184042490971553

print('We have detected ' + str(cm[1][1]) + ' frauds / ' + str(cm[1][1] + cm[1][0]) + ' total frauds.')
print('So, the probability to detect a fraud is ' + str(cm[1][1] / (cm[1][1] + cm[1][0])))
print("the accuracy is : " + str(accuracy))

We have detected 181 frauds / 199 total frauds.
So, the probability to detect a fraud is 0.9095477386934674
the accuracy is : 0.9538302907119066
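The criterion used above is a weighted mean of overall accuracy and recall on the fraud class, with recall counted four times: criterion = (accuracy + 4 * recall) / 5. For reference, a minimal equivalent sketch using scikit-learn's metric helpers (not part of the original kernel; it assumes y_test_all and prediction_SVM_all from the cells above):

from sklearn.metrics import accuracy_score, recall_score

# recall_score with the default pos_label=1 is exactly "detected frauds / total frauds"
acc = accuracy_score(y_test_all, prediction_SVM_all)
rec = recall_score(y_test_all, prediction_SVM_all)
print('criterion =', (acc + 4 * rec) / 5)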

Models Rank (model on the dimension-reduced data)

We need to call the fit method again, as the dimension of the tuples to predict went from 29 to 10 because of the dimension reduction.

# Then we train our model with our balanced training data
classifier.fit(X_train_rank, y_train)
# And finally, we predict on our test data
prediction_SVM = classifier.predict(X_test_all_rank)
cm = confusion_matrix(y_test_all, prediction_SVM)
plot_confusion_matrix(cm,class_names)

accuracy = (cm[0][0] + cm[1][1]) / (sum(cm[0]) + sum(cm[1]))
print_B = 4 * cm[1][1] / (cm[1][0] + cm[1][1])
print('Our criterion gives a result of ' + str((accuracy + print_B) / 5))

Our criterion gives a result of 0.894366168674494

print('We have detected ' + str(cm[1][1]) + ' frauds / ' + str(cm[1][1] + cm[1][0]) + ' total frauds.')
print('So, the probability to detect a fraud is ' + str(cm[1][1] / (cm[1][1] + cm[1][0])))
print("the accuracy is : " + str(accuracy))

We have detected 178 frauds / 199 total frauds.
So, the probability to detect a fraud is 0.8944723618090452
the accuracy is : 0.970172172068216

We can see that the study using the reduced data is far from irrelevant, which means that the last step of the previously computed PCA could have been done more efficiently. Indeed, one of the main questions with PCA, once the principal component directions have been computed, is how many of these components to keep. This suggests that some of the 30 dimensions do not discriminate the classes very much.

Re-balanced class weights

In the SVM model used previously, the weight of each class was the same, which means that missing a fraud is considered as bad as misjudging a non-fraud. For a bank, the objective is to maximize the number of detected frauds, even if it means flagging more non-fraud tuples as fraudulent operations. So we need to minimize the false negatives: the number of undetected frauds.

Indeed, by modifying the class_weight parameter, we can choose which class to give more importance to during the training phase. In principle, class 1, which describes the fraudulent operations, would be considered more important than class 0 (non-fraud operations). However, in this case we will give more importance to class 0, because of the large number of misclassified non-fraud operations. Of course, the goal is to lose as few actual frauds as possible in the process.

classifier_b = svm.SVC(kernel='linear',class_weight={0:0.60,1:0.40})
classifier_b.fit(X_train,y_train)
SVC(class_weight={0: 0.6, 1: 0.4}, kernel='linear')
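As an aside, not in the original kernel: scikit-learn can also derive the weights automatically from the class frequencies with class_weight='balanced'. Here the undersampled training set is already roughly balanced, so this mainly matters when training on the full, unbalanced data; classifier_auto is just an illustrative name.

# 'balanced' weights each class inversely proportional to its frequency in y_train
classifier_auto = svm.SVC(kernel='linear', class_weight='balanced')
classifier_auto.fit(X_train, y_train)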

Testing the model

prediction_SVM_b_all = classifier_b.predict(X_test_all)
cm = confusion_matrix(y_test_all,prediction_SVM_b_all)
plot_confusion_matrix(cm,class_names)

accuracy = (cm[0][0] + cm[1][1]) / (sum(cm[0]) + sum(cm[1]))
print_B = 4 * cm[1][1] / (cm[1][0] + cm[1][1])
print('Our criterion gives a result of ' + str((accuracy + print_B) / 5))

Our criterion gives a result of 0.919764712574571

print('We have detected ' + str(cm[1][1]) + ' frauds / ' + str(cm[1][1] + cm[1][0]) + ' total frauds.')
print('So, the probability to detect a fraud is ' + str(cm[1][1] / (cm[1][1] + cm[1][0])))
print("the accuracy is : " + str(accuracy))

We have detected 181 frauds / 199 total frauds.
So, the probability to detect a fraud is 0.9095477386934674
the accuracy is : 0.960632608098986

Models Rank (re-balanced weights, dimension-reduced data)

classifier_b.fit(X_train_rank, y_train)                 # Then we train our model with our balanced training data
prediction_SVM = classifier_b.predict(X_test_all_rank)  # And finally, we predict on our test data
cm = confusion_matrix(y_test_all, prediction_SVM)
plot_confusion_matrix(cm,class_names)


accuracy = (cm[0][0] + cm[1][1]) / (sum(cm[0]) + sum(cm[1]))
print_B = 4 * cm[1][1] / (cm[1][0] + cm[1][1])
print('Our criterion gives a result of ' + str((accuracy + print_B) / 5))

Our criterion gives a result of 0.894366168674494

print('We have detected ' + str(cm[1][1]) + ' frauds / ' + str(cm[1][1] + cm[1][0]) + ' total frauds.')
print('So, the probability to detect a fraud is ' + str(cm[1][1] / (cm[1][1] + cm[1][0])))
print("the accuracy is : " + str(accuracy))

We have detected 173 frauds / 199 total frauds.
So, the probability to detect a fraud is 0.8693467336683417
the accuracy is : 0.9944439086991032
