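35. Loan Default Prediction
I. Project Introduction
Background
Set in the context of personal credit within financial risk control, the task is to predict from a loan applicant's data whether the applicant is likely to default, and to use that prediction to decide whether to approve the loan. This is a typical classification problem.
Column definitions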
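id: Unique identifier assigned to the loan listing
loanAmnt: Loan amount
term: Loan term (years)
interestRate: Loan interest rate
installment: Installment payment amount
grade: Loan grade
subGrade: Loan sub-grade
employmentTitle: Employment title
employmentLength: Length of employment (years)
homeOwnership: Home ownership status provided by the borrower at registration
annualIncome: Annual income
verificationStatus: Verification status
issueDate: Month in which the loan was issued
purpose: Loan purpose category stated by the borrower in the application
postCode: First three digits of the postal code provided by the borrower in the application
regionCode: Region code
dti: Debt-to-income ratio
delinquency_2years: Number of delinquencies of 30+ days past due in the borrower's credit file over the past 2 years
ficoRangeLow: Lower bound of the borrower's FICO range at loan issuance
ficoRangeHigh: Upper bound of the borrower's FICO range at loan issuance
openAcc: Number of open credit lines in the borrower's credit file
pubRec: Number of derogatory public records
pubRecBankruptcies: Number of public record bankruptcies
revolBal: Total revolving credit balance
revolUtil: Revolving line utilization rate, i.e. the amount of credit the borrower is using relative to all available revolving credit
totalAcc: Total number of credit lines currently in the borrower's credit file
initialListStatus: Initial listing status of the loan
applicationType: Whether the loan is an individual application or a joint application with two co-borrowers
earliesCreditLine: Month in which the borrower's earliest reported credit line was opened
title: Loan title provided by the borrower
policyCode: Policy code (1 = publicly available; 2 = new products not publicly available)
n0-n14: Anonymized features, processed from counts of borrower behavior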
II. Data Preparation
Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
from sklearn.model_selection import cross_val_score,train_test_split,GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
import warnings
warnings.filterwarnings('ignore')
# Remove the pandas limit on the number of displayed columns
pd.options.display.max_columns = None
Load the data
train = pd.read_csv('../data/贷款违约预测/train.csv')
III. Data Analysis
3.1 Data Overview
train.shape
(800000, 47)
train.columns
Index(['id', 'loanAmnt', 'term', 'interestRate', 'installment', 'grade','subGrade', 'employmentTitle', 'employmentLength', 'homeOwnership','annualIncome', 'verificationStatus', 'issueDate', 'isDefault','purpose', 'postCode', 'regionCode', 'dti', 'delinquency_2years','ficoRangeLow', 'ficoRangeHigh', 'openAcc', 'pubRec','pubRecBankruptcies', 'revolBal', 'revolUtil', 'totalAcc','initialListStatus', 'applicationType', 'earliesCreditLine', 'title','policyCode', 'n0', 'n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7', 'n8','n9', 'n10', 'n11', 'n12', 'n13', 'n14'],dtype='object')
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800000 entries, 0 to 799999
Data columns (total 47 columns):
 #   Column              Non-Null Count   Dtype
---  ------              --------------   -----
 0   id                  800000 non-null  int64
 1   loanAmnt            800000 non-null  float64
 2   term                800000 non-null  int64
 3   interestRate        800000 non-null  float64
 4   installment         800000 non-null  float64
 5   grade               800000 non-null  object
 6   subGrade            800000 non-null  object
 7   employmentTitle     799999 non-null  float64
 8   employmentLength    753201 non-null  object
 9   homeOwnership       800000 non-null  int64
 10  annualIncome        800000 non-null  float64
 11  verificationStatus  800000 non-null  int64
 12  issueDate           800000 non-null  object
 13  isDefault           800000 non-null  int64
 14  purpose             800000 non-null  int64
 15  postCode            799999 non-null  float64
 16  regionCode          800000 non-null  int64
 17  dti                 799761 non-null  float64
 18  delinquency_2years  800000 non-null  float64
 19  ficoRangeLow        800000 non-null  float64
 20  ficoRangeHigh       800000 non-null  float64
 21  openAcc             800000 non-null  float64
 22  pubRec              800000 non-null  float64
 23  pubRecBankruptcies  799595 non-null  float64
 24  revolBal            800000 non-null  float64
 25  revolUtil           799469 non-null  float64
 26  totalAcc            800000 non-null  float64
 27  initialListStatus   800000 non-null  int64
 28  applicationType     800000 non-null  int64
 29  earliesCreditLine   800000 non-null  object
 30  title               799999 non-null  float64
 31  policyCode          800000 non-null  float64
 32  n0                  759730 non-null  float64
 33  n1                  759730 non-null  float64
 34  n2                  759730 non-null  float64
 35  n3                  759730 non-null  float64
 36  n4                  766761 non-null  float64
 37  n5                  759730 non-null  float64
 38  n6                  759730 non-null  float64
 39  n7                  759730 non-null  float64
 40  n8                  759729 non-null  float64
 41  n9                  759730 non-null  float64
 42  n10                 766761 non-null  float64
 43  n11                 730248 non-null  float64
 44  n12                 759730 non-null  float64
 45  n13                 759730 non-null  float64
 46  n14                 759730 non-null  float64
dtypes: float64(33), int64(9), object(5)
memory usage: 286.9+ MB
train.describe()
| | id | loanAmnt | term | interestRate | installment | employmentTitle | homeOwnership | annualIncome | verificationStatus | isDefault | purpose | postCode | regionCode | dti | delinquency_2years | ficoRangeLow | ficoRangeHigh | openAcc | pubRec | pubRecBankruptcies | revolBal | revolUtil | totalAcc | initialListStatus | applicationType | title | policyCode | n0 | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 | n9 | n10 | n11 | n12 | n13 | n14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 800000.000000 | 800000.000000 | 800000.000000 | 800000.000000 | 800000.000000 | 799999.000000 | 800000.000000 | 8.000000e+05 | 800000.000000 | 800000.000000 | 800000.000000 | 799999.000000 | 800000.000000 | 799761.000000 | 800000.000000 | 800000.000000 | 800000.000000 | 800000.000000 | 800000.000000 | 799595.000000 | 8.000000e+05 | 799469.000000 | 800000.000000 | 800000.000000 | 800000.000000 | 799999.000000 | 800000.0 | 759730.000000 | 759730.000000 | 759730.000000 | 759730.000000 | 766761.000000 | 759730.000000 | 759730.000000 | 759730.000000 | 759729.000000 | 759730.000000 | 766761.000000 | 730248.000000 | 759730.000000 | 759730.000000 | 759730.000000 |
mean | 399999.500000 | 14416.818875 | 3.482745 | 13.238391 | 437.947723 | 72005.351714 | 0.614213 | 7.613391e+04 | 1.009683 | 0.199513 | 1.745982 | 258.535648 | 16.385758 | 18.284557 | 0.318239 | 696.204081 | 700.204226 | 11.598020 | 0.214915 | 0.134163 | 1.622871e+04 | 51.790734 | 24.998861 | 0.416953 | 0.019267 | 1754.113589 | 1.0 | 0.511932 | 3.642330 | 5.642648 | 5.642648 | 4.735641 | 8.107937 | 8.575994 | 8.282953 | 14.622488 | 5.592345 | 11.643896 | 0.000815 | 0.003384 | 0.089366 | 2.178606 |
std | 230940.252015 | 8716.086178 | 0.855832 | 4.765757 | 261.460393 | 106585.640204 | 0.675749 | 6.894751e+04 | 0.782716 | 0.399634 | 2.367453 | 200.037446 | 11.036679 | 11.150155 | 0.880325 | 31.865995 | 31.866674 | 5.475286 | 0.606467 | 0.377471 | 2.245802e+04 | 24.516126 | 11.999201 | 0.493055 | 0.137464 | 7941.474040 | 0.0 | 1.333266 | 2.246825 | 3.302810 | 3.302810 | 2.949969 | 4.799210 | 7.400536 | 4.561689 | 8.124610 | 3.216184 | 5.484104 | 0.030075 | 0.062041 | 0.509069 | 1.844377 |
min | 0.000000 | 500.000000 | 3.000000 | 5.310000 | 15.690000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -1.000000 | 0.000000 | 630.000000 | 634.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 2.000000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 199999.750000 | 8000.000000 | 3.000000 | 9.750000 | 248.450000 | 427.000000 | 0.000000 | 4.560000e+04 | 0.000000 | 0.000000 | 0.000000 | 103.000000 | 8.000000 | 11.790000 | 0.000000 | 670.000000 | 674.000000 | 8.000000 | 0.000000 | 0.000000 | 5.944000e+03 | 33.400000 | 16.000000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 2.000000 | 3.000000 | 3.000000 | 3.000000 | 5.000000 | 4.000000 | 5.000000 | 9.000000 | 3.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
50% | 399999.500000 | 12000.000000 | 3.000000 | 12.740000 | 375.135000 | 7755.000000 | 1.000000 | 6.500000e+04 | 1.000000 | 0.000000 | 0.000000 | 203.000000 | 14.000000 | 17.610000 | 0.000000 | 690.000000 | 694.000000 | 11.000000 | 0.000000 | 0.000000 | 1.113200e+04 | 52.100000 | 23.000000 | 0.000000 | 0.000000 | 1.000000 | 1.0 | 0.000000 | 3.000000 | 5.000000 | 5.000000 | 4.000000 | 7.000000 | 7.000000 | 7.000000 | 13.000000 | 5.000000 | 11.000000 | 0.000000 | 0.000000 | 0.000000 | 2.000000 |
75% | 599999.250000 | 20000.000000 | 3.000000 | 15.990000 | 580.710000 | 117663.500000 | 1.000000 | 9.000000e+04 | 2.000000 | 0.000000 | 4.000000 | 395.000000 | 22.000000 | 24.060000 | 0.000000 | 710.000000 | 714.000000 | 14.000000 | 0.000000 | 0.000000 | 1.973400e+04 | 70.700000 | 32.000000 | 1.000000 | 0.000000 | 5.000000 | 1.0 | 0.000000 | 5.000000 | 7.000000 | 7.000000 | 6.000000 | 11.000000 | 11.000000 | 10.000000 | 19.000000 | 7.000000 | 14.000000 | 0.000000 | 0.000000 | 0.000000 | 3.000000 |
max | 799999.000000 | 40000.000000 | 5.000000 | 30.990000 | 1715.420000 | 378351.000000 | 5.000000 | 1.099920e+07 | 2.000000 | 1.000000 | 13.000000 | 940.000000 | 50.000000 | 999.000000 | 39.000000 | 845.000000 | 850.000000 | 86.000000 | 86.000000 | 12.000000 | 2.904836e+06 | 892.300000 | 162.000000 | 1.000000 | 1.000000 | 61680.000000 | 1.0 | 51.000000 | 33.000000 | 63.000000 | 63.000000 | 49.000000 | 70.000000 | 132.000000 | 79.000000 | 128.000000 | 45.000000 | 82.000000 | 4.000000 | 4.000000 | 39.000000 | 30.000000 |
# Count the features that contain missing values
train.isnull().any().sum()
22
# Count the missing values per feature and visualize them
missing = train.isnull().sum()
missing = missing[missing > 0]
missing.sort_values(inplace = True)
missing.plot.bar();
[Figure: bar chart of the number of missing values per feature (output_13_0.png)]
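Raw counts are hard to judge against 800,000 rows, so a ratio view can help. From the `info()` output above, `n11` is the sparsest column (730,248 non-null values, about 8.7% missing). A minimal sketch of the same idea, assuming a small made-up DataFrame `toy` that stands in for `train`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for `train`; the values are illustrative only.
toy = pd.DataFrame({
    "loanAmnt": [1000.0, 2000.0, 3000.0, 4000.0],
    "dti": [10.5, np.nan, 20.1, 18.0],
    "n11": [np.nan, np.nan, 0.0, 1.0],
})

# Fraction of missing values per column, keeping only columns with gaps.
missing_ratio = toy.isnull().mean()
missing_ratio = missing_ratio[missing_ratio > 0].sort_values()
print(missing_ratio)
```

The mean of a boolean mask is the share of `True` entries, so `isnull().mean()` gives the missing ratio directly.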
# Find features that take only a single value in the training set
fea = [col for col in train.columns if train[col].nunique() <=1]
fea
['policyCode']
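A column with a single value carries no information for a classifier. The notebook does not drop `policyCode` at this point, so the following is only one possible way to handle it, shown on a made-up DataFrame `toy`:

```python
import pandas as pd

# Toy stand-in for `train`; the values are illustrative only.
toy = pd.DataFrame({
    "loanAmnt": [1000.0, 2000.0, 3000.0],
    "policyCode": [1.0, 1.0, 1.0],
})

# Same test as above: a column is constant if it has at most one distinct value.
constant_cols = [col for col in toy.columns if toy[col].nunique() <= 1]

# Drop the constant columns and keep everything else.
toy_reduced = toy.drop(columns=constant_cols)
print(constant_cols, list(toy_reduced.columns))
```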
# Identify which features are numeric and which are object-typed
numerical_fea = list(train.select_dtypes(exclude=['object']).columns)
category_fea = list(filter(lambda x:x not in numerical_fea,list(train.columns)))
print('There are {} numeric features: {}'.format(len(numerical_fea),numerical_fea))
print()
print('There are {} object-typed features: {}'.format(len(category_fea),category_fea))
There are 42 numeric features: ['id', 'loanAmnt', 'term', 'interestRate', 'installment', 'employmentTitle', 'homeOwnership', 'annualIncome', 'verificationStatus', 'isDefault', 'purpose', 'postCode', 'regionCode', 'dti', 'delinquency_2years', 'ficoRangeLow', 'ficoRangeHigh', 'openAcc', 'pubRec', 'pubRecBankruptcies', 'revolBal', 'revolUtil', 'totalAcc', 'initialListStatus', 'applicationType', 'title', 'policyCode', 'n0', 'n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7', 'n8', 'n9', 'n10', 'n11', 'n12', 'n13', 'n14']

There are 5 object-typed features: ['grade', 'subGrade', 'employmentLength', 'issueDate', 'earliesCreditLine']
# Split the numeric features into continuous and discrete ones
# (a feature with 10 or fewer distinct values is treated as discrete)
numerical_noserial_fea = []
numerical_serial_fea = []
for fea in numerical_fea:
    temp = train[fea].nunique()
    if temp <= 10:
        numerical_noserial_fea.append(fea)
        continue
    numerical_serial_fea.append(fea)
print('Continuous numeric features:',numerical_serial_fea)
print()
print('Discrete numeric features:',numerical_noserial_fea)
Continuous numeric features: ['id', 'loanAmnt', 'interestRate', 'installment', 'employmentTitle', 'annualIncome', 'purpose', 'postCode', 'regionCode', 'dti', 'delinquency_2years', 'ficoRangeLow', 'ficoRangeHigh', 'openAcc', 'pubRec', 'pubRecBankruptcies', 'revolBal', 'revolUtil', 'totalAcc', 'title', 'n0', 'n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7', 'n8', 'n9', 'n10', 'n13', 'n14']

Discrete numeric features: ['term', 'homeOwnership', 'verificationStatus', 'isDefault', 'initialListStatus', 'applicationType', 'policyCode', 'n11', 'n12']
3.2 Discrete Numeric Features
for fea in numerical_noserial_fea:
    print('Discrete feature:',fea)
    print(train[fea].value_counts())
    print()
    print()
Discrete feature: term
3 606902
5 193098
Name: term, dtype: int64

Discrete feature: homeOwnership
0 395732
1 317660
2 86309
3 185
5 81
4 33
Name: homeOwnership, dtype: int64

Discrete feature: verificationStatus
1 309810
2 248968
0 241222
Name: verificationStatus, dtype: int64

Discrete feature: isDefault
0 640390
1 159610
Name: isDefault, dtype: int64

Discrete feature: initialListStatus
0 466438
1 333562
Name: initialListStatus, dtype: int64

Discrete feature: applicationType
0 784586
1 15414
Name: applicationType, dtype: int64

Discrete feature: policyCode
1.0 800000
Name: policyCode, dtype: int64

Discrete feature: n11
0.0 729682
1.0 540
2.0 24
4.0 1
3.0 1
Name: n11, dtype: int64

Discrete feature: n12
0.0 757315
1.0 2281
2.0 115
3.0 16
4.0 3
Name: n12, dtype: int64
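The `isDefault` counts show that the target is imbalanced. The arithmetic below uses only the two counts printed above; the variable names are introduced here for illustration:

```python
# Class counts copied from the isDefault value_counts() output.
n_non_default = 640390
n_default = 159610

total = n_non_default + n_default

# Share of loans that defaulted, and how many non-defaults there are per default.
default_rate = n_default / total
negatives_per_positive = n_non_default / n_default

print(f"total: {total}")
print(f"default rate: {default_rate:.4f}")
print(f"non-defaults per default: {negatives_per_positive:.2f}")
```

The resulting rate of roughly 20% agrees with the `isDefault` mean of 0.199513 reported by `train.describe()`. With about four non-defaults per default, plain accuracy can be a misleading metric, because always predicting "no default" already scores about 80%.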
3.3 Continuous Numeric Features
f = pd.melt(train, value_vars=numerical_serial_fea)
g = sns.FacetGrid(f, col="variable", col_wrap=4, sharex=False, sharey=False)
g = g.map(sns.distplot, "value")
[Figure: distribution plots of the continuous numeric features, one panel per feature (output_20_0.png)]
3.4 Non-Numeric Categorical Features
for fea in category_fea:
    print('Non-numeric categorical feature:',fea)
    print(train[fea].value_counts())
    print()
Non-numeric categorical feature: grade
B 233690
C 227118
A 139661
D 119453
E 55661
F 19053
G 5364
Name: grade, dtype: int64

Non-numeric categorical feature: subGrade
C1 50763
B4 49516
B5 48965
B3 48600
C2 47068
C3 44751
C4 44272
B2 44227
B1 42382
C5 40264
A5 38045
A4 30928
D1 30538
D2 26528
A1 25909
D3 23410
A3 22655
A2 22124
D4 21139
D5 17838
E1 14064
E2 12746
E3 10925
E4 9273
E5 8653
F1 5925
F2 4340
F3 3577
F4 2859
F5 2352
G1 1759
G2 1231
G3 978
G4 751
G5 645
Name: subGrade, dtype: int64

Non-numeric categorical feature: employmentLength
10+ years 262753
2 years 72358
< 1 year 64237
3 years 64152
1 year 52489
5 years 50102
4 years 47985
6 years 37254
8 years 36192
7 years 35407
9 years 30272
Name: employmentLength, dtype: int64

Non-numeric categorical feature: issueDate
2016-03-01 29066
2015-10-01 25525
2015-07-01 24496
2015-12-01 23245
2014-10-01 21461
...
2007-08-01 23
2007-07-01 21
2008-09-01 19
2007-09-01 7
2007-06-01 1
Name: issueDate, Length: 139, dtype: int64

Non-numeric categorical feature: earliesCreditLine
Aug-2001 5567
Sep-2003 5403
Aug-2002 5403
Oct-2001 5258
Aug-2000 5246
...
Oct-1954 1
Jan-1944 1
May-1957 1
Nov-1954 1
Nov-1953 1
Name: earliesCreditLine, Length: 720, dtype: int64
III. Feature Engineering
3.1 Feature Preprocessing
3.1.1 Missing Value Imputation
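# Fill missing values in the numeric features with the median
train[numerical_fea] = train[numerical_fea].fillna(train[numerical_fea].median())
# Fill missing values in the categorical features with the mode;
# mode() returns a DataFrame, so take its first row to get one value per column
train[category_fea] = train[category_fea].fillna(train[category_fea].mode().iloc[0])
# employmentLength is the only categorical feature with gaps; its mode is '10+ years'
train['employmentLength'] = train['employmentLength'].fillna('10+ years')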
train.isnull().any().sum()
0
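Two details of `fillna` are easy to get wrong here: calling it with `inplace=True` on a column selection such as `train[category_fea]` modifies a temporary copy, and `DataFrame.mode()` returns a DataFrame rather than one value per column. A minimal sketch of the assignment-based pattern, on a made-up DataFrame `toy` (the column lists `num_cols` and `cat_cols` are named here for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for `train`; the values are illustrative only.
toy = pd.DataFrame({
    "dti": [10.0, np.nan, 30.0, 20.0],
    "employmentLength": ["10+ years", None, "2 years", "10+ years"],
})
num_cols = ["dti"]
cat_cols = ["employmentLength"]

# median() returns a Series indexed by column name, so each column
# is filled with its own median.
toy[num_cols] = toy[num_cols].fillna(toy[num_cols].median())

# mode() returns a DataFrame (one row per tied mode); .iloc[0] selects
# the first mode of every column as a Series indexed by column name.
toy[cat_cols] = toy[cat_cols].fillna(toy[cat_cols].mode().iloc[0])

print(toy)
```

Assigning the result back avoids relying on `inplace=True`, whose effect on a selected subset is not guaranteed.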
3.1.2 Preprocessing Object-Typed Categorical Features
# Convert issueDate to the number of days since 2007-06-01
train['issueDate'] = pd.to_datetime(train['issueDate'],format='%Y-%m-%d')
startdate = datetime.datetime.strptime('2007-06-01','%Y-%m-%d')
train['issueDate'] = train['issueDate'].apply(lambda x: x-startdate).dt.days
train['issueDate'].value_counts()
3196 29066
3044 25525
2952 24496
3105 23245
2679 21461
...
61 23
30 21
458 19
92 7
0 1
Name: issueDate, Length: 139, dtype: int64
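Subtracting a single timestamp from a datetime column is already vectorized in pandas, so the `apply` with a lambda is not strictly needed. A minimal sketch on three dates taken from the `issueDate` listing in section 3.4 (the names `issue_dates`, `start` and `days_since_start` are introduced here for illustration):

```python
import pandas as pd

# Three issue dates that appear in the value_counts() output.
issue_dates = pd.Series(["2007-06-01", "2007-07-01", "2016-03-01"])

# Reference date: the earliest issue month in the data.
start = pd.Timestamp("2007-06-01")

# Series minus Timestamp gives a timedelta Series; .dt.days extracts whole days.
days_since_start = (pd.to_datetime(issue_dates, format="%Y-%m-%d") - start).dt.days
print(days_since_start.tolist())
```

The values 0, 30 and 3196 correspond to the entries with counts 1, 21 and 29066 in the output above.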
# Preprocess employmentLength: convert strings such as '3 years' to integers
def employmentLength_to_int(s):
    if pd.isnull(s):
        return s
    else:
        return np.int8(s.split()[0])

train['employmentLength'] = train['employmentLength'].replace('10+ years', '10 years')
train['employmentLength'] = train['employmentLength'].replace('< 1 year', '0 years')
train['employmentLength'] = train['employmentLength'].apply(employmentLength_to_int)
train['employmentLength'].value_counts()
10 309552
2 72358
0 64237
3 64152
1 52489
5 50102
4 47985
6 37254
8 36192
7 35407
9 30272
Name: employmentLength, dtype: int64
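An explicit lookup table is an alternative to string replacement followed by parsing, and it makes the treatment of the two special labels visible in one place. A sketch under the assumption that the eleven labels listed in section 3.4 are the only ones that occur; `length_map`, `raw` and `years` are hypothetical names:

```python
import numpy as np
import pandas as pd

# Explicit lookup: "< 1 year" -> 0, "1 year" -> 1, ..., "10+ years" -> 10.
length_map = {"< 1 year": 0, "1 year": 1, "10+ years": 10}
length_map.update({f"{n} years": n for n in range(2, 10)})

# Toy input, including a missing value.
raw = pd.Series(["10+ years", "< 1 year", "3 years", "1 year", np.nan])

# Series.map looks each label up in the dict; labels without an entry
# (including NaN) become NaN instead of raising an error.
years = raw.map(length_map)
print(years.tolist())
```

Because unknown labels turn into NaN, a follow-up `years.isnull().sum()` would reveal any label the table does not cover.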
# Preprocess earliesCreditLine: keep only the four-digit year
train['earliesCreditLine'] = train['earliesCreditLine'].apply(lambda x:int(x[-4:]))
train['earliesCreditLine'].value_counts()
2001 53194
2002 51060
2003 50649
2000 50624
2004 49280
...
1954 5
1953 5
1950 5
1946 2
1944 1
Name: earliesCreditLine, Length: 68, dtype: int64
# Encode grade as an ordinal integer (A=1 ... G=7)
train['grade'] = train['grade'].map({'A':1,'B':2,'C':3,'D':4,'E':5,'F':6,'G':7})
train['grade'].value_counts()
2 233690
3 227118
1 139661
4 119453
5 55661
6 19053
7 5364
Name: grade, dtype: int64
# Encode subGrade as an integer (A1=1 ... G5=35)
train['subGrade'] = train['subGrade'].map({'A1':1,'A2':2,'A3':3,'A4':4,'A5':5,'B1':6,'B2':7,'B3':8,'B4':9,'B5':10,'C1':11,'C2':12,'C3':13,'C4':14,'C5':15
,'D1':16,'D2':17,'D3':18,'D4':19,'D5':20,'E1':21,'E2':22,'E3':23,'E4':24,'E5':25,'F1':26,'F2':27,'F3':28,'F4':29,'F5':30
,'G1':31,'G2':32,'G3':33,'G4':34,'G5':35})
train['subGrade'].value_counts()
11 50763
9 49516
10 48965
8 48600
12 47068
13 44751
14 44272
7 44227
6 42382
15 40264
5 38045
4 30928
16 30538
17 26528
1 25909
18 23410
3 22655
2 22124
19 21139
20 17838
21 14064
22 12746
23 10925
24 9273
25 8653
26 5925
27 4340
28 3577
29 2859
30 2352
31 1759
32 1231
33 978
34 751
35 645
Name: subGrade, dtype: int64
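The 35-entry dictionary follows a regular pattern (seven grades with five sub-grades each), so it can also be generated instead of typed out. A minimal sketch that reproduces the same codes; `grades` and `sub_grade_map` are names introduced here:

```python
# Seven grades, five sub-grades each: A1=1, A2=2, ..., B1=6, ..., G5=35.
grades = "ABCDEFG"
sub_grade_map = {
    f"{grade}{n}": index * 5 + n
    for index, grade in enumerate(grades)
    for n in range(1, 6)
}

print(sub_grade_map["A1"], sub_grade_map["C1"], sub_grade_map["G5"])
```

The generated dictionary could be passed to `Series.map` in the same way as the handwritten one.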
3.1.3 Discrete Numeric Features
# One-hot encoding
temp = ['subGrade','homeOwnership','verificationStatus','purpose','regionCode']
data = pd.get_dummies(train,columns=temp,drop_first=True)
data.head()
| | id | loanAmnt | term | interestRate | installment | grade | employmentTitle | employmentLength | annualIncome | issueDate | isDefault | postCode | dti | delinquency_2years | ficoRangeLow | ficoRangeHigh | openAcc | pubRec | pubRecBankruptcies | revolBal | revolUtil | totalAcc | initialListStatus | applicationType | earliesCreditLine | title | policyCode | n0 | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 | n9 | n10 | n11 | n12 | n13 | n14 | subGrade_2 | subGrade_3 | subGrade_4 | subGrade_5 | subGrade_6 | subGrade_7 | subGrade_8 | subGrade_9 | subGrade_10 | subGrade_11 | subGrade_12 | subGrade_13 | subGrade_14 | subGrade_15 | subGrade_16 | subGrade_17 | subGrade_18 | subGrade_19 | subGrade_20 | subGrade_21 | subGrade_22 | subGrade_23 | subGrade_24 | subGrade_25 | subGrade_26 | subGrade_27 | subGrade_28 | subGrade_29 | subGrade_30 | subGrade_31 | subGrade_32 | subGrade_33 | subGrade_34 | subGrade_35 | homeOwnership_1 | homeOwnership_2 | homeOwnership_3 | homeOwnership_4 | homeOwnership_5 | verificationStatus_1 | verificationStatus_2 | purpose_1 | purpose_2 | purpose_3 | purpose_4 | purpose_5 | purpose_6 | purpose_7 | purpose_8 | purpose_9 | purpose_10 | purpose_11 | purpose_12 | purpose_13 | regionCode_1 | regionCode_2 | regionCode_3 | regionCode_4 | regionCode_5 | regionCode_6 | regionCode_7 | regionCode_8 | regionCode_9 | regionCode_10 | regionCode_11 | regionCode_12 | regionCode_13 | regionCode_14 | regionCode_15 | regionCode_16 | regionCode_17 | regionCode_18 | regionCode_19 | regionCode_20 | regionCode_21 | regionCode_22 | regionCode_23 | regionCode_24 | regionCode_25 | regionCode_26 | regionCode_27 | regionCode_28 | regionCode_29 | regionCode_30 | regionCode_31 | regionCode_32 | regionCode_33 | regionCode_34 | regionCode_35 | regionCode_36 | regionCode_37 | regionCode_38 | regionCode_39 | regionCode_40 | regionCode_41 | regionCode_42 | regionCode_43 | regionCode_44 | regionCode_45 | regionCode_46 | regionCode_47 | regionCode_48 | regionCode_49 | regionCode_50 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 35000.0 | 5 | 19.52 | 917.97 | 5 | 320.0 | 2 | 110000.0 | 2587 | 1 | 137.0 | 17.05 | 0.0 | 730.0 | 734.0 | 7.0 | 0.0 | 0.0 | 24178.0 | 48.9 | 27.0 | 0 | 0 | 2001 | 1.0 | 1.0 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 | 9.0 | 8.0 | 4.0 | 12.0 | 2.0 | 7.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 18000.0 | 5 | 18.49 | 461.90 | 4 | 219843.0 | 5 | 46000.0 | 1888 | 0 | 156.0 | 27.83 | 0.0 | 700.0 | 704.0 | 13.0 | 0.0 | 0.0 | 15096.0 | 38.9 | 18.0 | 1 | 0 | 2002 | 1723.0 | 1.0 | 0.0 | 3.0 | 5.0 | 5.0 | 10.0 | 7.0 | 7.0 | 7.0 | 13.0 | 5.0 | 13.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 2 | 12000.0 | 5 | 16.99 | 298.17 | 4 | 31698.0 | 8 | 74000.0 | 3044 | 0 | 337.0 | 22.77 | 0.0 | 675.0 | 679.0 | 11.0 | 0.0 | 0.0 | 4606.0 | 51.8 | 27.0 | 0 | 0 | 2006 | 0.0 | 1.0 | 0.0 | 0.0 | 3.0 | 3.0 | 0.0 | 0.0 | 21.0 | 4.0 | 5.0 | 3.0 | 11.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 3 | 11000.0 | 3 | 7.26 | 340.96 | 1 | 46854.0 | 10 | 118000.0 | 2983 | 0 | 148.0 | 17.21 | 0.0 | 685.0 | 689.0 | 9.0 | 0.0 | 0.0 | 9948.0 | 52.6 | 28.0 | 1 | 0 | 1999 | 4.0 | 1.0 | 6.0 | 4.0 | 6.0 | 6.0 | 4.0 | 16.0 | 4.0 | 7.0 | 21.0 | 6.0 | 9.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 4 | 3000.0 | 3 | 12.99 | 101.07 | 3 | 54.0 | 10 | 29000.0 | 3196 | 0 | 301.0 | 32.16 | 0.0 | 690.0 | 694.0 | 12.0 | 0.0 | 0.0 | 2942.0 | 32.0 | 27.0 | 0 | 0 | 1977 | 11.0 | 1.0 | 1.0 | 2.0 | 7.0 | 7.0 | 2.0 | 4.0 | 9.0 | 10.0 | 15.0 | 7.0 | 12.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
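With `drop_first=True`, a feature with k levels yields k-1 indicator columns, and the dropped first level is represented by all indicators being zero. That is how the 47 original columns become 146: 42 untouched columns plus 34, 5, 2, 13 and 50 indicators for the five encoded features. A minimal sketch on a made-up DataFrame `toy`:

```python
import pandas as pd

# Toy stand-in with one categorical code column; values are illustrative only.
toy = pd.DataFrame({
    "loanAmnt": [35000.0, 18000.0, 12000.0],
    "homeOwnership": [2, 0, 1],
})

# Three levels (0, 1, 2) produce two indicator columns; level 0 is the baseline.
encoded = pd.get_dummies(toy, columns=["homeOwnership"], drop_first=True)
print(encoded.columns.tolist())
print(encoded)
```

Depending on the pandas version, the indicator columns are integer or boolean typed; either way the row with `homeOwnership == 0` has both indicators off.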
3.1.4 Continuous Features
# Outlier handling: mark values outside mean ± 3 standard deviations as NaN
def find_outliers_by_3segama(fea):
    data_std = np.std(data[fea])
    data_mean = np.mean(data[fea])
    outliers_cut_off = data_std * 3
    lower_rule = data_mean - outliers_cut_off
    upper_rule = data_mean + outliers_cut_off
    data[fea] = data[fea].apply(lambda x:np.nan if x > upper_rule or x < lower_rule else x)
    return data

for fea in data.columns:
    if data[fea].nunique() > 10:
        find_outliers_by_3segama(fea)
        continue

# Drop every row that contains at least one marked outlier
data.dropna(axis=0,how='any',inplace=True)
data.shape
(626125, 146)
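Because a row is dropped as soon as any one of its features falls outside that feature's band, the filter is fairly aggressive: the data shrinks from 800,000 to 626,125 rows, a loss of about 21.7%. The rule itself can be sketched on a made-up NumPy array (`values` is illustrative and not taken from the data set):

```python
import numpy as np

# Nineteen ordinary values and one extreme value.
values = np.array([10.0] * 19 + [1000.0])

# Population mean and standard deviation (ddof=0, the NumPy default).
mean = values.mean()
std = values.std()

# Keep only the values inside [mean - 3*std, mean + 3*std].
lower, upper = mean - 3 * std, mean + 3 * std
keep = (values >= lower) & (values <= upper)
filtered = values[keep]

print(f"band: [{lower:.1f}, {upper:.1f}], kept {filtered.size} of {values.size}")
```

The extreme value also inflates the mean and standard deviation that define the band, which is a known weakness of the 3-sigma rule on skewed features.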
# Binning: divide wide-ranged features by a power of ten chosen from their range
for fea in data.columns:
    if data[fea].nunique() > 10:
        if data[fea].max()-data[fea].min() >100000:
            data[fea] = np.floor_divide(data[fea], 10000)
        elif data[fea].max()-data[fea].min() >10000:
            data[fea] = np.floor_divide(data[fea], 1000)
        elif data[fea].max()-data[fea].min() >1000:
            data[fea] = np.floor_divide(data[fea], 100)
        elif data[fea].max()-data[fea].min() >100:
            data[fea] = np.floor_divide(data[fea], 10)
data.head()
| | id | loanAmnt | term | interestRate | installment | grade | employmentTitle | employmentLength | annualIncome | issueDate | isDefault | postCode | dti | delinquency_2years | ficoRangeLow | ficoRangeHigh | openAcc | pubRec | pubRecBankruptcies | revolBal | revolUtil | totalAcc | initialListStatus | applicationType | earliesCreditLine | title | policyCode | n0 | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 | n9 | n10 | n11 | n12 | n13 | n14 | subGrade_2 | subGrade_3 | subGrade_4 | subGrade_5 | subGrade_6 | subGrade_7 | subGrade_8 | subGrade_9 | subGrade_10 | subGrade_11 | subGrade_12 | subGrade_13 | subGrade_14 | subGrade_15 | subGrade_16 | subGrade_17 | subGrade_18 | subGrade_19 | subGrade_20 | subGrade_21 | subGrade_22 | subGrade_23 | subGrade_24 | subGrade_25 | subGrade_26 | subGrade_27 | subGrade_28 | subGrade_29 | subGrade_30 | subGrade_31 | subGrade_32 | subGrade_33 | subGrade_34 | subGrade_35 | homeOwnership_1 | homeOwnership_2 | homeOwnership_3 | homeOwnership_4 | homeOwnership_5 | verificationStatus_1 | verificationStatus_2 | purpose_1 | purpose_2 | purpose_3 | purpose_4 | purpose_5 | purpose_6 | purpose_7 | purpose_8 | purpose_9 | purpose_10 | purpose_11 | purpose_12 | purpose_13 | regionCode_1 | regionCode_2 | regionCode_3 | regionCode_4 | regionCode_5 | regionCode_6 | regionCode_7 | regionCode_8 | regionCode_9 | regionCode_10 | regionCode_11 | regionCode_12 | regionCode_13 | regionCode_14 | regionCode_15 | regionCode_16 | regionCode_17 | regionCode_18 | regionCode_19 | regionCode_20 | regionCode_21 | regionCode_22 | regionCode_23 | regionCode_24 | regionCode_25 | regionCode_26 | regionCode_27 | regionCode_28 | regionCode_29 | regionCode_30 | regionCode_31 | regionCode_32 | regionCode_33 | regionCode_34 | regionCode_35 | regionCode_36 | regionCode_37 | regionCode_38 | regionCode_39 | regionCode_40 | regionCode_41 | regionCode_42 | regionCode_43 | regionCode_44 | regionCode_45 | regionCode_46 | regionCode_47 | regionCode_48 | regionCode_49 | regionCode_50 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 35.0 | 5 | 19.52 | 9.0 | 5 | 0.0 | 2 | 11.0 | 25.0 | 1 | 13.0 | 17.05 | 0.0 | 73.0 | 73.0 | 7.0 | 0.0 | 0.0 | 24.0 | 4.0 | 27.0 | 0 | 0 | 2001.0 | 0.0 | 1.0 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 | 9.0 | 8.0 | 4.0 | 12.0 | 2.0 | 7.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 18.0 | 5 | 18.49 | 4.0 | 4 | 21.0 | 5 | 4.0 | 18.0 | 0 | 15.0 | 27.83 | 0.0 | 70.0 | 70.0 | 13.0 | 0.0 | 0.0 | 15.0 | 3.0 | 18.0 | 1 | 0 | 2002.0 | 1.0 | 1.0 | 0.0 | 3.0 | 5.0 | 5.0 | 10.0 | 7.0 | 7.0 | 7.0 | 13.0 | 5.0 | 13.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 12.0 | 5 | 16.99 | 2.0 | 4 | 3.0 | 8 | 7.0 | 30.0 | 0 | 33.0 | 22.77 | 0.0 | 67.0 | 67.0 | 11.0 | 0.0 | 0.0 | 4.0 | 5.0 | 27.0 | 0 | 0 | 2006.0 | 0.0 | 1.0 | 0.0 | 0.0 | 3.0 | 3.0 | 0.0 | 0.0 | 21.0 | 4.0 | 5.0 | 3.0 | 11.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 3.0 | 3 | 12.99 | 1.0 | 3 | 0.0 | 10 | 2.0 | 31.0 | 0 | 30.0 | 32.16 | 0.0 | 69.0 | 69.0 | 12.0 | 0.0 | 0.0 | 2.0 | 3.0 | 27.0 | 0 | 0 | 1977.0 | 0.0 | 1.0 | 1.0 | 2.0 | 7.0 | 7.0 | 2.0 | 4.0 | 9.0 | 10.0 | 15.0 | 7.0 | 12.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 2.0 | 3 | 7.69 | 0.0 | 1 | 18.0 | 9 | 3.0 | 26.0 | 0 | 51.0 | 17.49 | 0.0 | 75.0 | 75.0 | 12.0 | 0.0 | 0.0 | 3.0 | 0.0 | 23.0 | 0 | 0 | 2006.0 | 0.0 | 1.0 | 0.0 | 1.0 | 3.0 | 3.0 | 7.0 | 11.0 | 3.0 | 10.0 | 18.0 | 3.0 | 12.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
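The effect of the floor division can be checked by hand on a few values from the tables: a loan amount of 35000.0 becomes 35.0 after dividing by 1000, and an installment of 917.97 becomes 9.0 after dividing by 100. A minimal sketch using the first, second and fifth rows of the `data.head()` output shown before binning (the array names are introduced here):

```python
import numpy as np

# loanAmnt and installment values from rows 0, 1 and 4 of data.head().
loan_amnt = np.array([35000.0, 18000.0, 3000.0])
installment = np.array([917.97, 461.90, 101.07])

# Floor division keeps the integer part of value / divisor as the bin index.
loan_bins = np.floor_divide(loan_amnt, 1000)
installment_bins = np.floor_divide(installment, 100)

print(loan_bins.tolist())
print(installment_bins.tolist())
```

This produces equal-width bins whose width depends only on the feature's range, so heavily skewed features can end up with most rows in the lowest few bins.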
3.2 Feature Interaction
3.3 Feature Encoding
3.4 Feature Selection
# Drop features whose absolute correlation is below 0.003
data.corr()
| | id | loanAmnt | term | interestRate | installment | grade | employmentTitle | employmentLength | annualIncome | issueDate | isDefault | postCode | dti | delinquency_2years | ficoRangeLow | ficoRangeHigh | openAcc | pubRec | pubRecBankruptcies | revolBal | revolUtil | totalAcc | initialListStatus | applicationType | earliesCreditLine | title | policyCode | n0 | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 | n9 | n10 | n11 | n12 | n13 | n14 | subGrade_2 | subGrade_3 | subGrade_4 | subGrade_5 | subGrade_6 | subGrade_7 | subGrade_8 | subGrade_9 | subGrade_10 | subGrade_11 | subGrade_12 | subGrade_13 | subGrade_14 | subGrade_15 | subGrade_16 | subGrade_17 | subGrade_18 | subGrade_19 | subGrade_20 | subGrade_21 | subGrade_22 | subGrade_23 | subGrade_24 | subGrade_25 | subGrade_26 | subGrade_27 | subGrade_28 | subGrade_29 | subGrade_30 | subGrade_31 | subGrade_32 | subGrade_33 | subGrade_34 | subGrade_35 | homeOwnership_1 | homeOwnership_2 | homeOwnership_3 | homeOwnership_4 | homeOwnership_5 | verificationStatus_1 | verificationStatus_2 | purpose_1 | purpose_2 | purpose_3 | purpose_4 | purpose_5 | purpose_6 | purpose_7 | purpose_8 | purpose_9 | purpose_10 | purpose_11 | purpose_12 | purpose_13 | regionCode_1 | regionCode_2 | regionCode_3 | regionCode_4 | regionCode_5 | regionCode_6 | regionCode_7 | regionCode_8 | regionCode_9 | regionCode_10 | regionCode_11 | regionCode_12 | regionCode_13 | regionCode_14 | regionCode_15 | regionCode_16 | regionCode_17 | regionCode_18 | regionCode_19 | regionCode_20 | regionCode_21 | regionCode_22 | regionCode_23 | regionCode_24 | regionCode_25 | regionCode_26 | regionCode_27 | regionCode_28 | regionCode_29 | regionCode_30 | regionCode_31 | regionCode_32 | regionCode_33 | regionCode_34 | regionCode_35 | regionCode_36 | regionCode_37 | regionCode_38 | regionCode_39 | regionCode_40 | regionCode_41 | regionCode_42 | regionCode_43 | regionCode_44 | regionCode_45 | regionCode_46 | regionCode_47 | regionCode_48 | regionCode_49 | regionCode_50 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | 1.000000 | 0.000423 | -0.000832 | 0.001380 | 0.000935 | 0.001117 | -0.000515 | 0.000522 | 0.001601 | 0.000813 | 0.000054 | 0.002119 | -0.001419 | 0.000611 | -0.000891 | -0.000891 | -0.002579 | -0.000380 | -0.001113 | -0.002203 | 0.001751 | -0.001191 | 0.001625 | 0.001088 | -0.000648 | -0.000831 | NaN | 0.001187 | -0.001507 | -0.001596 | -0.001596 | -0.001167 | -0.000726 | -0.001190 | -0.002063 | -0.001261 | -0.001383 | -0.002682 | -0.000979 | 0.000919 | -0.000399 | 0.000466 | -0.003861 | 0.002312 | -0.000586 | 0.001854 | 0.001601 | -0.000564 | -0.001624 | -0.001781 | -0.000133 | -0.002010 | 0.000298 | -0.000805 | 0.000330 | 0.000902 | 0.000199 | 0.001544 | 0.000008 | 0.001353 | 0.002652 | -0.000004 | 0.001617 | 0.000757 | -0.001725 | -0.001176 | 0.000167 | 0.000584 | -0.000396 | 0.001567 | -0.001031 | -0.001643 | -0.000925 | 0.000106 | 0.000632 | 0.001376 | 0.001050 | -0.000191 | -0.000324 | -0.001007 | -0.000076 | -0.001066 | 0.002051 | 0.001086 | -0.000949 | 0.000830 | -0.001223 | 0.001617 | 0.001010 | 0.000869 | 0.000494 | -0.000550 | -0.001030 | -0.000046 | 0.000530 | -0.001829 | -0.000512 | 0.000132 | -0.002777 | 0.000837 | 0.000781 | 0.001360 | 0.000291 | 0.003432 | 0.000811 | 0.000275 | -0.000130 | -0.002148 | -0.000562 | -0.001097 | 0.000369 | -0.000868 | 0.000086 | 0.000242 | -0.001352 | 0.000190 | 0.002099 | -0.001709 | 0.000090 | -0.002066 | -0.001333 | 0.001156 | 0.000646 | -0.001189 | 0.001397 | 0.000154 | -0.000731 | -0.003658 | -0.002803 | -0.000918 | 0.000107 | 0.000119 | 0.001542 | -0.000442 | -0.001099 | -0.000408 | 0.000445 | 0.000052 | 0.000535 | -0.002393 | 0.001781 | 0.000439 | 0.001373 | -0.000621 | -0.001178 | NaN |
loanAmnt | 0.000423 | 1.000000 | 0.409910 | 0.118841 | 0.944344 | 0.120627 | -0.009845 | 0.065608 | 0.467275 | -0.003040 | 0.059117 | -0.021573 | 0.027657 | 0.011335 | 0.112145 | 0.112145 | 0.183657 | -0.085582 | -0.095282 | 0.441242 | 0.115935 | 0.224888 | -0.064040 | 0.063317 | -0.163749 | -0.019045 | NaN | -0.039067 | 0.185727 | 0.139590 | 0.139590 | 0.205452 | 0.199770 | 0.096398 | 0.158380 | 0.173734 | 0.140495 | 0.179692 | -0.000524 | 0.001787 | -0.009348 | -0.039259 | -0.015960 | -0.013134 | -0.007710 | -0.006264 | -0.029218 | -0.026768 | -0.023572 | -0.028199 | -0.041442 | -0.023762 | -0.015886 | 0.001133 | 0.016117 | 0.011130 | 0.001577 | -0.000291 | 0.009578 | 0.026116 | 0.028586 | 0.033834 | 0.038488 | 0.041068 | 0.045713 | 0.048670 | 0.036593 | 0.039976 | 0.030056 | 0.029536 | 0.028785 | 0.025081 | 0.018754 | 0.020721 | 0.016866 | 0.016305 | -0.164712 | -0.026631 | -0.002439 | 0.001072 | -0.000577 | 0.030942 | 0.155035 | 0.008906 | -0.018050 | -0.044906 | 0.029625 | -0.137541 | 0.002530 | -0.077588 | -0.066592 | -0.067713 | -0.065535 | -0.014204 | -0.015790 | -0.003060 | -0.003828 | 0.005890 | 0.006220 | -0.003724 | -0.003024 | -0.005583 | -0.004681 | 0.006280 | 0.017927 | -0.010561 | -0.005282 | 0.012463 | -0.006248 | 0.027367 | -0.002174 | -0.000754 | -0.004629 | 0.006900 | -0.014687 | -0.000519 | -0.024489 | 0.003046 | -0.012151 | -0.007140 | 0.004722 | 0.016062 | -0.005721 | 0.006328 | -0.004399 | 0.017766 | -0.000734 | -0.001726 | 0.000768 | -0.005510 | -0.011785 | 0.001404 | -0.007696 | -0.007866 | 0.000001 | -0.004725 | 0.005218 | -0.000192 | -0.010283 | -0.003038 | 0.002874 | 0.000126 | -0.001679 | 0.013122 | -0.001992 | NaN |
term | -0.000832 | 0.409910 | 1.000000 | 0.415437 | 0.162911 | 0.423747 | 0.013595 | 0.043317 | 0.106556 | -0.038790 | 0.172367 | 0.011870 | 0.064007 | -0.003635 | 0.010680 | 0.010680 | 0.077523 | -0.019022 | -0.011780 | 0.133292 | 0.066051 | 0.117111 | -0.093668 | 0.039031 | -0.055153 | 0.006475 | NaN | -0.015322 | 0.042665 | 0.050506 | 0.050506 | 0.047499 | 0.058447 | 0.084320 | 0.051615 | 0.064934 | 0.050742 | 0.079613 | 0.000024 | -0.000450 | -0.005690 | 0.021253 | -0.092775 | -0.093894 | -0.102663 | -0.095506 | -0.079971 | -0.077644 | -0.065284 | -0.055868 | -0.074103 | -0.026473 | -0.005394 | 0.033247 | 0.062081 | 0.057930 | 0.043461 | 0.043473 | 0.055638 | 0.083301 | 0.090898 | 0.096988 | 0.103288 | 0.109382 | 0.116059 | 0.117178 | 0.093415 | 0.089779 | 0.071815 | 0.071525 | 0.060260 | 0.053811 | 0.044648 | 0.035705 | 0.028910 | 0.027364 | -0.099142 | -0.020997 | -0.001623 | -0.001340 | -0.000943 | 0.033459 | 0.088802 | 0.002053 | 0.004970 | -0.013546 | -0.038622 | -0.050243 | 0.005194 | -0.034611 | -0.018643 | -0.023816 | -0.027348 | -0.006035 | -0.003893 | 0.002572 | 0.000277 | -0.007993 | 0.003824 | 0.003904 | 0.004104 | 0.001358 | 0.011006 | -0.021286 | 0.012225 | -0.001926 | 0.008378 | 0.009153 | -0.010520 | -0.005189 | 0.001348 | -0.000885 | 0.008042 | -0.000223 | 0.007868 | 0.003341 | -0.012818 | 0.005266 | 0.000807 | 0.002217 | -0.003946 | 0.003221 | 0.002844 | 0.001443 | -0.001787 | 0.001891 | 0.003290 | 0.005010 | -0.000667 | -0.001398 | -0.003144 | -0.001115 | 0.002452 | -0.007542 | 0.009668 | 0.000364 | 0.004289 | 0.003000 | 0.006001 | -0.000352 | 0.008120 | -0.002490 | -0.001081 | 0.000969 | 0.000010 | NaN |
interestRate | 0.001380 | 0.118841 | 0.415437 | 1.000000 | 0.120640 | 0.951939 | 0.072586 | -0.000166 | -0.128969 | -0.038144 | 0.253111 | 0.008645 | 0.178430 | 0.041478 | -0.397102 | -0.397102 | -0.026849 | 0.052064 | 0.048077 | -0.029575 | 0.240812 | -0.071632 | 0.129388 | 0.029456 | 0.107659 | 0.016223 | NaN | 0.042391 | 0.013137 | 0.082619 | 0.082619 | -0.066004 | -0.093405 | -0.015900 | -0.020434 | -0.069891 | 0.081010 | -0.030676 | 0.003671 | 0.012847 | 0.026751 | 0.190942 | -0.243300 | -0.225212 | -0.249667 | -0.245046 | -0.221699 | -0.173199 | -0.134540 | -0.093975 | -0.063343 | -0.020932 | 0.018093 | 0.049204 | 0.086881 | 0.125377 | 0.144427 | 0.169726 | 0.184474 | 0.198299 | 0.208694 | 0.197218 | 0.206423 | 0.209187 | 0.214386 | 0.233260 | 0.170868 | 0.158553 | 0.152411 | 0.141718 | 0.124198 | 0.107149 | 0.085428 | 0.060953 | 0.047413 | 0.044686 | 0.063943 | 0.007200 | 0.000583 | 0.002810 | 0.002654 | 0.014635 | 0.208091 | 0.064548 | -0.019210 | -0.012516 | -0.168315 | 0.074356 | 0.036263 | 0.009930 | -0.022643 | 0.020322 | 0.036275 | 0.012358 | 0.010037 | 0.000691 | 0.001799 | -0.006851 | 0.000489 | -0.004114 | 0.002838 | 0.000529 | 0.005557 | -0.007345 | 0.001095 | -0.005518 | 0.001007 | 0.002319 | 0.011774 | -0.007952 | -0.002154 | -0.001417 | 0.011875 | 0.002459 | 0.002773 | 0.000049 | 0.005241 | -0.005651 | 0.004637 | -0.001894 | -0.006674 | -0.015917 | -0.004497 | 0.009403 | -0.003376 | -0.003325 | 0.000832 | 0.006044 | -0.005375 | 0.001907 | -0.003610 | -0.003600 | 0.007047 | 0.003819 | 0.000419 | -0.001031 | 0.000135 | 0.003551 | 0.001716 | 0.002648 | -0.000331 | 0.001871 | -0.002838 | 0.002601 | 0.002269 | NaN |
installment | 0.000935 | 0.944344 | 0.162911 | 0.120640 | 1.000000 | 0.115138 | -0.005217 | 0.055356 | 0.440590 | -0.000578 | 0.041954 | -0.024978 | 0.034484 | 0.018761 | 0.063176 | 0.063176 | 0.169524 | -0.076187 | -0.089488 | 0.418453 | 0.133743 | 0.193653 | -0.019355 | 0.055460 | -0.142758 | -0.019282 | NaN | -0.031052 | 0.188863 | 0.146766 | 0.146766 | 0.197947 | 0.184984 | 0.073793 | 0.153171 | 0.158454 | 0.147354 | 0.164535 | -0.000376 | 0.003462 | -0.004688 | -0.024037 | -0.012639 | -0.008244 | -0.003798 | -0.003254 | -0.031604 | -0.027494 | -0.027593 | -0.030215 | -0.035670 | -0.018203 | -0.011481 | -0.004921 | 0.006126 | 0.004602 | 0.002066 | 0.005157 | 0.013816 | 0.025210 | 0.027891 | 0.030986 | 0.034989 | 0.037623 | 0.042111 | 0.048661 | 0.032765 | 0.038853 | 0.032514 | 0.031743 | 0.032160 | 0.027272 | 0.020309 | 0.021406 | 0.016787 | 0.016617 | -0.134317 | -0.020577 | -0.002310 | 0.001835 | -0.000081 | 0.023925 | 0.159702 | 0.017272 | -0.024932 | -0.046447 | 0.024458 | -0.129208 | 0.004799 | -0.075295 | -0.066935 | -0.065726 | -0.061881 | -0.012486 | -0.014627 | -0.003316 | -0.003710 | 0.008499 | 0.006028 | -0.005617 | -0.003991 | -0.006158 | -0.008413 | 0.012794 | 0.014852 | -0.011654 | -0.008282 | 0.009407 | -0.001891 | 0.029017 | -0.002955 | -0.001005 | -0.006236 | 0.007959 | -0.018127 | -0.001813 | -0.021365 | 0.000745 | -0.012224 | -0.008418 | 0.005480 | 0.016269 | -0.006885 | 0.007615 | -0.004297 | 0.017403 | -0.001918 | -0.002410 | 0.000553 | -0.005730 | -0.011770 | 0.001309 | -0.008045 | -0.005362 | -0.003144 | -0.004970 | 0.004123 | -0.000890 | -0.012445 | -0.002893 | 0.000689 | 0.001379 | -0.002107 | 0.014514 | -0.002434 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
regionCode_46 | 0.000439 | 0.000126 | -0.002490 | 0.001871 | 0.001379 | 0.002765 | -0.008369 | -0.011931 | -0.004625 | 0.024831 | -0.000336 | 0.058462 | 0.009459 | 0.003612 | -0.002016 | -0.002016 | -0.003194 | 0.001476 | -0.002488 | -0.004314 | 0.002978 | -0.001576 | -0.013439 | 0.009129 | 0.011684 | -0.005360 | NaN | -0.000041 | -0.002700 | -0.004665 | -0.004665 | -0.004031 | -0.008001 | 0.009072 | -0.007103 | -0.008401 | -0.005226 | -0.003726 | 0.001148 | 0.001783 | 0.000147 | 0.002509 | -0.002335 | -0.002496 | -0.001349 | -0.003406 | 0.001946 | -0.000224 | -0.001426 | -0.000537 | 0.002037 | -0.000117 | -0.000140 | 0.000594 | 0.000096 | 0.005490 | 0.001900 | 0.000297 | 0.002810 | -0.000436 | -0.001953 | 0.000466 | -0.000527 | 0.001300 | -0.002110 | -0.000180 | 0.000703 | 0.000282 | -0.000673 | -0.000245 | -0.000626 | -0.001348 | -0.001072 | -0.000790 | -0.000618 | -0.000568 | -0.001450 | 0.002148 | -0.000553 | -0.000206 | -0.000190 | 0.001963 | 0.001363 | -0.000022 | -0.002358 | -0.001070 | 0.002238 | 0.000232 | -0.001839 | 0.000393 | 0.000282 | 0.001306 | 0.000258 | 0.002608 | -0.001346 | -0.000119 | -0.001614 | -0.007011 | -0.006412 | -0.004771 | -0.003901 | -0.002357 | -0.006112 | -0.014704 | -0.006070 | -0.005696 | -0.004578 | -0.005392 | -0.010477 | -0.010645 | -0.003315 | -0.002605 | -0.003985 | -0.005350 | -0.006564 | -0.003865 | -0.009814 | -0.005398 | -0.005811 | -0.004546 | -0.001762 | -0.005437 | -0.004141 | -0.002559 | -0.001590 | -0.006658 | -0.001902 | -0.004436 | -0.002476 | -0.001846 | -0.003979 | -0.004241 | -0.003088 | -0.004423 | -0.002117 | -0.001948 | -0.001703 | -0.003432 | -0.003531 | -0.002525 | -0.003082 | 1.000000 | -0.001388 | -0.001764 | -0.000718 | NaN |
regionCode_47 | 0.001373 | -0.001679 | -0.001081 | -0.002838 | -0.002107 | -0.002504 | -0.007356 | 0.001941 | -0.007198 | 0.030269 | -0.005557 | 0.060215 | 0.007797 | 0.000471 | 0.001091 | 0.001091 | -0.003704 | 0.003830 | 0.002131 | 0.000469 | 0.000210 | 0.000105 | -0.013378 | 0.009576 | 0.002036 | -0.005905 | NaN | -0.000110 | -0.000971 | -0.004866 | -0.004866 | -0.000450 | -0.001806 | 0.004435 | -0.006316 | -0.004431 | -0.005377 | -0.004324 | -0.000898 | -0.000308 | -0.000536 | -0.003055 | -0.000755 | -0.000961 | -0.001134 | 0.001827 | 0.001583 | 0.000080 | -0.001527 | 0.002260 | 0.002075 | -0.001430 | -0.000158 | 0.001319 | 0.001773 | -0.001070 | 0.001429 | -0.000536 | 0.001861 | 0.001303 | -0.001691 | -0.001744 | -0.001350 | -0.001749 | -0.000725 | -0.001673 | -0.000952 | -0.000817 | -0.001704 | -0.000594 | -0.000881 | -0.001484 | -0.001180 | -0.000870 | -0.000681 | -0.000625 | -0.009648 | 0.005573 | -0.000609 | -0.000226 | -0.000210 | 0.001542 | -0.001215 | -0.001125 | -0.001645 | -0.000254 | 0.003326 | -0.001654 | -0.001564 | 0.001295 | 0.001141 | -0.001868 | -0.002305 | 0.000595 | -0.001483 | -0.000131 | -0.001778 | -0.007724 | -0.007063 | -0.005255 | -0.004297 | -0.002596 | -0.006733 | -0.016198 | -0.006686 | -0.006275 | -0.005044 | -0.005940 | -0.011541 | -0.011727 | -0.003651 | -0.002869 | -0.004390 | -0.005894 | -0.007231 | -0.004257 | -0.010811 | -0.005946 | -0.006402 | -0.005008 | -0.001941 | -0.005990 | -0.004562 | -0.002819 | -0.001752 | -0.007335 | -0.002096 | -0.004887 | -0.002727 | -0.002033 | -0.004384 | -0.004672 | -0.003402 | -0.004872 | -0.002332 | -0.002145 | -0.001876 | -0.003781 | -0.003890 | -0.002782 | -0.003395 | -0.001388 | 1.000000 | -0.001943 | -0.000791 | NaN |
regionCode_48 | -0.000621 | 0.013122 | 0.000969 | 0.002601 | 0.014514 | 0.002697 | 0.005839 | 0.001035 | 0.005677 | -0.006643 | 0.000386 | 0.074851 | 0.003538 | -0.000505 | -0.000066 | -0.000066 | -0.008804 | -0.010687 | -0.009029 | 0.012497 | 0.016935 | -0.002152 | 0.003535 | -0.000401 | -0.000109 | 0.002563 | NaN | -0.003413 | -0.004071 | -0.006308 | -0.006308 | -0.006013 | -0.003305 | 0.003450 | -0.011382 | -0.006447 | -0.005964 | -0.008729 | -0.001141 | -0.001788 | -0.000788 | -0.007267 | -0.002167 | 0.000319 | 0.001117 | 0.000323 | 0.000613 | -0.001174 | -0.002295 | -0.001939 | 0.001481 | -0.000007 | -0.001138 | 0.000008 | -0.001269 | -0.000826 | 0.001418 | 0.001531 | -0.000812 | 0.002184 | -0.000215 | 0.001059 | 0.000417 | 0.000593 | 0.001657 | -0.000144 | -0.000153 | 0.003352 | -0.000897 | -0.000291 | -0.000136 | 0.000664 | 0.002774 | 0.001791 | 0.000984 | 0.001222 | -0.002532 | 0.001040 | -0.000774 | -0.000288 | 0.005743 | 0.000934 | 0.001735 | 0.001361 | 0.000426 | -0.000243 | -0.001842 | -0.003017 | 0.002071 | 0.003414 | -0.001371 | 0.000768 | 0.001233 | -0.000015 | 0.000669 | -0.000166 | -0.002259 | -0.009814 | -0.008975 | -0.006678 | -0.005460 | -0.003299 | -0.008555 | -0.020582 | -0.008496 | -0.007973 | -0.006408 | -0.007547 | -0.014664 | -0.014901 | -0.004640 | -0.003646 | -0.005578 | -0.007488 | -0.009188 | -0.005409 | -0.013736 | -0.007556 | -0.008134 | -0.006364 | -0.002466 | -0.007610 | -0.005797 | -0.003582 | -0.002226 | -0.009320 | -0.002663 | -0.006209 | -0.003465 | -0.002583 | -0.005570 | -0.005937 | -0.004323 | -0.006191 | -0.002963 | -0.002726 | -0.002384 | -0.004804 | -0.004943 | -0.003534 | -0.004314 | -0.001764 | -0.001943 | 1.000000 | -0.001005 | NaN |
regionCode_49 | -0.001178 | -0.001992 | 0.000010 | 0.002269 | -0.002434 | 0.000968 | -0.003484 | 0.000365 | -0.006709 | 0.020769 | -0.002056 | 0.059028 | 0.004764 | -0.000256 | -0.001166 | -0.001166 | -0.001209 | 0.011928 | 0.004670 | -0.003665 | -0.000948 | -0.001023 | -0.006241 | 0.015515 | 0.005007 | -0.003054 | NaN | 0.003467 | -0.004131 | -0.001632 | -0.001632 | -0.004665 | -0.007655 | 0.002124 | -0.001115 | -0.003804 | -0.001842 | -0.001661 | -0.000464 | 0.000555 | 0.001177 | 0.001127 | 0.000018 | -0.000573 | -0.003264 | -0.003465 | -0.000607 | 0.000914 | -0.000261 | 0.001553 | 0.002841 | -0.000248 | 0.000493 | 0.002180 | 0.000185 | -0.000020 | 0.000113 | -0.001948 | 0.001733 | 0.000162 | 0.000193 | -0.000263 | 0.001869 | -0.000318 | 0.000810 | -0.001332 | -0.001581 | -0.000191 | -0.001230 | -0.001093 | 0.000817 | -0.000768 | -0.000610 | -0.000450 | -0.000352 | -0.000323 | -0.007148 | -0.001560 | -0.000315 | -0.000117 | -0.000108 | -0.000953 | 0.002061 | -0.001269 | -0.000138 | -0.001287 | -0.002003 | 0.001516 | -0.001416 | -0.000695 | -0.000552 | 0.001566 | 0.000195 | -0.000518 | -0.000767 | -0.000068 | -0.000920 | -0.003995 | -0.003653 | -0.002718 | -0.002222 | -0.001343 | -0.003482 | -0.008377 | -0.003458 | -0.003245 | -0.002608 | -0.003072 | -0.005969 | -0.006065 | -0.001888 | -0.001484 | -0.002271 | -0.003048 | -0.003740 | -0.002202 | -0.005591 | -0.003075 | -0.003311 | -0.002590 | -0.001004 | -0.003098 | -0.002359 | -0.001458 | -0.000906 | -0.003794 | -0.001084 | -0.002527 | -0.001411 | -0.001052 | -0.002267 | -0.002416 | -0.001760 | -0.002520 | -0.001206 | -0.001110 | -0.000970 | -0.001955 | -0.002012 | -0.001439 | -0.001756 | -0.000718 | -0.000791 | -0.001005 | 1.000000 | NaN |
regionCode_50 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
146 rows × 146 columns
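# Correlation of every feature with the target, sorted from high to low
corr = data.corr()['isDefault'].sort_values(ascending=False)
# Drop the features whose absolute correlation with the target is below 0.003
drop_fea = corr[corr.abs() < 0.003].index
data.drop(drop_fea, axis=1, inplace=True)
data.head()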
loanAmnt | term | interestRate | installment | grade | employmentTitle | annualIncome | issueDate | isDefault | postCode | dti | delinquency_2years | ficoRangeLow | ficoRangeHigh | openAcc | pubRec | pubRecBankruptcies | revolBal | revolUtil | totalAcc | initialListStatus | applicationType | earliesCreditLine | title | policyCode | n0 | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 | n9 | n10 | n13 | n14 | subGrade_2 | subGrade_3 | subGrade_4 | subGrade_5 | subGrade_6 | subGrade_7 | subGrade_8 | subGrade_9 | subGrade_10 | subGrade_11 | subGrade_12 | subGrade_13 | subGrade_14 | subGrade_15 | subGrade_16 | subGrade_17 | subGrade_18 | subGrade_19 | subGrade_20 | subGrade_21 | subGrade_22 | subGrade_23 | subGrade_24 | subGrade_25 | subGrade_26 | subGrade_27 | subGrade_28 | subGrade_29 | subGrade_30 | subGrade_31 | subGrade_32 | subGrade_33 | subGrade_34 | subGrade_35 | homeOwnership_1 | homeOwnership_2 | verificationStatus_1 | verificationStatus_2 | purpose_1 | purpose_2 | purpose_4 | purpose_5 | purpose_6 | purpose_8 | purpose_9 | purpose_10 | purpose_12 | regionCode_2 | regionCode_3 | regionCode_5 | regionCode_6 | regionCode_7 | regionCode_11 | regionCode_12 | regionCode_13 | regionCode_14 | regionCode_15 | regionCode_17 | regionCode_18 | regionCode_19 | regionCode_20 | regionCode_21 | regionCode_22 | regionCode_24 | regionCode_25 | regionCode_27 | regionCode_29 | regionCode_30 | regionCode_32 | regionCode_33 | regionCode_34 | regionCode_35 | regionCode_36 | regionCode_37 | regionCode_38 | regionCode_39 | regionCode_40 | regionCode_42 | regionCode_43 | regionCode_44 | regionCode_45 | regionCode_47 | regionCode_50 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 35.0 | 5 | 19.52 | 9.0 | 5 | 0.0 | 11.0 | 25.0 | 1 | 13.0 | 17.05 | 0.0 | 73.0 | 73.0 | 7.0 | 0.0 | 0.0 | 24.0 | 4.0 | 27.0 | 0 | 0 | 2001.0 | 0.0 | 1.0 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 | 9.0 | 8.0 | 4.0 | 12.0 | 2.0 | 7.0 | 0.0 | 2.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 18.0 | 5 | 18.49 | 4.0 | 4 | 21.0 | 4.0 | 18.0 | 0 | 15.0 | 27.83 | 0.0 | 70.0 | 70.0 | 13.0 | 0.0 | 0.0 | 15.0 | 3.0 | 18.0 | 1 | 0 | 2002.0 | 1.0 | 1.0 | 0.0 | 3.0 | 5.0 | 5.0 | 10.0 | 7.0 | 7.0 | 7.0 | 13.0 | 5.0 | 13.0 | 0.0 | 2.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 12.0 | 5 | 16.99 | 2.0 | 4 | 3.0 | 7.0 | 30.0 | 0 | 33.0 | 22.77 | 0.0 | 67.0 | 67.0 | 11.0 | 0.0 | 0.0 | 4.0 | 5.0 | 27.0 | 0 | 0 | 2006.0 | 0.0 | 1.0 | 0.0 | 0.0 | 3.0 | 3.0 | 0.0 | 0.0 | 21.0 | 4.0 | 5.0 | 3.0 | 11.0 | 0.0 | 4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 3.0 | 3 | 12.99 | 1.0 | 3 | 0.0 | 2.0 | 31.0 | 0 | 30.0 | 32.16 | 0.0 | 69.0 | 69.0 | 12.0 | 0.0 | 0.0 | 2.0 | 3.0 | 27.0 | 0 | 0 | 1977.0 | 0.0 | 1.0 | 1.0 | 2.0 | 7.0 | 7.0 | 2.0 | 4.0 | 9.0 | 10.0 | 15.0 | 7.0 | 12.0 | 0.0 | 4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 2.0 | 3 | 7.69 | 0.0 | 1 | 18.0 | 3.0 | 26.0 | 0 | 51.0 | 17.49 | 0.0 | 75.0 | 75.0 | 12.0 | 0.0 | 0.0 | 3.0 | 0.0 | 23.0 | 0 | 0 | 2006.0 | 0.0 | 1.0 | 0.0 | 1.0 | 3.0 | 3.0 | 7.0 | 11.0 | 3.0 | 10.0 | 18.0 | 3.0 | 12.0 | 0.0 | 3.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
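One detail in the output above: regionCode_50 has a NaN correlation with every column, yet it still appears in data.head(). A comparison such as NaN < 0.003 evaluates to False, so a column with an undefined correlation (typically a constant column) is never selected for dropping. The following is a minimal sketch, assuming you also want to discard such columns; the helper name low_signal_features and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def low_signal_features(df, target, threshold=0.003):
    """Return columns whose correlation with `target` is NaN or below `threshold` in absolute value."""
    corr = df.corr()[target].drop(target)
    mask = corr.isna() | (corr.abs() < threshold)
    return corr[mask].index.tolist()


# Toy data (made up): one informative column and one constant column
demo = pd.DataFrame({
    "isDefault": [0, 1, 0, 1, 0, 1],
    "strong": [1.0, 5.0, 2.0, 6.0, 1.5, 5.5],
    "constant": [0, 0, 0, 0, 0, 0],
})

to_drop = low_signal_features(demo, "isDefault")
print(to_drop)
```

On the real data this would be applied as data.drop(low_signal_features(data, 'isDefault'), axis=1), which also removes the constant columns that the plain threshold filter leaves behind.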
4. Modeling and Hyperparameter Tuning
4.1 Splitting the data into training and validation sets
# Separate the features from the target
x_train = data.drop(['isDefault'],axis=1)
y_train = data['isDefault']
# Hold out 25% of the rows as a validation set
x, val_x, y, val_y = train_test_split(x_train,y_train,test_size=0.25,random_state=1)
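The split above is purely random. If the target is imbalanced, as default labels usually are, passing stratify to train_test_split keeps the share of defaults the same in both parts. A sketch on synthetic data; X_demo and y_demo are made up for illustration and the 20% positive rate is an assumption, not a figure from this dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(1000, 4))
# Assumed class balance for the demo: 200 positives, 800 negatives
y_demo = np.array([1] * 200 + [0] * 800)

X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1, stratify=y_demo
)

# Both parts keep (almost exactly) the original positive rate
print(y_tr.mean(), y_val.mean())
```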
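4.2 Choosing a model
4.2.1 Logistic regression
lr = LogisticRegression()
# 5-fold cross-validation on the full feature matrix
scores = cross_val_score(lr,x_train,y_train,cv=5,scoring='accuracy')
print('Logistic regression 5-fold cross-validation accuracy:',np.mean(scores))
Logistic regression 5-fold cross-validation accuracy: 0.803408265122779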
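An accuracy figure on its own is hard to judge when one class dominates, since a model that always predicts the majority class can score similarly. One way to put the number in context is to compare it with a majority-class baseline and to look at ROC AUC as well. The sketch below does this on synthetic data; it also standardizes the features in a pipeline, which tends to help the logistic regression solver converge. The 80/20 class weights and all variable names are assumptions for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary problem (about 80% negatives)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=1
)

baseline = DummyClassifier(strategy="most_frequent")
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

base_acc = cross_val_score(baseline, X_demo, y_demo, cv=5, scoring="accuracy").mean()
model_acc = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy").mean()
model_auc = cross_val_score(model, X_demo, y_demo, cv=5, scoring="roc_auc").mean()

print("baseline accuracy:", base_acc)
print("logistic regression accuracy:", model_acc)
print("logistic regression ROC AUC:", model_auc)
```

If the model's accuracy is only marginally above the baseline, the AUC gives a better picture of how well it ranks risky loans.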
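4.2.2 Random forest
rfc = RandomForestClassifier(n_estimators=250,max_depth=10)
# Fit on the training part and score on the held-out validation part
rfc.fit(x,y)
print('Random forest accuracy:',rfc.score(val_x,val_y))
Random forest accuracy: 0.8049472312370634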
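A fitted random forest also exposes feature_importances_, which gives a rough view of which inputs the trees rely on. The sketch below shows how to read it; the data and column names (f0 to f5) are synthetic placeholders, so the ranking says nothing about the loan features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X_arr, y_demo = make_classification(
    n_samples=1000, n_features=6, n_informative=3, random_state=1
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

rfc_demo = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=1)
rfc_demo.fit(X_demo, y_demo)

# Impurity-based importances, one value per column, normalized to sum to 1
importances = pd.Series(
    rfc_demo.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

With the notebook's objects, the same idea would read pd.Series(rfc.feature_importances_, index=x.columns).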
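4.3 Hyperparameter tuning with grid search
# Candidate values: 4 x 3 = 12 combinations, each evaluated with 5-fold cross-validation
param_grid = {'n_estimators':[100,150,200,300],'max_depth':[5,10,15]}
r = RandomForestClassifier()
estimator = GridSearchCV(r,param_grid=param_grid,cv=5)
estimator.fit(x_train,y_train)
# Mean cross-validated score of the best parameter combination
estimator.best_score_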
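Besides best_score_, a fitted GridSearchCV object exposes best_params_ (the winning combination) and best_estimator_ (a model refit on all the data with those parameters). A small sketch on synthetic data, with a deliberately tiny grid so that it runs quickly; the grid values and the roc_auc scoring choice are assumptions for the demo, not the author's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=1)

# Tiny illustrative grid; the grid in the text above is larger
param_grid_demo = {"n_estimators": [20, 40], "max_depth": [3, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid=param_grid_demo,
    cv=3,
    scoring="roc_auc",  # swap for "accuracy" to mirror the default classifier score
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.best_score_)

# The refit model can be used directly for prediction
best_model = search.best_estimator_
```

On a dataset of this size the full search fits 60 forests, so passing n_jobs to GridSearchCV to parallelize the fits may be worth considering.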
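5. Summary
Real-world financial risk-control projects mostly involve credit scoring, which requires model features to be interpretable. For that reason, logistic regression is still the most common baseline model in practice.
If you want better results, you can model with ensemble algorithms, at the cost of weaker interpretability.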
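To illustrate what interpretability means for logistic regression: each coefficient can be read as a change in the log-odds of default per unit of the (standardized) feature, and exponentiating it gives an odds ratio. A minimal sketch on synthetic data; the feature names are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_arr, y_demo = make_classification(
    n_samples=1000, n_features=4, n_informative=2, n_redundant=0, random_state=1
)
cols = ["f0", "f1", "f2", "f3"]  # placeholder feature names

# Standardize so that coefficients are comparable across features
X_scaled = StandardScaler().fit_transform(X_arr)
lr_demo = LogisticRegression(max_iter=1000).fit(X_scaled, y_demo)

# Odds ratio > 1: the feature raises the odds of the positive class;
# odds ratio < 1: it lowers them
odds_ratios = pd.Series(np.exp(lr_demo.coef_[0]), index=cols)
print(odds_ratios.sort_values(ascending=False))
```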