Example code

We use a binning library, scorecardbundle, to do the binning.
scorecardbundle GitHub home page: https://github.com/Lantianzz/Scorecard-Bundle
scorecardbundle documentation: https://scorecard-bundle.bubu.blue/English/1.intro.html

import pandas as pd
from sklearn.datasets import make_classification
from scorecardbundle.feature_discretization import ChiMerge as cm
from scorecardbundle.feature_encoding import WOE as woe


def get_dataset():
    # Synthetic binary-classification data: 6 features, 4 of them informative
    data_x, data_y = make_classification(n_samples=1000, n_classes=2, n_features=6, n_informative=4, random_state=0)
    data_df = pd.DataFrame(data_x).merge(pd.Series(data_y, name="y_label"), left_index=True, right_index=True)
    data_df.columns = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'y_label']
    # Return the features and the label separately
    return data_df.drop(["y_label"], axis=1), data_df['y_label']


if __name__ == '__main__':
    x_value, y_value = get_dataset()

    # Binning (ChiMerge)
    trans_cm = cm.ChiMerge(max_intervals=10, min_intervals=2, decimal=3, output_dataframe=True)
    result_cm = trans_cm.fit_transform(x_value, y_value)
    trans_cm.boundaries_  # bin boundaries for each feature (inspect interactively)

    # WOE transformation of the binned features
    trans_woe = woe.WOE_Encoder(output_dataframe=True)
    result_woe = trans_woe.fit_transform(result_cm, y_value)
    print(trans_woe.iv_)  # IV value of each feature
    print(trans_woe.result_dict_)  # WOE dictionary and IV value for each feature

Output:

{'x1': 1.6346317045813659, 'x2': 0.8150596934872387, 'x3': 2.1554523388874394, 'x4': 0.09694014765234728, 'x5': 4.0535459433655925, 'x6': 0.7650559687769136}
{'x1': ({'-0.089~0.574': -0.14034198335669795, '-0.49~-0.089': -1.2982402309053493, '-1.819~-0.49': -0.6471710273616801, '-1.856~-1.819': 1.406295027699045, '-3.086~-1.856': -0.2676814057179092, '-inf~-3.086': 1.1351422572511074, '0.574~0.627': -2.177223909860503, '0.627~1.396': 0.02000066670100254, '1.396~3.756': 1.13436131227068, '3.756~inf': 26.041583870099096}, 1.6346317045813659), 'x2': ({'-1.761~0.542': -0.15463536598804814, '-2.661~-1.761': -0.7879220283290467, '-inf~-2.661': -3.643560975704789, '0.542~0.943': 0.6390398750588908, '0.943~1.866': 1.6056279303592254, '1.866~2.056': 0.24314421797227492, '2.056~inf': 2.483853907178159}, 0.8150596934872387), 'x3': ({'-0.832~0.868': 0.046387421872776745, '-1.606~-0.832': -0.8107678916070944, '-1.628~-1.606': 0.8672985270005281, '-1.794~-1.628': -0.8272971935574127, '-1.823~-1.794': 0.8672985270005281, '-2.661~-1.823': -0.827297193555032, '-2.737~-2.661': 1.406295027699045, '-inf~-2.737': -0.3854644413598186, '0.868~2.802': 1.4918172011816269, '2.802~inf': 26.447048978207263}, 2.1554523388874394), 'x4': ({'-0.706~0.394': -0.2729864579462302, '-0.776~-0.706': 1.406295027699045, '-1.904~-0.776': -0.2754635461586352, '-2.181~-1.904': 0.7131478472036047, '-2.424~-2.181': -0.47247581833830427, '-inf~-2.424': 0.27886230059432987, '0.394~0.619': 0.274892916306179, '0.619~0.752': -0.4367577357396309, '0.752~inf': 0.1614123207389391}, 0.09694014765234728), 'x5': ({'-0.81~0.146': -1.4518158675271366, '-1.099~-0.81': -0.5060924291285228, '-1.391~-1.099': 0.38396604387173733, '-1.44~-1.391': -1.0786116217760677, '-2.204~-1.44': 0.32228153854905195, '-2.852~-2.204': -0.935510778169474, '-inf~-2.852': -23.025850929940457, '0.146~0.881': -0.008986870170557375, '0.881~2.227': 1.5973502645022293, '2.227~inf': 3.8041903004751383}, 4.0535459433655925), 'x6': ({'-0.827~0.314': -0.11353072590854493, '-1.026~-0.827': -1.366293694126287, '-1.374~-1.026': -0.15184959020824298, '-1.47~-1.374': -2.1200654960643353, '-1.962~-1.47': -0.5525185259954828, '-4.027~-1.962': 0.6390398750607955, '-inf~-4.027': 25.348436689539152, '0.314~1.844': 0.39212935451196373, '1.844~inf': -0.3664162463898471}, 0.7650559687769136)}

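The fitted trans_woe encoder holds the expected result: trans_woe.iv_ (the first dictionary above) gives the IV of each feature, and trans_woe.result_dict_ (the second) maps each feature to a pair of its per-bin WOE values and its IV.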
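
To make the numbers above easier to read, here is a minimal sketch of how WOE and IV can be computed by hand for one binned feature. It is not scorecardbundle's implementation: the helper name woe_iv, the sign convention (share of positives over share of negatives) and the epsilon smoothing term are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def woe_iv(binned: pd.Series, y: pd.Series, epsilon: float = 1e-10):
    """Hypothetical helper: per-bin WOE (Series) and IV (float) for one feature.

    Assumed convention: WOE = ln(share of positives in bin / share of negatives in bin).
    epsilon is an assumed smoothing term that avoids division by zero and log(0).
    """
    counts = pd.crosstab(binned, y)           # rows: bins, columns: class 0 / class 1
    pos_share = counts[1] / counts[1].sum()   # fraction of all positives in each bin
    neg_share = counts[0] / counts[0].sum()   # fraction of all negatives in each bin
    woe = np.log((pos_share + epsilon) / (neg_share + epsilon))
    iv = float(((pos_share - neg_share) * woe).sum())
    return woe, iv


# Toy data: two bins with four samples each
bins = pd.Series(["low"] * 4 + ["high"] * 4)
labels = pd.Series([0, 0, 0, 1, 1, 1, 1, 0])

woe, iv = woe_iv(bins, labels)
print(woe.round(4).to_dict())
print(round(iv, 4))
```

In this toy data the "low" bin holds 1/4 of the positives and 3/4 of the negatives, so its WOE is ln(1/3), about -1.0986; the "high" bin is the mirror image at about +1.0986, and the IV is about 1.0986. Under this kind of smoothing, a bin that contains no samples of one class gets a WOE of very large magnitude, which is one plausible reading of values such as -23.03 or 26.04 in the output above.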
