Feature Importance Calculation Methods in XGBoost Explained

1. The plot_importance method

The plot_importance function in xgboost has several built-in ways of computing feature importance.

    def plot_importance(booster, ax=None, height=0.2,
                        xlim=None, ylim=None, title='Feature importance',
                        xlabel='F score', ylabel='Features',
                        importance_type='weight', max_num_features=None,
                        grid=True, show_values=True, **kwargs):
        """Plot importance based on fitted trees.

        Parameters
        ----------
        booster : Booster, XGBModel or dict
            Booster or XGBModel instance, or dict taken by Booster.get_fscore()
        ax : matplotlib Axes, default None
            Target axes instance. If None, new figure and axes will be created.
        grid : bool, Turn the axes grids on or off.  Default is True (On).
        importance_type : str, default "weight"
            How the importance is calculated: either "weight", "gain", or "cover"

            * "weight" is the number of times a feature appears in a tree
            * "gain" is the average gain of splits which use the feature
            * "cover" is the average coverage of splits which use the feature
              where coverage is defined as the number of samples affected by the split
        max_num_features : int, default None
            Maximum number of top features displayed on plot. If None, all features will be displayed.
        height : float, default 0.2
            Bar height, passed to ax.barh()
        xlim : tuple, default None
            Tuple passed to axes.xlim()
        ylim : tuple, default None
            Tuple passed to axes.ylim()
        title : str, default "Feature importance"
            Axes title. To disable, pass None.
        xlabel : str, default "F score"
            X axis title label. To disable, pass None.
        ylabel : str, default "Features"
            Y axis title label. To disable, pass None.
        show_values : bool, default True
            Show values on plot. To disable, pass False.
        kwargs :
            Other keywords passed to ax.barh()

        Returns
        -------
        ax : matplotlib Axes
        """

The signature of plot_importance is shown above.

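Two things follow from this signature:
1. If no axis labels are given, the x axis is labeled "F score" and the y axis is labeled "Features" by default.
2. There are three importance types: weight, gain, and cover. Each is summarized below.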

2. weight

* "weight" is the number of times a feature appears in a tree

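As this description suggests, weight measures importance by counting how many times a feature is used to split a node, and the count runs over all trees in the model.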

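In general, weight gives higher values to numerical features. The more distinct values a continuous feature takes, the more candidate split points it offers, so it ends up being used more often. As a result, the weight metric can easily mask important categorical features.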
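The counting idea can be sketched in a few lines. The toy trees below are a hypothetical representation (each tree is just the list of features used at its split nodes), not xgboost's internal format, and the feature names are made up.

```python
from collections import Counter

# Hypothetical toy model: each tree is listed as the features used at its
# split nodes. This is NOT xgboost's internal tree format.
trees = [
    ["age", "age", "income"],
    ["age", "city"],
    ["income", "age"],
]

def weight_importance(trees):
    """Count how many split nodes use each feature, over all trees."""
    counts = Counter()
    for split_features in trees:
        counts.update(split_features)
    return dict(counts)

weights = weight_importance(trees)
print(weights)
```

Here the continuous feature "age" is split on four times while the categorical "city" appears once, which is the kind of imbalance described above.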

3. gain

* "gain" is the average gain of splits which use the feature

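gain measures how much a split improves the training objective. The larger the reduction in loss produced by splitting on a feature, the more important that feature is.
The idea is the same as ranking features by information gain in feature selection, although XGBoost computes the gain from the first- and second-order gradients of the loss rather than from entropy.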
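As a rough sketch of what the gain of a single split looks like, the function below evaluates the split-gain formula given in the XGBoost paper, where G and H are the sums of first- and second-order gradients in each child, lam is the L2 regularization term, and gamma is the per-leaf penalty. The function name and the numbers are illustrative assumptions, and this is not the library's own code.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of one split, following the formula in the XGBoost paper.

    g_* and h_* are sums of first- and second-order gradients of the loss
    over the samples falling into the left and right child.
    """
    def leaf_score(g, h):
        return g * g / (h + lam)

    parent = leaf_score(g_left + g_right, h_left + h_right)
    children = leaf_score(g_left, h_left) + leaf_score(g_right, h_right)
    return 0.5 * (children - parent) - gamma

# Illustrative numbers only: the two children pull in opposite directions,
# so separating them reduces the loss.
gain = split_gain(g_left=-4.0, h_left=2.0, g_right=6.0, h_right=3.0)
print(gain)
```

The "gain" importance of a feature is then the average of such values over every split that uses the feature.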

4. cover

* "cover" is the average coverage of splits which use the feature, where coverage is defined as the number of samples affected by the split

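cover is the total number of samples reaching the nodes that split on the feature, divided by the number of times the feature is used to split. (Strictly, XGBoost sums the second-order gradients of those samples, which equals the sample count for squared-error loss.) The closer a split is to the root of the tree, the larger its cover.

cover is a better fit for categorical features. It also does not overfit to the objective function and is not affected by the scale of the objective.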
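A minimal sketch of the averaging follows. It assumes each split is recorded as a hypothetical (feature, samples reaching the node) pair; the record format, feature names, and counts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical split records: (feature, number of samples reaching the node).
splits = [
    ("city", 1000),  # split at the root, so every sample passes through it
    ("age", 600),
    ("age", 400),
    ("age", 150),
]

def average_cover(splits):
    """Average number of samples per split, grouped by feature."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, n_samples in splits:
        totals[feature] += n_samples
        counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}

cover = average_cover(splits)
print(cover)
```

In this toy case "city" is used only once but at the root, so it has the highest cover even though its weight is the lowest.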

5. The permutation_importance method

Beyond these, the permutation_importance method can also be used to measure feature importance. The official sklearn documentation describes it as follows:

Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular.
This is especially useful for non-linear or opaque estimators.
The permutation feature importance is defined to be the decrease in a model score when a single feature value is randomly shuffled.
This procedure breaks the relationship between the feature and the target,
thus the drop in the model score is indicative of how much the model depends on the feature.
This technique benefits from being model agnostic and can be calculated many times with different permutations of the feature.

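Roughly, it works as follows:
1. Train a model on the training set.
2. Evaluate the model on the test set to get a baseline metric, for example MSE for regression, or logloss or AUC for classification.
3. Randomly shuffle the values of one feature in the test set, predict again with the same model, and compute the metric again. Compare it with the baseline from step 2: the larger the difference, the more important the feature.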
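The three steps above can be sketched by hand with NumPy. This is a simplified illustration, not sklearn's implementation: a plain least-squares fit stands in for the trained model, and the synthetic data and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): column 0 drives the
# target, column 1 matters a little, column 2 is pure noise.
X = rng.normal(size=(400, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Step 1: fit a model on the training set (least squares stands in for
# any fitted estimator).
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def mse(features, target):
    return float(np.mean((features @ coef - target) ** 2))

# Step 2: baseline metric on the untouched test set.
baseline = mse(X_test, y_test)

# Step 3: shuffle one column at a time and record how much the MSE grows,
# averaged over several random permutations.
def permutation_importance_sketch(n_repeats=10):
    importances = []
    for col in range(X_test.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            increases.append(mse(X_perm, y_test) - baseline)
        importances.append(float(np.mean(increases)))
    return importances

importances = permutation_importance_sketch()
print(importances)
```

In practice, sklearn.inspection.permutation_importance performs the same kind of repeated shuffling for any fitted estimator and also reports the spread across repeats.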