Random forest feature importance plot in Python

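I am using RandomForestRegressor in Python, and I want to create a chart that shows the ranking of feature importances. This is the code I used:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error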

MT = pd.read_csv("MT_reduced.csv")
df = MT.reset_index(drop=False)
columns2 = df.columns.tolist()

# Filter the columns to remove ones we don't want.
columns2 = [c for c in columns2 if c not in ["Violent_crime_rate", "Change_Property_crime_rate", "State", "Year"]]

# Store the variable we'll be predicting on.
target = "Property_crime_rate"

# Randomly split the data: 80% as the training set and 20% as the test set.
# Set random_state to be able to replicate results.
train2 = df.sample(frac=0.8, random_state=1)

# Exclude all observations whose index is already in the training set.
test2 = df.loc[~df.index.isin(train2.index)]

print(train2.shape)  # same number of columns as test2; only the number of observations should differ
print(test2.shape)

# Initialize the model with some parameters.
# n_estimators = number of trees in the forest
# min_samples_leaf = minimum number of samples at each leaf
model = RandomForestRegressor(n_estimators=100, min_samples_leaf=8, random_state=1)

# Fit the model to the data.
model.fit(train2[columns2], train2[target])

# Make predictions.
predictions_rf = model.predict(test2[columns2])

# Compute the error.
mean_squared_error(test2[target], predictions_rf)  # 650.4928

Feature importance:

features = df.columns[[3, 4, 6, 8, 9, 10]]
importances = model.feature_importances_
indices = np.argsort(importances)

plt.figure(1)
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), features[indices])
plt.xlabel('Relative Importance')

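When I try to run this code on my data, I get the following error: IndexError: index 6 is out of bounds for axis 1 with size 6

Also, if I leave out the labels, only one feature shows up on my chart, with an importance of 100%.

Any help solving this so that I can create the chart would be greatly appreciated.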
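
Both symptoms may come from the feature list. `model.feature_importances_` has one entry per column in `columns2`, so `indices` can hold values up to `len(columns2) - 1`, while `features` is a hand-picked set of only six column names. Indexing `features[indices]` then fails as soon as an index reaches 6. In addition, `target` is never removed from `columns2`, so the model is apparently trained with `Property_crime_rate` among its own predictors, which could explain a single bar at close to 100%. `reset_index(drop=False)` may also add an `index` column that ends up as a predictor. Below is a minimal sketch of one way to keep labels and importances aligned. The CSV is not available here, so the data is random and the column names `Unemployment`, `Median_income`, and `Police_per_capita` are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in for MT_reduced.csv: random data, invented column names.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "State": rng.integers(0, 50, n),
    "Year": rng.integers(2000, 2015, n),
    "Unemployment": rng.normal(6, 2, n),
    "Median_income": rng.normal(50, 10, n),
    "Police_per_capita": rng.normal(2, 0.5, n),
})
df["Property_crime_rate"] = (
    30 * df["Unemployment"] - 2 * df["Median_income"] + rng.normal(0, 5, n)
)

target = "Property_crime_rate"
excluded = ["State", "Year", target]  # the target must not be one of the predictors
feature_cols = [c for c in df.columns if c not in excluded]

train = df.sample(frac=0.8, random_state=1)

model = RandomForestRegressor(n_estimators=100, min_samples_leaf=8, random_state=1)
model.fit(train[feature_cols], train[target])

importances = model.feature_importances_
indices = np.argsort(importances)  # ascending, so the largest bar is drawn on top
features = np.array(feature_cols)  # same columns, same order as passed to fit()

fig, ax = plt.subplots()
ax.barh(range(len(indices)), importances[indices], color="b", align="center")
ax.set_yticks(range(len(indices)))
ax.set_yticklabels(features[indices])
ax.set_xlabel("Relative Importance")
ax.set_title("Feature Importances")
fig.tight_layout()
fig.savefig("feature_importances.png")
```

In the original code the equivalent change would be to add `target` (and `"index"`, if that column exists) to the exclusion list, and to build the labels from `columns2` instead of from `df.columns[[3, 4, 6, 8, 9, 10]]`.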
