sklearn random forest model: ValueError: Unknown label type: 'unknown'

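Problem:

The classification labels had been mistakenly recorded by the client as floating-point values.

A student who was given the task ran into the error. The failing program is shown below: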

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def random_forest_selection(df):
    clf = RandomForestClassifier(
        n_estimators=100,
        random_state=42,
        class_weight='balanced',
        min_samples_leaf=4,
        min_samples_split=4,
        max_depth=5,
        n_jobs=-1,
    )
    clf_X = df.drop(['label'], axis=1)
    clf_y = df.label
    # The ValueError is raised here when sklearn does not accept the label dtype
    model = clf.fit(clf_X, clf_y)
    # feat_importances = pd.DataFrame(model.feature_importances_, index=clf_X.columns, columns=["importance"])
    feature_importances = pd.DataFrame({'feature': clf_X.columns, 'importance': model.feature_importances_})
    feature_importances.sort_values(by='importance', ascending=False, inplace=True)
    # feat_importances[:10].plot(figsize=(12, 5), kind='bar')
    # plot_feature_importances is a plotting helper defined elsewhere in the notebook (not shown here)
    plot_feature_importances(feature_importances, n=10, threshold=0.95)
    most = min(50, df.shape[1] // 5)  # computed but never used
    return feature_importances.feature[0:100].values

my_important = random_forest_selection(df_in)
my_important
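Before calling fit, it can help to look at the label column the way sklearn does. The sketch below is only an illustration: the label values are made up, and the object-dtype array is one input known to produce the label type 'unknown'; whether the author's data reached sklearn in exactly that form is an assumption. `type_of_target` is the sklearn utility that classifies the target.

```python
import numpy as np
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels standing in for df_in['label'] (values are made up)
float_labels = pd.Series([0.0, 1.0, 1.0, 0.0], name="label")
print(float_labels.dtype)

# Integer-valued floats without missing values are still seen as class labels
float_kind = type_of_target(float_labels.to_numpy())
print(float_kind)

# An object-dtype array of numbers is what sklearn reports as 'unknown'
object_labels = np.array([0, 1, 1, 0], dtype=object)
object_kind = type_of_target(object_labels)
print(object_kind)
```

If the second check prints 'unknown', the label column needs to be cast to a plain NumPy integer dtype before it reaches the classifier.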

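Solution:

The label column initially has dtype float64.

For the cast, 'Int64' (capital I) is a valid spelling: it is the pandas nullable integer dtype.

That cast succeeds, even with missing values present, but the error then appears when the data is passed to the model.

So the approach that finally worked is:

Fill the missing values first, then cast, and use lowercase int64. This is the plain NumPy integer dtype, which cannot hold missing values, so the fill has to come before the cast.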

df_in = df_in.fillna(0)
df_in['label'] = df_in['label'].astype('int64')
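The difference between the two spellings can be seen on a tiny made-up Series. This is a sketch with hypothetical values, not the author's data: it shows that 'Int64' tolerates a missing value, that 'int64' refuses to cast it, and that filling first makes the lowercase cast work.

```python
import numpy as np
import pandas as pd

# Hypothetical float labels with one missing value
labels = pd.Series([0.0, 1.0, np.nan, 1.0], name="label")

# Capital-I 'Int64' is the pandas nullable integer dtype: the cast succeeds
nullable = labels.astype("Int64")
print(nullable.dtype, int(nullable.isna().sum()))

# Lowercase 'int64' is the NumPy dtype: missing values cannot be cast
cast_failed = False
try:
    labels.astype("int64")
except (ValueError, TypeError) as exc:
    cast_failed = True
    print(type(exc).__name__, exc)

# Filling first, then casting, gives plain NumPy integers
fixed = labels.fillna(0).astype("int64")
print(fixed.dtype, fixed.tolist())
```

Note that filling with 0 assigns every unlabeled row to class 0; dropping those rows instead may be more appropriate, depending on the data.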

# Notebook cell as run; the commented-out 'Int64' cast is the variant that failed in the model
df_in = df_origin
df_in['label'] = df_in['label'].astype("int")
# df_in['label'] = df_in['label'].astype("Int64")
df_in['label'].value_counts()
# df_in['label'].describe()
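Putting the pieces together, a minimal end-to-end sketch might look like the following. Everything here is illustrative: the DataFrame is synthetic, `prepare_labels` is a hypothetical helper that does not appear in the original notebook, and the forest uses fewer trees than the original so that it runs quickly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for df_in: four features and a float label with one gap
rng = np.random.default_rng(42)
df_demo = pd.DataFrame(rng.normal(size=(60, 4)), columns=["f0", "f1", "f2", "f3"])
df_demo["label"] = (df_demo["f0"] > 0).astype("float64")
df_demo.loc[0, "label"] = np.nan


def prepare_labels(df, column="label", fill_value=0):
    """Hypothetical helper: fill missing labels, then cast to NumPy int64."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out[column] = out[column].fillna(fill_value).astype("int64")
    return out


clean = prepare_labels(df_demo)

clf = RandomForestClassifier(
    n_estimators=20,
    random_state=42,
    class_weight="balanced",
    max_depth=5,
)
features = clean.drop(columns=["label"])
model = clf.fit(features, clean["label"])

# Same importance table as in the original function, sorted descending
importances = pd.DataFrame(
    {"feature": features.columns, "importance": model.feature_importances_}
).sort_values(by="importance", ascending=False)
print(importances)
```

Working on a copy inside the helper avoids modifying the original DataFrame, which the `df_in = df_origin` assignment above does not protect against.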

Full error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-41-511d3bf43def> in <module>
----> 1 my_important = random_forest_selection(df_in)
2 my_important

<ipython-input-40-4490edd7b355> in random_forest_selection(df)
12 clf_y = df.label
13
---> 14 model = clf.fit(clf_X,clf_y)
15 #feat_importances = pd.DataFrame(model.feature_importances_, index=clf_X.columns, columns=["importance"])
16 feature_importances = pd.DataFrame({'feature': clf_X.columns, 'importance': model.feature_importances_})

D:\anaconda\lib\site-packages\sklearn\ensemble\_forest.py in fit(self, X, y, sample_weight)
329 self.n_outputs_ = y.shape[1]
330
--> 331 y, expanded_class_weight = self._validate_y_class_weight(y)
332
333 if getattr(y, "dtype", None) != DOUBLE or not y.flags.contiguous:

D:\anaconda\lib\site-packages\sklearn\ensemble\_forest.py in _validate_y_class_weight(self, y)
557
558 def _validate_y_class_weight(self, y):
--> 559 check_classification_targets(y)
560
561 y = np.copy(y)

D:\anaconda\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
181 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
182 'multilabel-indicator', 'multilabel-sequences']:
--> 183 raise ValueError("Unknown label type: %r" % y_type)
184
185

ValueError: Unknown label type: 'unknown'
