UserWarning: Label not :NUMBER: is present in all training examples
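Problem analysis:
# The core of the problem is that some labels appear in the test or validation set but not in the training set. The message "Label not k is present in all training examples" means that every training example falls into the class "not k", i.e. label k never occurs in the training data.
# The cause may be that some tags occur in only a few documents (see the post listed under the references for details). When you split the dataset into train and test sets to validate the model, some tags may be missing from the training data. Let train_indexes be an array holding the indices of the training samples. If a particular tag (with index k) does not occur in any training sample, every element in column k of the indicator matrix y[train_indexes] is zero.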
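The all-zero column condition described above can be checked directly. The following is a minimal sketch, assuming y is a binary indicator matrix; the helper name missing_label_columns and the toy data are hypothetical and not part of the original post.

```python
import numpy as np

def missing_label_columns(y, train_indexes):
    """Return the indices of label columns that are all zero on the training rows."""
    y_train = np.asarray(y)[train_indexes]
    # A column with no nonzero entry means the label never occurs in training.
    return np.flatnonzero(~y_train.any(axis=0))

# Hypothetical toy indicator matrix: 4 samples, 3 labels.
y_demo = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])
train_demo = np.array([0, 2])  # only samples 0 and 2 are used for training

missing = missing_label_columns(y_demo, train_demo)
print(missing)  # [1 2]
```

Each index returned here corresponds to one label for which a one-vs-rest classifier would see only negative examples.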
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_predict

# Toy dataset: question title -> list of tags
Q = {
    'What does the "yield" keyword do in Python?': ['python'],
    'What is a metaclass in Python?': ['oop'],
    'How do I check whether a file exists using Python?': ['python'],
    'How to make a chain of function decorators?': ['python', 'decorator'],
    'Using i and j as variables in Matlab': ['matlab', 'naming-conventions'],
    'MATLAB: get variable type': ['matlab'],
    'Why is MATLAB so fast in matrix multiplication?': ['performance'],
    'Is MATLAB OOP slow or am I doing something wrong?': ['matlab-oop'],
}
# Wrap the dict views in list() so pandas receives plain sequences
dataframe = pd.DataFrame({'body': list(Q.keys()), 'tag': list(Q.values())})

# Binarize the tag lists into an indicator matrix (one column per tag)
mlb = MultiLabelBinarizer()
X = dataframe['body'].values
y = mlb.fit_transform(dataframe['tag'].values)

classifier = Pipeline([
    ('vectorizer', CountVectorizer(lowercase=True, stop_words='english', max_df=0.8, min_df=1)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

# The warnings below are raised during cross-validation
predicted = cross_val_predict(classifier, X, y)
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 4 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 0 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 1 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 3 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 5 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 2 is present in all training examples.
  str(classes[c]))
# Predicted output
import numpy as np
np.set_printoptions(precision=2, threshold=1000)
predicted
array([[0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0]])
# Do the train/test split manually and record the warnings instead of letting them print, to see which labels are missing from the training set.
import warnings
from sklearn.model_selection import ShuffleSplit

rs = ShuffleSplit(n_splits=1, test_size=.5, random_state=0)
# n_splits=1, so the loop body runs once and leaves the indices of that single split
for train_index, test_index in rs.split(X):
    train_indices, test_indices = train_index, test_index
    print("TRAIN:", train_index, "TEST:", test_index)

with warnings.catch_warnings(record=True) as received_warnings:
    warnings.simplefilter("always")
    X_train, y_train = X[train_indices], y[train_indices]
    X_test, y_test = X[test_indices], y[test_indices]
    classifier.fit(X_train, y_train)
    predicted_test = classifier.predict(X_test)
    # Print only the message text of each recorded warning
    for w in received_warnings:
        print(w.message)
TRAIN: [3 0 5 4] TEST: [6 2 1 7]
Label not 2 is present in all training examples.
Label not 4 is present in all training examples.
Label not 5 is present in all training examples.
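The numbers in these messages are easier to read as tag names. The sketch below parses the message text and looks the index up in the class list; it assumes the number is the column index of the indicator matrix (consistent with the all-zero columns of y_train in this post) and that the message wording matches the output above, which may differ between scikit-learn versions. The function name missing_labels_from_warnings is hypothetical, and demo_classes is the sorted tag list that mlb.classes_ would hold for this dataset.

```python
import re

def missing_labels_from_warnings(messages, class_names):
    """Map 'Label not K is present in all training examples' messages to tag names."""
    pattern = re.compile(r"Label not (\d+) is present in all training examples")
    names = []
    for message in messages:
        match = pattern.search(str(message))
        if match:
            # K is assumed to be a column index into class_names.
            names.append(class_names[int(match.group(1))])
    return names

# Sorted tag names for the toy dataset used in this post.
demo_classes = ['decorator', 'matlab', 'matlab-oop', 'naming-conventions',
                'oop', 'performance', 'python']
demo_messages = [
    "Label not 2 is present in all training examples.",
    "Label not 4 is present in all training examples.",
    "Label not 5 is present in all training examples.",
]

missing_names = missing_labels_from_warnings(demo_messages, demo_classes)
print(missing_names)  # ['matlab-oop', 'oop', 'performance']
```

With the objects from the post, the call would be missing_labels_from_warnings([w.message for w in received_warnings], mlb.classes_).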
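# This can also be verified from the actual training data: columns 2, 4 and 5 of y_train are all zeros;
# likewise, some columns of the predicted output are all zeros;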
y_train[:4]
array([[1, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0]])
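# Write a custom function for this problem;
# To work around the problem, you can implement your own prediction function
def get_best_tags(clf, X, lb, n_tags=3):
    # Per-label confidence scores, shape (n_samples, n_labels)
    decfun = clf.decision_function(X)
    # argsort is ascending, so take the last n_tags columns in reverse order
    best_tags = np.argsort(decfun)[:, :-(n_tags+1): -1]
    return lb.classes_[best_tags]
# By doing this, every document is always assigned the n_tags tags with the highest confidence scores: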
mlb.inverse_transform(predicted_test)
get_best_tags(classifier, X_test, mlb)
array([['matlab', 'performance', 'oop'],
       ['python', 'performance', 'oop'],
       ['python', 'performance', 'oop'],
       ['matlab', 'performance', 'oop']], dtype=object)
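The slice used inside get_best_tags is compact, so here is a small standalone illustration of how it picks the top-scoring columns. The score matrix and class names are made up for the example and do not come from the classifier above.

```python
import numpy as np

# Hypothetical decision scores: 2 samples, 4 labels.
demo_scores = np.array([[0.1, -1.0, 0.5, -0.2],
                        [-0.3, 0.9, -0.8, 0.0]])
demo_names = np.array(['a', 'b', 'c', 'd'])
n_tags = 2

# argsort sorts each row in ascending order of score; the slice
# [:, :-(n_tags + 1):-1] walks each row backwards from the end and
# stops after n_tags entries, giving the highest-scoring columns first.
order = np.argsort(demo_scores)
top_idx = order[:, :-(n_tags + 1):-1]
top_names = demo_names[top_idx]

print(top_idx)    # [[2 0]
                  #  [1 3]]
print(top_names)  # [['c' 'a']
                  #  ['b' 'd']]
```

Because the selection is by rank rather than by threshold, each row always receives exactly n_tags labels, even when all scores are negative.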
Full error:
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 4 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 0 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 1 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 3 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 5 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 2 is present in all training examples.
  str(classes[c]))
Reference: sklearn
Reference: UserWarning: Label not :NUMBER: is present in all training examples