UserWarning: Label not :NUMBER: is present in all training examples

Contents

UserWarning: Label not :NUMBER: is present in all training examples

Problem analysis:

Full error:


Problem analysis:

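# The core of the problem is that some labels never occur in the training set, even though they may occur in the test or validation set; that is when this warning appears.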

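# The problem is probably that some tags occur in only a few documents (see the referenced article for details). When you split the dataset into train and test sets to validate the model, some tags may end up missing from the training data. Let train_indexes be an array containing the indices of the training samples. If a particular tag (with index k) does not occur in any training sample, every element in column k of the indicator matrix y[train_indexes] is zero. OneVsRestClassifier fits one binary problem per column, "not k" versus "k"; for such a column every training target is "not k", which is exactly what the warning "Label not k is present in all training examples" reports, and the classifier falls back to a constant predictor that always outputs 0 for tag k.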
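# To make the column-k argument concrete, here is a minimal sketch assuming NumPy. The matrix y is the indicator matrix that MultiLabelBinarizer produces for the eight example questions used below, written out by hand (columns in the order of mlb.classes_); missing_labels is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

# Columns: decorator, matlab, matlab-oop, naming-conventions, oop, performance, python
y = np.array([
    [0, 0, 0, 0, 0, 0, 1],  # yield -> python
    [0, 0, 0, 0, 1, 0, 0],  # metaclass -> oop
    [0, 0, 0, 0, 0, 0, 1],  # file exists -> python
    [1, 0, 0, 0, 0, 0, 1],  # decorators -> python, decorator
    [0, 1, 0, 1, 0, 0, 0],  # i and j -> matlab, naming-conventions
    [0, 1, 0, 0, 0, 0, 0],  # variable type -> matlab
    [0, 0, 0, 0, 0, 1, 0],  # fast -> performance
    [0, 0, 1, 0, 0, 0, 0],  # OOP slow -> matlab-oop
])

def missing_labels(y, train_indexes):
    """Return the column indices k for which y[train_indexes][:, k] is all zeros."""
    col_sums = np.asarray(y)[train_indexes].sum(axis=0)
    return np.flatnonzero(col_sums == 0)

print(missing_labels(y, [3, 0, 5, 4]))  # → [2 4 5]
```

# With the training indices [3 0 5 4] from the manual split further down, columns 2, 4 and 5 (matlab-oop, oop, performance) are empty, which matches the three warnings printed there.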

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_predict
Q = {'What does the "yield" keyword do in Python?': ['python'], 'What is a metaclass in Python?': ['oop'],
     'How do I check whether a file exists using Python?': ['python'], 'How to make a chain of function decorators?': ['python', 'decorator'],
     'Using i and j as variables in Matlab': ['matlab', 'naming-conventions'], 'MATLAB: get variable type': ['matlab'],
     'Why is MATLAB so fast in matrix multiplication?': ['performance'], 'Is MATLAB OOP slow or am I doing something wrong?': ['matlab-oop']}
dataframe = pd.DataFrame({'body': list(Q.keys()), 'tag': list(Q.values())})
# One indicator column per tag (mlb.classes_ is sorted alphabetically)
mlb = MultiLabelBinarizer()
X = dataframe['body'].values
y = mlb.fit_transform(dataframe['tag'].values)
# Bag of words -> TF-IDF -> one binary LinearSVC per tag
classifier = Pipeline([('vectorizer', CountVectorizer(lowercase=True, stop_words='english', max_df=0.8, min_df=1)),
                       ('tfidf', TfidfTransformer()),
                       ('clf', OneVsRestClassifier(LinearSVC()))])
predicted = cross_val_predict(classifier, X, y)
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 4 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 0 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 1 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 3 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 5 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 2 is present in all training examples.
  str(classes[c]))
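# The check behind these messages can be sketched as follows. This is a simplified re-creation for illustration, not scikit-learn's own code: for each column of the training indicator matrix that holds a single value, a column of all zeros yields "Label not k ...", and a column of all ones would yield "Label k ...". constant_label_warnings is a hypothetical name, and y_train is the four-row training matrix from the manual split in this post.

```python
import numpy as np

def constant_label_warnings(y_train):
    """Simplified, hypothetical re-creation of the one-vs-rest check:
    return the message for every column that holds a single value."""
    messages = []
    for k in range(y_train.shape[1]):
        values = np.unique(y_train[:, k])
        if len(values) == 1:
            # All zeros: the negative class "not k" covers every sample,
            # i.e. label k never occurs in this training split.
            name = f"not {k}" if values[0] == 0 else str(k)
            messages.append(f"Label {name} is present in all training examples.")
    return messages

y_train = np.array([[1, 0, 0, 0, 0, 0, 1],
                    [0, 0, 0, 0, 0, 0, 1],
                    [0, 1, 0, 0, 0, 0, 0],
                    [0, 1, 0, 1, 0, 0, 0]])
for message in constant_label_warnings(y_train):
    print(message)
```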

# Prediction output

import numpy as np
np.set_printoptions(precision=2, threshold=1000)
predicted
array([[0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0]])
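# A quick way to list the documents that received no tag at all is to look for all-zero rows. A minimal sketch assuming NumPy, with the predicted matrix above copied in by hand:

```python
import numpy as np

# Copy of the cross_val_predict output shown above
predicted = np.array([[0, 0, 0, 0, 0, 0, 1],
                      [0, 0, 0, 0, 0, 0, 1],
                      [0, 0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0, 0, 0],
                      [0, 1, 0, 0, 0, 0, 0],
                      [0, 1, 0, 0, 0, 0, 0]])

# A row sum of 0 means no tag was predicted for that document
no_tag_rows = np.flatnonzero(predicted.sum(axis=1) == 0)
print(no_tag_rows)  # → [2 3 4 5]
```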

# Split the data manually and record the warnings instead of printing them, to see which labels are missing from the training set.

import warnings
from sklearn.model_selection import ShuffleSplit
rs = ShuffleSplit(n_splits=1, test_size=.5, random_state=0)
for train_index, test_index in rs.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    with warnings.catch_warnings(record=True) as received_warnings:  # collect warnings instead of printing them
        warnings.simplefilter("always")  # record every warning, including repeats
        X_train, y_train = X[train_index], y[train_index]
        X_test, y_test = X[test_index], y[test_index]
        classifier.fit(X_train, y_train)
        predicted_test = classifier.predict(X_test)
    for w in received_warnings:
        print(w.message)
TRAIN: [3 0 5 4] TEST: [6 2 1 7]
Label not 2 is present in all training examples.
Label not 4 is present in all training examples.
Label not 5 is present in all training examples.
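# The recording pattern used in the loop can be wrapped in a small reusable helper. This is a sketch using only the standard warnings module; call_collecting_warnings and noisy_fit are hypothetical names, and noisy_fit merely stands in for classifier.fit(...).

```python
import warnings

def call_collecting_warnings(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, messages),
    recording warnings instead of printing them."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, including repeats
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]

def noisy_fit():
    # Hypothetical stand-in for classifier.fit(X_train, y_train)
    warnings.warn("Label not 2 is present in all training examples.", UserWarning)
    return "fitted"

result, messages = call_collecting_warnings(noisy_fit)
print(result, messages)
```

# Because catch_warnings restores the previous filters on exit, the recording only affects the wrapped call.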

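# This can also be confirmed from the actual training labels (columns 2, 4 and 5 are all zeros);

# For the same reason, some of the predicted outputs are all zeros as well;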

y_train[:4]
array([[1, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0]])

# Write a custom function for this problem;

# To work around the problem, you can implement your own prediction function:

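def get_best_tags(clf, X, lb, n_tags=3):
    decfun = clf.decision_function(X)  # confidence score of every tag for every document
    best_tags = np.argsort(decfun)[:, :-(n_tags+1):-1]  # top n_tags columns per row, highest score first
    return lb.classes_[best_tags]  # map column indices back to tag names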

# This way, every document is always assigned the n_tags tags with the highest confidence scores:

mlb.inverse_transform(predicted_test)
get_best_tags(classifier, X_test, mlb)
array([['matlab', 'performance', 'oop'],
       ['python', 'performance', 'oop'],
       ['python', 'performance', 'oop'],
       ['matlab', 'performance', 'oop']], dtype=object)
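# One caveat: in scikit-learn's OneVsRestClassifier, a tag that never occurs in the training data is fitted with a constant predictor whose decision score is 0 for every document, and LinearSVC scores for rare tags may well be negative. Such unseen tags can therefore fill the top-n list; in the output above, 'performance' and 'oop' (both absent from the training split) appear for every document. A possible variant, which is an assumption and not part of the referenced answer, is to mask out tags that never occur in y_train before ranking. top_tags_excluding_unseen is a hypothetical helper, and the scores below are made up.

```python
import numpy as np

def top_tags_excluding_unseen(scores, classes, y_train, n_tags=3):
    """Top-n tags per row (highest score first), ignoring tags whose
    column in y_train is all zeros. If n_tags exceeds the number of
    seen tags, the masked ones still fill the remaining slots."""
    scores = np.array(scores, dtype=float)  # copy, so the caller's array is untouched
    unseen = np.asarray(y_train).sum(axis=0) == 0
    scores[:, unseen] = -np.inf  # push unseen tags to the bottom of the ranking
    best = np.argsort(scores)[:, :-(n_tags + 1):-1]  # same slicing as get_best_tags
    return np.asarray(classes)[best]

classes = ['matlab', 'oop', 'performance', 'python']
y_train = np.array([[1, 0, 0, 0],
                    [0, 0, 0, 1]])           # 'oop' and 'performance' never occur
scores = np.array([[0.4, 0.0, 0.0, -0.6]])  # made-up decision scores
print(top_tags_excluding_unseen(scores, classes, y_train, n_tags=2))  # → [['matlab' 'python']]
```

# Without the mask, the second slot would go to 'oop' or 'performance' (a tie at 0) instead of 'python'.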

Full error:

D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 4 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 0 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 1 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 3 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 5 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 2 is present in all training examples.
  str(classes[c]))

Reference: sklearn

Reference: UserWarning: Label not :NUMBER: is present in all training examples
