UserWarning: Label not :NUMBER: is present in all training examples

Problem analysis:

#The core of the problem is that some labels occur in the test or validation set but not in the training set. That is what triggers this warning.

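#The cause may be that some tags occur in only a few documents (see this article for details). When you split the dataset into train and test sets to validate the model, some tags may be missing from the training data. Let train_indexes be an array holding the indices of the training samples. If a particular tag (with index k) does not occur in any training sample, then every element in the k-th column of the indicator matrix y[train_indexes] is zero.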
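The condition described above can be checked directly on an indicator matrix. The following is a minimal sketch, not part of the original code: the matrix is typed by hand to mirror the eight example questions used in this post (columns in sorted label order), and both the helper name missing_label_columns and the particular train_indexes split are assumptions chosen for illustration.

```python
import numpy as np

# Hand-written indicator matrix (assumption: mirrors the example data).
# Columns, in sorted label order: decorator, matlab, matlab-oop,
# naming-conventions, oop, performance, python.
y = np.array([
    [0, 0, 0, 0, 0, 0, 1],  # python
    [0, 0, 0, 0, 1, 0, 0],  # oop
    [0, 0, 0, 0, 0, 0, 1],  # python
    [1, 0, 0, 0, 0, 0, 1],  # python, decorator
    [0, 1, 0, 1, 0, 0, 0],  # matlab, naming-conventions
    [0, 1, 0, 0, 0, 0, 0],  # matlab
    [0, 0, 0, 0, 0, 1, 0],  # performance
    [0, 0, 1, 0, 0, 0, 0],  # matlab-oop
])


def missing_label_columns(y, train_indexes):
    """Return every column index k where y[train_indexes][:, k] is all zeros."""
    y_train = y[train_indexes]
    return np.flatnonzero(~y_train.any(axis=0)).tolist()


# Illustrative split: four of the eight samples go to training.
train_indexes = np.array([3, 0, 5, 4])
missing = missing_label_columns(y, train_indexes)
print(missing)  # [2, 4, 5]
```

Any column index returned here corresponds to a label the classifier never sees during fitting.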

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_predict

# Eight example questions, each mapped to its list of tags
Q = {
    'What does the "yield" keyword do in Python?': ['python'],
    'What is a metaclass in Python?': ['oop'],
    'How do I check whether a file exists using Python?': ['python'],
    'How to make a chain of function decorators?': ['python', 'decorator'],
    'Using i and j as variables in Matlab': ['matlab', 'naming-conventions'],
    'MATLAB: get variable type': ['matlab'],
    'Why is MATLAB so fast in matrix multiplication?': ['performance'],
    'Is MATLAB OOP slow or am I doing something wrong?': ['matlab-oop'],
}
# Wrap the dict views in list() so pandas receives plain sequences
dataframe = pd.DataFrame({'body': list(Q.keys()), 'tag': list(Q.values())})

# Turn the tag lists into a binary indicator matrix
mlb = MultiLabelBinarizer()
X = dataframe['body'].values
y = mlb.fit_transform(dataframe['tag'].values)

# Bag-of-words -> TF-IDF -> one linear SVM per label
classifier = Pipeline([
    ('vectorizer', CountVectorizer(lowercase=True, stop_words='english', max_df=0.8, min_df=1)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC())),
])

predicted = cross_val_predict(classifier, X, y)

D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 4 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 0 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 1 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 3 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 5 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 2 is present in all training examples.
  str(classes[c]))

#Prediction output

import numpy as np
np.set_printoptions(precision=2, threshold=1000)
predicted

array([[0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0]])

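#Run the split manually and capture the warnings instead of letting them print, to see which labels are missing from the training set.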

import warnings
from sklearn.model_selection import ShuffleSplit

# One random 50/50 split with a fixed seed
rs = ShuffleSplit(n_splits=1, test_size=.5, random_state=0)
for train_index, test_index in rs.split(X):
    train_indices, test_indices = train_index, test_index
    print("TRAIN:", train_index, "TEST:", test_index)

# Record the warnings raised during fitting instead of printing them
with warnings.catch_warnings(record=True) as received_warnings:
    warnings.simplefilter("always")
    X_train, y_train = X[train_indices], y[train_indices]
    X_test, y_test = X[test_indices], y[test_indices]
    classifier.fit(X_train, y_train)
    predicted_test = classifier.predict(X_test)

# Print only the message of each recorded warning
for w in received_warnings:
    print(w.message)

TRAIN: [3 0 5 4] TEST: [6 2 1 7]
Label not 2 is present in all training examples.
Label not 4 is present in all training examples.
Label not 5 is present in all training examples.
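The warnings report numbers rather than tag names. As a rough sketch, the numbers can be mapped back to names, assuming that each number is the column index of the indicator matrix (which is what this output suggests) and that the label names are stored in sorted order, as MultiLabelBinarizer does in classes_. The helper labels_from_warnings is hypothetical, and the classes list is typed by hand here instead of being read from mlb.classes_.

```python
import re

# Sorted label names for the example tags (assumption: typed by hand,
# standing in for mlb.classes_).
classes = ['decorator', 'matlab', 'matlab-oop', 'naming-conventions',
           'oop', 'performance', 'python']

# Warning messages copied from the output above.
messages = [
    "Label not 2 is present in all training examples.",
    "Label not 4 is present in all training examples.",
    "Label not 5 is present in all training examples.",
]


def labels_from_warnings(messages, classes):
    """Map each 'Label not N is present ...' message to classes[N]."""
    pattern = re.compile(r"Label not (\d+) is present in all training examples")
    names = []
    for message in messages:
        match = pattern.search(str(message))
        if match:
            names.append(classes[int(match.group(1))])
    return names


missing_names = labels_from_warnings(messages, classes)
print(missing_names)  # ['matlab-oop', 'oop', 'performance']
```

With the real objects, the same helper could be called on the message of each recorded warning together with mlb.classes_.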

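#This can also be verified from the actual training data: columns 2, 4 and 5 of y_train contain only zeros.

#For the same reason, some of the predicted rows are all zeros.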

y_train[:4]

array([[1, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0]])
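The all-zero prediction rows can be located the same way. This is a small sketch that copies the cross-validated prediction array printed earlier in this post by hand. The variable name rows_without_tags is an assumption for illustration.

```python
import numpy as np

# Prediction matrix copied from the cross_val_predict output shown earlier.
predicted = np.array([
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
])

# A row with no 1 in it means the document received no tag at all.
rows_without_tags = np.flatnonzero(~predicted.any(axis=1)).tolist()
print(rows_without_tags)  # [2, 3, 4, 5]
```

Four of the eight documents end up without any tag, which is the gap the custom prediction function in the next step is meant to close.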

#Write a custom function tailored to the problem.

#To work around this problem, you can implement your own prediction function:

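def get_best_tags(clf, X, lb, n_tags=3):
    # One confidence score per (document, label) pair
    decfun = clf.decision_function(X)
    # Indices of the n_tags highest scores per row, best first
    best_tags = np.argsort(decfun)[:, :-(n_tags+1): -1]
    return lb.classes_[best_tags]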

#This way, each document is always assigned the n_tags tags with the highest confidence scores:

mlb.inverse_transform(predicted_test)
get_best_tags(classifier, X_test, mlb)

array([['matlab', 'performance', 'oop'],
       ['python', 'performance', 'oop'],
       ['python', 'performance', 'oop'],
       ['matlab', 'performance', 'oop']], dtype=object)
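The slice used inside get_best_tags is compact, so a tiny worked example may help. The scores and the label names 'a' to 'd' below are made up for illustration and are not output of the classifier above.

```python
import numpy as np

# Hypothetical decision scores: 2 documents, 4 labels.
decfun = np.array([[0.1, -0.5, 0.9, 0.3],
                   [-1.2, 0.4, -0.3, 0.0]])
labels = np.array(['a', 'b', 'c', 'd'])
n_tags = 2

# argsort orders each row from lowest to highest score.
order = np.argsort(decfun)
# Walking backwards from the last column and stopping before position
# -(n_tags + 1) keeps the n_tags highest-scoring indices, best first.
best = order[:, :-(n_tags + 1):-1]

print(best.tolist())          # [[2, 3], [1, 3]]
print(labels[best].tolist())  # [['c', 'd'], ['b', 'd']]
```

Because the selection depends only on ranking, every document gets exactly n_tags labels, even when all of its scores are negative.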

Full error:

D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 4 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 0 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 1 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 3 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 5 is present in all training examples.
  str(classes[c]))
D:\anaconda\lib\site-packages\sklearn\multiclass.py:81: UserWarning: Label not 2 is present in all training examples.
  str(classes[c]))

Reference: sklearn

Reference: UserWarning: Label not :NUMBER: is present in all training examples
