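On the Hub you can find more than 27,000 models shared by the AI community. They deliver state-of-the-art performance on tasks such as sentiment analysis, object detection, text generation and speech recognition.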

```python
from transformers import pipeline

# Three sample Amazon reviews of cat trees
data = [
    "This is wonderful and easy to put together.  My cats love it.",
    "This cat tree is almost perfect. I wanted a tall tree, and this one delivers. It reaches almost to the top of my 8' ceiling",
    "The super large box had disintegrated by the time it arrived to my doorstep & large portions were missing from a 89” solid wood cat tree. I took detailed pictures of the box before & after unpacking &  laying out all contents. Several pieces were badly damaged & 3 crucial pieces were missing.<br/>A 45 minute phone call with Amazon resulted in Amazon requesting missing parts from Armarkat who never responded despite my repeated attempts to follow-through. Amazon offered for me to purchase another box, pack it & haul the box (weighs more than I weigh) to a place to be picked up. There’s no opportunity to do that where I live.<br/><br/>It’s a very expensive loss",
]

# With no model specified, pipeline() downloads a default English sentiment model
sentiment_pipeline = pipeline("sentiment-analysis")
print(sentiment_pipeline(data))
```
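For single-label classification, the text-classification pipeline turns each input's logits into a label and a score. It does this by applying softmax and keeping the most probable class. The sketch below reproduces only that post-processing step, on made-up logits. The `id2label` mapping `{0: "NEGATIVE", 1: "POSITIVE"}` and the helper name `logits_to_pipeline_output` are assumptions for illustration. They are not part of the transformers API.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_pipeline_output(batch_logits: Sequence[Sequence[float]],
                              id2label: Dict[int, str]) -> List[Dict[str, object]]:
    """Hypothetical helper: mimic the {'label', 'score'} dicts the pipeline prints."""
    results = []
    for logits in batch_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        results.append({"label": id2label[best], "score": probs[best]})
    return results


# Made-up logits for three reviews; a real run would take them from the model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
fake_logits = [[-2.1, 2.5], [-0.3, 0.9], [3.0, -2.8]]
print(logits_to_pipeline_output(fake_logits, id2label))
```

The score is therefore the softmax probability of the predicted label. It is not a raw logit.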

Training on your own Amazon dataset
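The training script reads a dataset folder with `train/`, `test/` and `val/` subfolders and loads every file in them with `load_dataset('json', ...)`. It tokenizes a `"text"` field, and its wrong-sample check treats each validation file as one sample. The original does not show the file format. As an assumption, this sketch writes one JSON object per file with `"text"` and an integer `"label"` (0 or 1, matching `num_labels=2`). The helper `write_toy_dataset` and the example reviews are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

# Hypothetical toy examples: (review text, label) with 1 = positive, 0 = negative.
TOY_SPLITS: Dict[str, List[Tuple[str, int]]] = {
    "train": [("My cats love it.", 1), ("Box arrived damaged, parts missing.", 0)],
    "val": [("Almost perfect, very tall.", 1), ("A very expensive loss.", 0)],
    "test": [("Easy to put together.", 1)],
}


def write_toy_dataset(root: str) -> None:
    """Write one single-line JSON object per sample into root/train, root/val, root/test."""
    for split, samples in TOY_SPLITS.items():
        split_dir = os.path.join(root, split)
        os.makedirs(split_dir, exist_ok=True)
        for i, (text, label) in enumerate(samples):
            # Zero-padded names keep sorted(os.listdir(...)) in write order.
            path = os.path.join(split_dir, f"{i:05d}.json")
            with open(path, "w", encoding="utf-8") as f:
                json.dump({"text": text, "label": label}, f, ensure_ascii=False)


root = tempfile.mkdtemp()
write_toy_dataset(root)
print(sorted(os.listdir(os.path.join(root, "val"))))
```

Each file contains a single line, so it is also valid JSON Lines, which is what the `json` loader in `datasets` accepts.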

```python
import os
import os.path as osp
import shutil

import numpy as np
import evaluate  # replaces datasets.load_metric, which recent datasets releases removed
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# Pretrained checkpoints tried:
#   distilbert-base-uncased
#   bert-base-uncased
#   gpt2
#   distilgpt2
# (GPT-2 tokenizers have no padding token, so gpt2/distilgpt2 need one set before padding.)


def get_list(path, file_list):
    """Return the full path of every file name in file_list."""
    return [osp.join(path, sample) for sample in file_list]


def get_dataset(dataset_path):
    """Collect the file paths of the test/, train/ and val/ splits."""
    test_path = osp.join(dataset_path, 'test/')
    train_path = osp.join(dataset_path, 'train/')
    val_path = osp.join(dataset_path, 'val/')
    # os.listdir() order is arbitrary; sorting keeps it deterministic, which
    # check_the_wrong_sample() relies on to match predictions to val files.
    test_list = get_list(test_path, sorted(os.listdir(test_path)))
    train_list = get_list(train_path, sorted(os.listdir(train_path)))
    val_list = get_list(val_path, sorted(os.listdir(val_path)))
    return test_list, train_list, val_list


def check_the_wrong_sample(labels, predictions):
    """Copy every misclassified validation file into wrong_sample/ for inspection.

    Only valid if each file in val/ holds exactly one sample, so that the i-th
    prediction corresponds to the i-th (sorted) file.
    """
    val_folder = '/cloud/cloud_disk/users/huh/dataset/nlp_dataset/question_dataset/process_data/cattree_product_quality/val'
    end_folder = '/cloud/cloud_disk/users/huh/dataset/nlp_dataset/question_dataset/process_data/cattree_product_quality/wrong_sample'
    os.makedirs(end_folder, exist_ok=True)
    sample_list = sorted(os.listdir(val_folder))
    for index, (label, prediction) in enumerate(zip(labels, predictions)):
        if label != prediction:
            print(index, sample_list[index])
            shutil.copy(osp.join(val_folder, sample_list[index]),
                        osp.join(end_folder, sample_list[index]))


metric = evaluate.load("accuracy")  # load once instead of on every evaluation


def compute_metric(eval_pred):
    """Convert logits to class ids, save misclassified samples, and return accuracy."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    print(len(logits), len(labels), len(predictions))
    print('predictions')
    print(predictions)
    check_the_wrong_sample(labels, predictions)
    return metric.compute(predictions=predictions, references=labels)


def train(dataset_path):
    test_list, train_list, val_list = get_dataset(dataset_path)
    question_dataset = load_dataset(
        'json',
        data_files={'train': train_list, 'test': test_list, 'val': val_list},
    )

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def preprocess_function(examples):
        # Truncate to the model's maximum length; padding happens per batch in the collator.
        return tokenizer(examples["text"], truncation=True)

    tokenized_dataset = question_dataset.map(preprocess_function, batched=True)
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    training_args = TrainingArguments(
        output_dir="./results",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=5,
        weight_decay=0.01,
        logging_steps=50,
        run_name="catree",
        save_strategy='no',  # do not write checkpoints during training
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset["train"],
        eval_dataset=tokenized_dataset["val"],
        tokenizer=tokenizer,  # newer transformers releases name this argument processing_class
        data_collator=data_collator,
        compute_metrics=compute_metric,
    )
    trainer.train()
    trainer.evaluate()


if __name__ == '__main__':
    # dataset_path = '/cloud/cloud_disk/users/huh/dataset/nlp_dataset/question_dataset/process_data/catree_personality_2.0'
    dataset_path = '/cloud/cloud_disk/users/huh/dataset/nlp_dataset/question_dataset/process_data/cattree_product_quality'
    train(dataset_path)
```
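`compute_metric` receives the evaluation logits and labels. It takes the argmax over the last axis to get class predictions and reports accuracy. `check_the_wrong_sample` then acts on the positions where prediction and label differ. The NumPy sketch below shows only that arithmetic, on made-up logits, with no Trainer and no file copying. `accuracy_and_errors` is a hypothetical helper name.

```python
import numpy as np


def accuracy_and_errors(logits: np.ndarray, labels: np.ndarray):
    """Return (accuracy, indices of misclassified samples) for a batch of logits."""
    predictions = np.argmax(logits, axis=-1)           # one class id per sample
    correct = predictions == labels                    # boolean mask
    accuracy = float(correct.mean())                   # fraction predicted correctly
    wrong_indices = np.flatnonzero(~correct).tolist()  # positions to inspect or copy
    return accuracy, wrong_indices


# Made-up logits for four validation samples with two classes.
logits = np.array([[2.0, -1.0], [0.1, 0.4], [-0.5, 1.5], [1.2, 0.3]])
labels = np.array([0, 0, 1, 1])
acc, wrong = accuracy_and_errors(logits, labels)
print(acc, wrong)  # → 0.5 [1, 3]
```

The error indices are positions in the evaluation set. Mapping them back to files therefore only works if the validation files are listed in the same order in which they were loaded.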
