The dataset is as follows. Save it to a file named bert_example.csv:

"a stirring , funny and finally transporting re imagining of beauty and the beast and 1930s horror films",1
apparently reassembled from the cutting room floor of any given daytime soap,0
"they presume their audience wo n't sit still for a sociology lesson , however entertainingly presented , so they trot out the conventional science fiction elements of bug eyed monsters and futuristic women in skimpy clothes",0
"this is a visually stunning rumination on love , memory , history and the war between art and commerce",1
jonathan parker 's bartleby should have been the be all end all of the modern office anomie films,1
campanella gets the tone just right funny in the middle of sad in the middle of hopeful,1
a fan film that for the uninitiated plays better on video with the sound turned down,0
"b art and berling are both superb , while huppert is magnificent",1
"a little less extreme than in the past , with longer exposition sequences between them , and with fewer gags to break the tedium",0
the film is strictly routine,0
a lyrical metaphor for cultural and personal self discovery and a picaresque view of a little remembered world,1
the most repugnant adaptation of a classic text since roland joff and demi moore 's the scarlet letter,0
"for something as splendid looking as this particular film , the viewer expects something special but instead gets lrb sci fi rrb rehash",0
"this is a stunning film , a one of a kind tour de force",1
"may be more genial than ingenious , but it gets the job done",1
"there is a freedom to watching stunts that are this crude , this fast paced and this insane",1
"if the tuxedo actually were a suit , it would fit chan like a 99 bargain basement special",0
"as quiet , patient and tenacious as mr lopez himself , who approaches his difficult , endless work with remarkable serenity and discipline",1
final verdict you 've seen it all before,0
"blue crush follows the formula , but throws in too many conflicts to keep the story compelling",0

Using a pretrained BERT model
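The code below loads weights from a local directory named bert_model. If that directory is not already populated, one way to create it is to download the checkpoint once from the Hugging Face hub and save it locally; a sketch assuming the English bert-base-uncased model (which matches the 768-dimensional hidden size used below):

from transformers import BertModel, BertTokenizer

# Download once from the hub, then save locally so the training script
# can load everything from the 'bert_model' directory afterwards.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
tokenizer.save_pretrained("bert_model")
model.save_pretrained("bert_model")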

Example code

import torch
from torch import nn
from torch import optim
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import BertModel, BertTokenizer
import torch.utils.data as Data
import numpy as np
from loguru import logger


def load_data():
    """Build the training and test splits."""
    train_df = pd.read_csv("bert_example.csv", header=None)
    sentences = train_df[0].values
    targets = train_df[1].values
    train_inputs, test_inputs, train_targets, test_targets = train_test_split(sentences, targets)
    return train_inputs, test_inputs, train_targets, test_targets


class BertClassificationModel(nn.Module):
    def __init__(self):
        super(BertClassificationModel, self).__init__()
        MODEL_PATH = 'bert_model'  # directory holding the three files mentioned above
        self.tokenizer = BertTokenizer.from_pretrained(pretrained_model_name_or_path=MODEL_PATH)
        self.bert = BertModel.from_pretrained(MODEL_PATH)  # load the pretrained model
        self.use_bert_classify = nn.Linear(768, 2)  # BERT outputs 768-dim vectors; this is a binary task, so map to 2 outputs
        self.sig_mod = nn.Sigmoid()  # note: CrossEntropyLoss expects raw logits, so this Sigmoid is redundant and squashes them; kept here to match the logged results below

    def forward(self, batch_sentences):
        sentence_tokenized = self.tokenizer(batch_sentences,
                                            truncation=True,   # truncate anything over max_length
                                            padding=True,      # pad shorter sequences
                                            max_length=30,     # maximum sequence length
                                            add_special_tokens=True)  # add [CLS]/[SEP]
        input_ids = torch.tensor(sentence_tokenized['input_ids'])            # token ids
        attention_mask = torch.tensor(sentence_tokenized['attention_mask'])  # attention mask
        bert_output = self.bert(input_ids, attention_mask=attention_mask)
        # hidden_state = bert_output[0].view(64, -1)  # alternative: flatten the whole hidden state
        bert_cls_hidden_state = bert_output[0][:, 0, :]  # hidden state at [CLS], i.e. the first position of every sequence
        # The [CLS] token is identical in every input, but after encoding its embedding differs
        # per sentence, so it is treated as a summary of the whole sentence (a sentence vector).
        linear_output = self.use_bert_classify(bert_cls_hidden_state)
        return self.sig_mod(linear_output)


def main():
    train_inputs, test_inputs, train_targets, test_targets = load_data()
    # ============== hyperparameters ================
    epochs = 10
    batch_size = 5
    # ============== hyperparameters ================
    train_sentence_loader = Data.DataLoader(
        dataset=train_inputs,
        batch_size=batch_size,  # samples per batch
    )
    train_label_loader = Data.DataLoader(
        dataset=train_targets,
        batch_size=batch_size,
    )
    bert_classifier_model = BertClassificationModel()
    optimizer = optim.SGD(bert_classifier_model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):  # training loop
        loss_list = []
        for sentences, labels in zip(train_sentence_loader, train_label_loader):
            optimizer.zero_grad()
            outputs = bert_classifier_model(sentences)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            loss_list.append(loss.detach().numpy())
        logger.info("epoch:{},loss:{}".format(epoch, np.mean(loss_list)))


if __name__ == '__main__':
    main()
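
For reference, a small sketch of what the tokenizer call inside forward() returns for a two-sentence batch (assuming the same bert_model directory as above):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert_model")
batch = ["the film is strictly routine",
         "this is a stunning film , a one of a kind tour de force"]
encoded = tokenizer(batch, truncation=True, padding=True,
                    max_length=30, add_special_tokens=True)
print(encoded["input_ids"])       # [CLS] ... [SEP] ids, padded to the longer sentence
print(encoded["attention_mask"])  # 1 for real tokens, 0 for padding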

Training output:

2022-03-28 15:53:21.356 | INFO     | __main__:main:73 - epoch:0,loss:0.6939845681190491
2022-03-28 15:53:24.096 | INFO     | __main__:main:73 - epoch:1,loss:0.6804901957511902
2022-03-28 15:53:26.815 | INFO     | __main__:main:73 - epoch:2,loss:0.6670143604278564
2022-03-28 15:53:29.475 | INFO     | __main__:main:73 - epoch:3,loss:0.6514456868171692
2022-03-28 15:53:32.160 | INFO     | __main__:main:73 - epoch:4,loss:0.6312667727470398
2022-03-28 15:53:34.832 | INFO     | __main__:main:73 - epoch:5,loss:0.604450523853302
.......
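
The script above only trains the model. To classify a new sentence with it, something like the following sketch would work; it assumes main() is modified to return bert_classifier_model (the original returns nothing):

import torch

def predict(model, sentences):
    # Run in eval mode and take the argmax over the two output scores
    model.eval()
    with torch.no_grad():
        outputs = model(sentences)            # shape: (batch, 2)
        preds = torch.argmax(outputs, dim=1)  # 0 = negative, 1 = positive
    return preds.tolist()

# Hypothetical usage:
# model = main()  # assuming main() now returns the trained model
# print(predict(model, ["the film is strictly routine"]))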
