BERT usage references:
https://blog.csdn.net/c991262331/article/details/89381972
https://zhuanlan.zhihu.com/p/50773178
https://www.jiqizhixin.com/articles/2018-11-01-9

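1 Demo

Note:
The argument to BertTokenizer.from_pretrained must point to the vocab file itself, whereas the argument to BertModel.from_pretrained must point to the directory that contains pytorch_model.bin and config.json, not to the pytorch_model.bin file.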

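from pytorch_pretrained_bert import BertTokenizer, BertModel
import torch

# Alternative: pass the model name and let the library download the vocabulary
# bert_tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert_tokenizer = BertTokenizer.from_pretrained(r'C:\XXX\bert-base-chinese-vocab.txt')
a = "张三和李四都住在村头"
a_token = bert_tokenizer.tokenize(a)
print(a_token)
a_seq_ids = bert_tokenizer.convert_tokens_to_ids(a_token)
print(a_seq_ids)

# Alternative: pass the model name and let the library download the weights
# bert_model = BertModel.from_pretrained("bert-base-chinese")
bert_model = BertModel.from_pretrained(r'C:\XXX\bert-base-chinese')
bert_model.eval()  # disable dropout so the outputs are deterministic
# Shape [1, sequence_length]: a batch containing one sentence
batch_data = torch.tensor(a_seq_ids, dtype=torch.long).view(1, -1)
# out is a list with one tensor per encoder layer; out[0] is the first layer's output
out, _ = bert_model(batch_data)
print(out[0].shape)

Output:
['张', '三', '和', '李', '四', '都', '住', '在', '村', '头']
[2476, 676, 1469, 3330, 1724, 6963, 857, 1762, 3333, 1928]
torch.Size([1, 10, 768])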
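
The path rule from the note at the start of this section can be checked before calling from_pretrained. The following is a minimal sketch using only the standard library. The helper names is_vocab_file and is_model_dir are hypothetical and not part of pytorch_pretrained_bert, and accepting either config.json or bert_config.json is an assumption based on the two names that appear in this post.

```python
from pathlib import Path

# Hypothetical helpers for illustration; they are not part of pytorch_pretrained_bert.
WEIGHTS_NAME = "pytorch_model.bin"
CONFIG_NAMES = ("config.json", "bert_config.json")  # assumption: either name may be present


def is_vocab_file(path):
    """Return True if `path` is an existing .txt file (what BertTokenizer should be given)."""
    p = Path(path)
    return p.is_file() and p.suffix == ".txt"


def is_model_dir(path):
    """Return True if `path` is a directory holding the weights and a config file."""
    p = Path(path)
    if not p.is_dir():
        return False
    has_weights = (p / WEIGHTS_NAME).is_file()
    has_config = any((p / name).is_file() for name in CONFIG_NAMES)
    return has_weights and has_config
```

Passing the pytorch_model.bin file itself to is_model_dir returns False, which mirrors the mistake the note warns about.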

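2 Source code

2.1 Loading the model and vocabulary

BERT needs two kinds of pretrained resources: the vocabulary (used for tokenization) and the model (its parameters).

2.1.1 Loading the pretrained vocabulary:

pytorch_pretrained_bert/file_utils.py is used when loading the pretrained vocabulary (BertTokenizer.from_pretrained in tokenization.py calls into it). You can pass the name of a pretrained model, in which case the program downloads the corresponding vocabulary automatically, or you can download the vocabulary manually and pass its local path. In other words, the pretrained_model_name_or_path argument can be either a model name or a local path.

2.1.2 Loading the pretrained model:

pytorch_pretrained_bert/modeling.py loads the pretrained model. You can pass the name of a pretrained model, in which case the program downloads the corresponding model automatically, or you can download the model manually and pass its local path. In other words, the pretrained_model_name_or_path argument can be either a model name or a local path.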

Params:
    pretrained_model_name_or_path: either:
        - a str with the name of a pre-trained model to load selected in the list of:
            . `bert-base-uncased`
            . `bert-large-uncased`
            . `bert-base-cased`
            . `bert-large-cased`
            . `bert-base-multilingual-uncased`
            . `bert-base-multilingual-cased`
            . `bert-base-chinese`
        - a path or url to a pretrained model archive containing:
            . `bert_config.json` a configuration file for the model
            . `pytorch_model.bin` a PyTorch dump of a BertForPreTraining instance
        - a path or url to a pretrained model archive containing:
            . `bert_config.json` a configuration file for the model
            . `model.chkpt` a TensorFlow checkpoint
    from_tf: should we load the weights from a locally saved TensorFlow checkpoint
    cache_dir: an optional path to a folder in which the pre-trained models will be cached.
    state_dict: an optional state dictionary (collections.OrderedDict object) to use instead of Google pre-trained models
    *inputs, **kwargs: additional input for the specific Bert class (ex: num_labels for BertForSequenceClassification)
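
The name-or-path behaviour of pretrained_model_name_or_path can be sketched as a dictionary lookup with a fallback. This is a simplified illustration and not the library's code: resolve_archive is a hypothetical helper, and the map below copies only two of the entries listed in section 2.4.

```python
# Two entries copied from the archive map quoted in section 2.4 of this post.
PRETRAINED_MODEL_ARCHIVE_MAP = {
    "bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz",
    "bert-base-chinese": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz",
}


def resolve_archive(pretrained_model_name_or_path):
    """Hypothetical helper: report whether the argument is a known name or a path.

    Returns a (kind, location) tuple, where kind is "name" or "path".
    """
    if pretrained_model_name_or_path in PRETRAINED_MODEL_ARCHIVE_MAP:
        # Known model name: the location is the download URL.
        return "name", PRETRAINED_MODEL_ARCHIVE_MAP[pretrained_model_name_or_path]
    # Anything else is treated as a local path (or URL) supplied by the caller.
    return "path", pretrained_model_name_or_path


print(resolve_archive("bert-base-chinese"))
print(resolve_archive(r"C:\XXX\bert-base-chinese"))
```

A misspelled model name therefore falls through to the path branch, so a typo in the name tends to surface as a file-not-found problem.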

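2.2 Forward pass

The BERT forward pass takes three inputs: input_ids, token_type_ids, and attention_mask. The latter two default to None.
input_ids: each value is the index of a token in the vocabulary;
token_type_ids: each value is 0 or 1 and indicates which sentence a token belongs to. BERT accepts two sentences as input; 0 marks the first sentence and 1 marks the second. [Padding tokens are also marked 0.]
attention_mask: each value is 0 or 1 and indicates whether a token is padding. Sentences in a batch differ in length, so shorter ones must be padded; 0 marks a padding token and 1 marks a real token.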
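
The three inputs can be built by hand for a batch of sentence pairs. The sketch below uses plain Python lists and assumes the usual BERT layout of [CLS] A [SEP] B [SEP]. The helper names build_pair_inputs and pad_batch are hypothetical, the token ids are made-up placeholders, and using 0 as the padding id is an assumption that should be checked against your vocabulary.

```python
def build_pair_inputs(ids_a, ids_b, cls_id, sep_id):
    """Lay out one sentence pair as [CLS] A [SEP] B [SEP].

    token_type_ids is 0 for [CLS], sentence A and the first [SEP],
    and 1 for sentence B and the final [SEP].
    """
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    return input_ids, token_type_ids


def pad_batch(examples, pad_id=0):
    """Pad every example to the longest length in the batch.

    pad_id=0 is an assumption; padding positions get token type 0 and mask 0.
    """
    max_len = max(len(ids) for ids, _ in examples)
    batch = {"input_ids": [], "token_type_ids": [], "attention_mask": []}
    for ids, types in examples:
        n_pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_id] * n_pad)
        batch["token_type_ids"].append(types + [0] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
    return batch


# Placeholder ids; real ids come from the tokenizer's vocabulary.
first = build_pair_inputs([11, 12, 13], [21, 22], cls_id=101, sep_id=102)
second = build_pair_inputs([31], [41], cls_id=101, sep_id=102)
batch = pad_batch([first, second])
for name, rows in batch.items():
    print(name, rows)
```

Each list in the resulting dictionary can be wrapped in torch.LongTensor to obtain the [batch_size, sequence_length] tensors that the forward pass expects.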

Inputs:
    `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] with the word token indices in the vocabulary (see the tokens preprocessing logic in the scripts `extract_features.py`, `run_classifier.py` and `run_squad.py`)
    `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to a `sentence B` token (see BERT paper for more details).
    `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max input sequence length in the current batch. It's the mask that we typically use for attention when a batch has varying length sentences.
    `output_all_encoded_layers`: boolean which controls the content of the `encoded_layers` output as described below. Default: `True`.

Outputs: Tuple of (encoded_layers, pooled_output)
    `encoded_layers`: controlled by `output_all_encoded_layers` argument:
        - `output_all_encoded_layers=True`: outputs a list of the full sequences of encoded-hidden-states at the end of each attention block (i.e. 12 full sequences for BERT-base, 24 for BERT-large), each encoded-hidden-state is a torch.FloatTensor of size [batch_size, sequence_length, hidden_size],
        - `output_all_encoded_layers=False`: outputs only the full sequence of hidden-states corresponding to the last attention block of shape [batch_size, sequence_length, hidden_size],
    `pooled_output`: a torch.FloatTensor of size [batch_size, hidden_size] which is the output of a classifier pretrained on top of the hidden state associated to the first character of the input (`CLS`) to train on the Next-Sentence task (see BERT's paper).

Example usage:
# Already been converted into WordPiece token ids
input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])

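2.3 Framework initialization

pytorch_pretrained_bert/__init__.py is the package entry point and brings together several models: BERT, GPT, GPT-2, and Transformer-XL. PYTORCH_PRETRAINED_BERT_CACHE, which it exposes, sets the cache directory for downloaded BERT models.

pytorch_pretrained_bert/file_utils.py holds the download settings for BERT models: the path where automatically downloaded models are saved, and the names of the config file and the weights file.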
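
How a cache directory and cached file names might be derived can be sketched with the standard library alone. Treat every detail here as an assumption to verify against your installed version: the environment variable override, the ~/.pytorch_pretrained_bert fallback, and the hash-based file naming are written from memory of the library's behaviour, and both functions are illustrative rather than copies of its code.

```python
import os
from hashlib import sha256
from pathlib import Path


def default_cache_dir():
    """Assumed rule: an environment variable wins, else a hidden folder in the home directory."""
    env_value = os.getenv("PYTORCH_PRETRAINED_BERT_CACHE")
    if env_value:
        return Path(env_value)
    return Path.home() / ".pytorch_pretrained_bert"


def url_to_filename(url, etag=None):
    """Assumed naming scheme: hash the URL, and append a hash of the ETag if there is one.

    Hashing keeps file names filesystem-safe and makes a changed ETag
    (a new upstream version) map to a different cache entry.
    """
    filename = sha256(url.encode("utf-8")).hexdigest()
    if etag:
        filename += "." + sha256(etag.encode("utf-8")).hexdigest()
    return filename


url = "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt"
print(default_cache_dir() / url_to_filename(url))
```

Under this scheme the cached files have opaque names, so passing an explicit cache_dir or a manually downloaded local path is the easier way to keep track of which file is which.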

2.4 Downloading pretrained resources:

pytorch_pretrained_bert/modeling.py: URLs of the pretrained models

PRETRAINED_MODEL_ARCHIVE_MAP = {
    'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz",
    'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased.tar.gz",
    'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz",
    'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz",
    'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz",
    'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased.tar.gz",
    'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz",
}

pytorch_pretrained_bert/tokenization.py: URLs of the pretrained vocabularies

PRETRAINED_VOCAB_ARCHIVE_MAP = {
    'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
    'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",
    'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt",
    'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt",
    'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt",
    'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt",
    'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt",
}
PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP = {
    'bert-base-uncased': 512,
    'bert-large-uncased': 512,
    'bert-base-cased': 512,
    'bert-large-cased': 512,
    'bert-base-multilingual-uncased': 512,
    'bert-base-multilingual-cased': 512,
    'bert-base-chinese': 512,
}
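
The positional embeddings size map records how many positions each pretrained model supports (512 for all of the models listed), so a longer id sequence has no position embedding for its tail. One way to guard against that is sketched below. The helper truncate_to_max_len is hypothetical, and cutting off the tail is only one possible policy, not something the library does for you; the map copies two entries from the listing above.

```python
# Two entries copied from the size map above.
PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP = {
    "bert-base-uncased": 512,
    "bert-base-chinese": 512,
}


def truncate_to_max_len(token_ids, model_name):
    """Hypothetical helper: keep at most as many ids as the model has positions.

    Unknown model names are left untouched because no limit is known for them.
    """
    max_len = PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP.get(model_name)
    if max_len is None:
        return list(token_ids)
    return list(token_ids[:max_len])


long_ids = list(range(600))  # placeholder ids standing in for a long document
print(len(truncate_to_max_len(long_ids, "bert-base-chinese")))
```

If special tokens such as [CLS] and [SEP] are added after truncation, room for them has to be reserved within the same limit.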
