Introduction to BERT

BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model from Google that can be used for a wide range of natural language processing (NLP) tasks.

After reading this article, you will have a basic understanding of BERT and will be able to utilize it for your own business applications. It would be helpful if you are familiar with Python and have a general idea of machine learning.

The BERT models I will cover in this article are:

Binary or multi-class classification
Regression
Question answering

Introduction to BERT

BERT was pre-trained on English Wikipedia (~2.5 billion words) along with the BooksCorpus dataset (~800 million words). To use BERT, you won’t have to repeat this compute-intensive process.

BERT brings the transfer learning approach into the natural language processing area in a way that no language model has done before.

Transfer Learning

Transfer learning is a process where a machine learning model developed for a general task can be reused as a starting point for a specific business problem.

Imagine you want to teach someone named Amanda, who doesn’t speak English, how to take the SAT. The first step would be to teach Amanda the English language as thoroughly as possible. Then, you can teach her more specifically for the SAT.

In the context of a machine learning model, this idea is known as transfer learning. The first part of transfer learning is pre-training (similar to teaching Amanda English for the first time). After the pre-training is complete you can focus on a specific task (like teaching Amanda how to take the SAT). This is a process known as fine-tuning — changing the model so it can fit your specific business problem.

BERT Pre-training

This is a quick introduction to the BERT pre-training process. In practice, you can use a pre-trained BERT model and skip this step.

BERT takes two chunks of text as input. In the simplified example above, I referred to these two inputs as Sentence 1 and Sentence 2. In the pre-training for BERT, Sentence 2 intentionally does not follow Sentence 1 in about half of the training examples.

Sentence 1 starts with a special token, [CLS], and each sentence ends with another special token, [SEP]. Every word that is in the BERT vocabulary becomes a single token. If a word is not in the vocabulary, the WordPiece tokenizer splits it into multiple sub-word tokens. Before the sentences are fed to BERT, 15% of the tokens are selected for masking.
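To make the input format concrete, here is a minimal sketch of how two sentences could be turned into BERT-style tokens. It uses a tiny made-up vocabulary (`TOY_VOCAB`) and a simplified greedy longest-match splitter; the function names are hypothetical, and the real BERT tokenizer works from a vocabulary of roughly 30,000 entries and also handles punctuation and casing.

```python
# Toy vocabulary, invented for this illustration only.
# Pieces that continue a word are prefixed with "##", as in WordPiece.
TOY_VOCAB = {"my", "dog", "is", "cute", "he", "likes", "play", "##ing"}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into sub-word tokens by greedy longest match."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word becomes the unknown token.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


def build_input(sentence_1, sentence_2, vocab):
    """Return (tokens, segment_ids) in the [CLS] A [SEP] B [SEP] layout."""
    tokens = ["[CLS]"]
    for word in sentence_1.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    segment_ids = [0] * len(tokens)  # segment 0 covers [CLS], Sentence 1, first [SEP]

    for word in sentence_2.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    segment_ids.extend([1] * (len(tokens) - len(segment_ids)))  # segment 1 covers the rest
    return tokens, segment_ids


tokens, segment_ids = build_input("my dog is cute", "he likes playing", TOY_VOCAB)
print(tokens)
print(segment_ids)
```

Here “playing” is not in the toy vocabulary, so it is split into “play” and “##ing”, while every other word maps to a single token.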

The pre-training process, the first step of transfer learning, is like teaching English to the BERT model so that it can be used for various tasks which require English knowledge. This is accomplished by the two practice tasks given to BERT:

Predict masked (hidden) tokens. To illustrate, the words “favorite” and “to” are masked in the diagram above. BERT tries to predict these masked tokens as part of pre-training. This is similar to a “fill in the blanks” exercise we might give to a student who is learning English: while trying to fill in the missing words, the student learns the language. This task is referred to as the Masked Language Model (MLM).
Predict whether Sentence 2 logically follows Sentence 1, which gives BERT a deeper understanding of dependencies between sentences. In the example above, Sentence 2 is a logical continuation of Sentence 1, so the prediction will be True. The output for the special token [CLS] is used for this task, which is known as Next Sentence Prediction (NSP).
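The two practice tasks can be sketched as a small data-preparation routine. This is a simplified illustration, not the original pre-training code: the function names are hypothetical, every selected token is replaced with [MASK] (the published recipe replaces most of them with [MASK] but swaps a few for random tokens or leaves them unchanged), and the "wrong" second sentence is drawn from the same toy list rather than from another document.

```python
import random

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def mask_tokens(tokens, rng, mask_prob=0.15):
    """Hide about mask_prob of the non-special tokens.

    Returns (masked_tokens, labels). labels holds the original token at
    each masked position and None everywhere else.
    """
    candidates = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    n_to_mask = max(1, round(len(candidates) * mask_prob))
    chosen = set(rng.sample(candidates, n_to_mask))
    masked = ["[MASK]" if i in chosen else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in chosen else None for i, tok in enumerate(tokens)]
    return masked, labels


def make_pair(sentences, index, rng):
    """Build one next-sentence example starting at sentences[index].

    About half the time the true next sentence is used (label True);
    otherwise a different sentence is picked at random (label False).
    Assumes index + 1 is valid and the list has at least three sentences.
    """
    first = sentences[index]
    if rng.random() < 0.5:
        return first, sentences[index + 1], True
    others = [s for j, s in enumerate(sentences) if j not in (index, index + 1)]
    return first, rng.choice(others), False


rng = random.Random(0)  # fixed seed so the sketch is repeatable
example = ["[CLS]"] + ["tok%d" % i for i in range(20)] + ["[SEP]"]
masked, labels = mask_tokens(example, rng)
print(masked)

corpus = ["I went to the store.", "I bought milk.", "The sky is blue.", "Cats sleep a lot."]
print(make_pair(corpus, 0, rng))
```

During pre-training, the model only sees `masked` and has to recover the tokens stored in `labels`, while the True/False flag from `make_pair` is the target for the sentence-order task.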

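The BERT pre-trained model comes in many variants. The most common ones are BERT Base (12 Transformer layers, a hidden size of 768, 12 attention heads, about 110 million parameters) and BERT Large (24 Transformer layers, a hidden size of 1024, 16 attention heads, about 340 million parameters).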
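As a rough check on how the two sizes compare, the parameter count of the encoder can be worked out from its shape. The sketch below assumes the published configurations (12 layers with hidden size 768 for BERT Base, 24 layers with hidden size 1024 for BERT Large), the 30,522-entry uncased English vocabulary, 512 positions, 2 segment types, and a feed-forward size of four times the hidden size. It counts the encoder plus the pooling layer, not the extra heads used only during pre-training; `count_parameters` is a hypothetical helper written for this article.

```python
def count_parameters(num_layers, hidden, vocab_size=30522,
                     max_positions=512, segment_types=2):
    """Count weights and biases in a BERT-style encoder of the given shape."""
    intermediate = 4 * hidden   # feed-forward inner size
    layer_norm = 2 * hidden     # one scale and one shift per hidden unit

    # Token, position and segment embedding tables, plus one layer norm.
    embeddings = (vocab_size + max_positions + segment_types) * hidden + layer_norm

    # Self-attention: query, key, value and output projections, plus layer norm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward block: two dense layers, plus layer norm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)

    # Dense layer applied to the [CLS] output.
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * (attention + feed_forward) + pooler


for name, layers, hidden in [("BERT Base", 12, 768), ("BERT Large", 24, 1024)]:
    print(f"{name}: {count_parameters(layers, hidden):,} parameters")
```

The number of attention heads does not appear in the formula because the heads split the same projection matrices rather than adding new ones.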

BERT Fine-Tuning

Fine-tuning is the next part of transfer learning. For specific tasks, such as text classification or question-answering, you would perform incremental training on a much smaller dataset. This adjusts the parameters of the pre-trained model.

Use Cases

To demonstrate practical uses of BERT, I am providing two examples below. The code and documentation are provided in both GitHub and Google Colab. You can use either of the options to follow along and try it out for yourself!

1. Text Classification or Regression

This is sample code for the binary classification of tweets. Here we have two types of tweets, disaster-related tweets (target = 1) and normal tweets (target = 0). We fine-tune the BERT Base model to classify tweets into these two groups.

GitHub: https://github.com/sanigam/BERT_Medium

Google Colab: https://colab.research.google.com/drive/1ARH9dnugVuKjRTNorKIVrgRKitjg051c?usp=sharing

This code can be used for multi-class classification or regression by using appropriate values of parameters in the function bert_model_creation(). The code provides details on parameter values. If you want, you can add additional dense layers into this function.
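To show what kind of parameter choice separates these tasks, here is a hedged sketch of how the output layer on top of BERT typically differs between them. It is not the signature of bert_model_creation() in the linked repository: `output_head` and `interpret` are hypothetical helpers, and the activation and loss names follow Keras-style naming purely for illustration.

```python
import math


def output_head(task, num_classes=2):
    """Describe a typical final layer for each task type (illustrative only)."""
    if task == "binary":
        # One unit squashed to a probability of the positive class.
        return {"units": 1, "activation": "sigmoid", "loss": "binary_crossentropy"}
    if task == "multiclass":
        # One unit per class, normalised into a probability distribution.
        return {"units": num_classes, "activation": "softmax",
                "loss": "categorical_crossentropy"}
    if task == "regression":
        # One unbounded unit trained to match a numeric target.
        return {"units": 1, "activation": "linear", "loss": "mean_squared_error"}
    raise ValueError(f"unknown task: {task}")


def interpret(task, raw_outputs):
    """Turn the raw (pre-activation) outputs of the final layer into a prediction."""
    if task == "binary":
        probability = 1.0 / (1.0 + math.exp(-raw_outputs[0]))
        return int(probability >= 0.5)
    if task == "multiclass":
        # Softmax keeps the ordering, so the largest raw score wins.
        return max(range(len(raw_outputs)), key=raw_outputs.__getitem__)
    return raw_outputs[0]


print(output_head("binary"))
print(output_head("multiclass", num_classes=4))
print(interpret("binary", [2.3]))            # disaster tweet (1) vs normal tweet (0)
print(interpret("multiclass", [0.1, 1.7, -0.4, 0.2]))
```

The pre-trained BERT layers stay the same in all three cases; only the small head on top and the loss function change.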

2. BERT for Question-Answering

This is another interesting use case for BERT: you pass a passage and a question to the model, and it finds the answer to the question based on the information in the passage. In this code, I am using the BERT Large model, which is already fine-tuned on the Stanford Question Answering Dataset (SQuAD). You will see how to use this fine-tuned model to get answers from a given passage.
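A model fine-tuned on SQuAD does not generate text. For every input token it produces one score for "the answer starts here" and one for "the answer ends here", and the answer is the span of passage tokens with the best combined score. The sketch below shows that selection step only; `best_span` is a hypothetical helper, and the scores are invented for illustration rather than produced by a real model.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end) indices maximising start score + end score.

    Only spans with start <= end and at most max_answer_len tokens
    are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_scores):
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            score = start_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


tokens = ["[CLS]", "where", "does", "she", "study", "[SEP]",
          "she", "studies", "at", "uc", "davis", "[SEP]"]

# Made-up scores: the start peaks at "uc", the end peaks at "davis".
start_scores = [0.1, 0.0, 0.0, 0.2, 0.1, 0.0, 0.5, 0.3, 1.0, 6.2, 2.1, 0.0]
end_scores = [0.1, 0.0, 0.0, 0.1, 0.2, 0.0, 0.2, 0.4, 0.3, 1.5, 7.0, 0.1]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

Because the answer is copied out of the lowercased input tokens, it comes back in lowercase.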

GitHub: https://github.com/sanigam/BERT_QA_Medium

Google Colab: https://colab.research.google.com/drive/1ZpeVygQJW3O2Olg1kZuLnybxZMV1GpKK?usp=sharing

Example with this use case:

Passage — “John is a 10 year old boy. He is the son of Robert Smith. Elizabeth Davis is Robert’s wife. She teaches at UC Berkeley. Sophia Smith is Elizabeth’s daughter. She studies at UC Davis”

Question — “Which college does John’s sister attend?”

When these two inputs are passed in, the model returns the correct answer, “uc davis”.

This example shows that BERT can pick up language structure and handle dependencies across sentences. It can apply simple logic to answer the question (e.g., to work out who John’s sister is). Note that the passage can be much longer than the one shown above, but the question and passage together, including the special tokens, cannot exceed 512 tokens. If the input is longer than that, the code automatically truncates the extra part.
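The 512-token limit can be sketched as a simple budget calculation. This is an assumption about how such truncation could work, not a copy of the linked code: `pack_question_and_passage` is a hypothetical helper that keeps the whole question, reserves three slots for [CLS] and the two [SEP] tokens, and cuts the passage to whatever space remains.

```python
MAX_LEN = 512  # maximum number of tokens BERT accepts in one input


def pack_question_and_passage(question_tokens, passage_tokens, max_len=MAX_LEN):
    """Build [CLS] question [SEP] passage [SEP], truncating the passage if needed.

    Returns (tokens, dropped), where dropped is the number of passage
    tokens that did not fit.
    """
    budget = max_len - len(question_tokens) - 3  # 3 special tokens
    if budget < 0:
        raise ValueError("the question alone exceeds the token limit")
    kept = passage_tokens[:budget]
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + kept + ["[SEP]"]
    dropped = len(passage_tokens) - len(kept)
    return tokens, dropped


question = ["q%d" % i for i in range(10)]
passage = ["p%d" % i for i in range(600)]
packed, dropped = pack_question_and_passage(question, passage)
print(len(packed), dropped)
```

Anything in the dropped part of the passage is invisible to the model, so an answer located there cannot be found. A common alternative is to split a long passage into overlapping windows and query each window separately.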

The code provides more examples in addition to the one shown above: a total of 3 passages and 22 questions. One of these passages is a version of my BERT article. You will see that BERT QA can answer questions whose answers can be found in the passage. You can customize the code for your own question-answering applications.

Hopefully this provides you with a good jump start to use BERT for your own practical applications. If you have any questions or feedback, feel free to let me know!

Translated from: https://medium.com/analytics-vidhya/introduction-to-bert-f9aa4075cf4f
