Transformer models… how did it all start?

Transformer models have revolutionised the field of Natural Language Processing, but how did it all start? To understand current state-of-the-art architectures and genuinely appreciate why these models were a breakthrough, we must go further back in time, to where NLP as we know it started: when Neural Networks were first introduced to NLP.

The introduction of neural models to NLP provided ways to overcome challenges that traditional methods couldn’t solve. One of the most remarkable advances was the Sequence-to-Sequence (Seq2Seq) model: such models generate an output sequence by predicting one word at a time. Seq2Seq models encode the source text to reduce ambiguity and achieve context-awareness.

In any language task, context plays an essential role. To understand what words mean, we have to know something about the situation in which they are used. Seq2Seq models capture context at the token level: they use the previous words/sentences to generate the next words/sentences. Representing context as embeddings in a vector space brought several advantages, such as reducing data sparsity (words that appear in similar contexts are mapped close to each other) and providing a way to generate synthetic data.

However, context in language is very sophisticated. Most of the time, you can’t find the context by focusing only on the previous sentence: achieving context-awareness requires long-range dependencies. Seq2Seq models are typically built with Recurrent Neural Networks (RNNs) such as LSTMs or GRUs. These networks have gating mechanisms that regulate the flow of information while processing a sequence, giving them a kind of “long-term memory.” Even so, if a sequence is long enough, they have a hard time carrying information from earlier time steps to later ones.

RNNs fall short when processing entire paragraphs of text. They suffer from the vanishing gradient problem. Gradients are the values used to update the weights of a neural network, which is how the network learns. The vanishing gradient problem occurs when the gradient shrinks as it is backpropagated through time. If a gradient value becomes extremely small, it contributes little to learning. Moreover, RNNs are slow to train: because of their sequential topology, every backpropagation step has to go through the entire sequence of words.
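
To make the vanishing gradient concrete, here is a minimal sketch on a toy model, assuming a linear scalar recurrence h_t = w * h_{t-1}. This is a hypothetical simplification, not a real LSTM or GRU. By the chain rule, the gradient of the last state with respect to the first is a product of one local derivative per time step. With |w| < 1, that product shrinks exponentially with sequence length.

```python
def gradient_through_time(w: float, steps: int) -> float:
    """Return d h_T / d h_0 for the toy recurrence h_t = w * h_{t-1}.

    Backpropagation through time multiplies one local derivative
    (d h_t / d h_{t-1} = w) per step, so the result is w ** steps.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: multiply by the local derivative of this step
    return grad


# The signal from early time steps fades quickly when w < 1.
for steps in (1, 10, 50):
    print(steps, gradient_through_time(0.5, steps))
```

With w = 0.5, the gradient after 50 steps is already below 1e-15. Early tokens therefore barely influence the weight updates.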

As a way to fix these problems, Convolutional Neural Networks (CNNs) were introduced to NLP. Stacking convolutions (for example, dilated convolutions) creates a logarithmic path: the network can “observe” the entire sequence with a number of convolutional layers that is logarithmic in the sequence length. However, this raised a new challenge: positional bias. How do we make sure that the positions we observe in the text are the ones that give the most insight? Why focus on position X of the sequence and not X-1?
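
The logarithmic path can be checked with a short calculation. This sketch assumes dilated convolutions whose dilation is multiplied by the kernel size at every layer (1, 2, 4, … for kernel size 2). That is one common way to get the logarithmic path; the text does not say which convolution scheme it has in mind. Under this assumption, the receptive field after L layers is kernel_size ** L, so covering n tokens takes about log(n) layers.

```python
def receptive_field(num_layers: int, kernel_size: int = 2) -> int:
    """Number of input tokens visible after stacking dilated convolutions.

    Assumption: layer l uses dilation kernel_size ** l, so each layer
    widens the receptive field by (kernel_size - 1) * dilation.
    """
    field = 1
    for layer in range(num_layers):
        dilation = kernel_size ** layer
        field += (kernel_size - 1) * dilation
    return field


def layers_to_cover(seq_len: int, kernel_size: int = 2) -> int:
    """Smallest number of layers whose receptive field spans seq_len tokens."""
    layers = 0
    while receptive_field(layers, kernel_size) < seq_len:
        layers += 1
    return layers


print(layers_to_cover(1000))  # grows like log2(seq_len), not like seq_len
```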

Besides, the challenge is not only to find a way of encoding large amounts of text but also to determine which parts of that text are essential for context-awareness. Not all of the text is equally important for understanding. To address this, the attention mechanism was introduced in Seq2Seq models.

The attention mechanism is inspired by visual attention in animals, which focus on specific parts of their visual input to compute adequate responses. In Seq2Seq architectures, attention aims to give the decoder more contextual information: at every decoding step, the decoder is told how much “attention” it should give to each input word.

Attention in Seq2Seq models (image by author)
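
A single decoding step of Seq2Seq attention can be sketched as follows. This assumes simple dot-product scoring between the current decoder state and each encoder hidden state. Other scoring functions, such as additive attention, are also used; this is not necessarily the variant in the figure. The hidden states are hypothetical 2-d vectors for a three-word input. The softmax weights say how much attention each input word gets, and the context vector is their weighted sum.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def seq2seq_attention(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Each score is the dot product between the decoder state and one encoder
    state (one per input word); softmax turns scores into attention weights.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[j] for w, enc in zip(weights, encoder_states)) for j in range(dim)]
    return weights, context


# Hypothetical hidden states for a three-word input sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = seq2seq_attention(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
```

The first and third input words align with the decoder state, so they receive most of the attention. The second word is nearly ignored.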

Despite the improvements in context-awareness, there was still substantial room for improvement. The most significant drawback of these methods was the complexity of their architectures.

This is where the transformer model came into the picture. The transformer introduces the following idea: instead of adding another complex mechanism (attention) to an already complex Seq2Seq model, we can simplify the solution by dropping everything else and focusing only on attention.

This model removes recurrence and relies on matrix multiplications instead. It processes all the inputs at once rather than one element at a time. To avoid losing the order, it uses positional embeddings that provide information about the position of each element in the sequence. And despite removing recurrence, it still keeps an encoder-decoder architecture like the one seen in Seq2Seq models.

So, having seen the challenges previous models face, let’s dive into what the transformer model solves compared to Seq2Seq models.

Transformer technical deep dive

Transformer architecture (image by author)

While RNNs fell short when we needed to process entire paragraphs to gain context, transformers are able to identify long-range dependencies and achieve context-awareness. We also saw that RNNs by themselves struggle to determine which parts of the text carry more information. To do so, Seq2Seq models needed an extra component: a bidirectional RNN combined with the attention mechanism. In contrast, the transformer works only with attention, so it can determine the essential parts of the context at different levels.

Another critical difference is that the transformer model removes recurrence. Eliminating recurrence reduces the number of sequential operations, which shortens training time. In RNNs, every backpropagation step has to go through the entire sequence of words, one step at a time. In the transformer, all the input is processed at once. This brings a new advantage: we can now parallelise training. Being able to split the computation into tasks that run independently boosts training efficiency.

So how does the model keep the sequence order without using recurrence?

Using positional embeddings. The model takes a sequence of n word embeddings. To model position information, a positional embedding is added to each word embedding.

Positional embeddings are created using sine and cosine functions of different frequencies across the embedding dimensions. Each position is encoded with the pattern produced by combining these functions. The result is a continuous analogue of a binary encoding of positions in a sequence.
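
Both steps (building sinusoidal encodings and adding them to word embeddings) can be sketched in plain Python. The formula assumed here is the one from the original Transformer paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function names and the toy embeddings are hypothetical.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings: one row of length d_model per position.

    Even dimensions use sine and odd dimensions use cosine; each sine/cosine
    pair shares a frequency, and frequencies decrease geometrically with the
    dimension index (base 10000).
    """
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the exponent 2k / d_model.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def add_positions(word_embeddings):
    """Add the positional encoding of each position to its word embedding."""
    d_model = len(word_embeddings[0])
    pe = positional_encoding(len(word_embeddings), d_model)
    return [[w + p for w, p in zip(emb, pos)] for emb, pos in zip(word_embeddings, pe)]


# Two hypothetical 4-d word embeddings.
print(add_positions([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]))
```

The lowest dimensions oscillate fastest and the highest slowest, much like the low and high bits of a binary counter. That is where the “continuous binary encoding” intuition comes from.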

The transformer model uses multi-head attention to encode the input embeddings. When doing so, it attends to all inputs at once, both forwards and backwards, so the order of the sequence is lost. This is why it relies on the positional embeddings we just explained.
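
Why the order is lost can be demonstrated directly. This is a minimal sketch assuming a simplified self-attention with no learned projections (queries, keys and values are all the raw inputs) and no positional embeddings. Shuffling the input tokens just shuffles the outputs in the same way. Nothing in the computation records where each token was.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs):
    """Simplified self-attention: queries = keys = values = inputs."""
    out = []
    for q in xs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in xs])
        out.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(len(q))])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical embeddings
perm = [2, 0, 1]                               # reorder the "sentence"
shuffled = [tokens[i] for i in perm]

original_out = self_attention(tokens)
shuffled_out = self_attention(shuffled)

# Output i of the shuffled sequence equals the original output for token perm[i].
equivariant = all(
    math.isclose(a, b)
    for i, p in enumerate(perm)
    for a, b in zip(shuffled_out[i], original_out[p])
)
print(equivariant)
```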

The transformer has three different attention mechanisms: encoder self-attention, encoder-decoder attention and decoder self-attention. So how does the attention mechanism work? It is essentially a vector multiplication (a dot product): the angle between two vectors determines how important each value is. If the vectors are close to 90 degrees apart, the dot product will be close to zero. If they point in the same direction, the dot product returns a larger value.
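
The geometric intuition can be checked with a few hypothetical 2-d vectors. This sketch computes the dot product and the angle between a query and two candidate vectors: one orthogonal to the query and one pointing almost the same way.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle_degrees(u, v):
    """Angle between u and v; the cosine is clamped to [-1, 1] for float safety."""
    cos = dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


query = [1.0, 0.0]
orthogonal = [0.0, 3.0]  # 90 degrees from the query
aligned = [2.0, 0.1]     # almost the same direction as the query

for name, vec in (("orthogonal", orthogonal), ("aligned", aligned)):
    print(name, dot(query, vec), round(angle_degrees(query, vec), 1))
```

The orthogonal vector scores zero even though it is longer. The aligned vector gets the larger score, so it would receive more attention.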

Each key has an associated value. For every new input (query) vector, we determine how much it relates to each key vector. A softmax function then turns these scores into weights that favour the closest terms, and the output is the weighted sum of the corresponding values.
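
The query/key/value computation can be sketched as scaled dot-product attention, softmax(Q Kᵀ / sqrt(d_k)) V. The division by sqrt(d_k) comes from the original Transformer paper and is not mentioned in the text above. The optional causal mask is an illustrative assumption of how decoder self-attention stops a position from attending to later positions. The toy matrices are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # at least one score is finite, so m is finite
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values, causal=False):
    """softmax(Q K^T / sqrt(d_k)) V, computed row by row with plain lists.

    With causal=True, query i may only attend to keys 0..i (masked scores are
    set to -inf, so their softmax weight is exactly 0).
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if causal:
            scores = [s if j <= i else -math.inf for j, s in enumerate(scores)]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
full = scaled_dot_product_attention(queries, keys, values)
masked = scaled_dot_product_attention(queries, keys, values, causal=True)
print(full)
print(masked)
```

Without the mask, each query leans towards the value whose key it matches. With the mask, the first query can only see the first key, so it returns the first value unchanged.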

Transformers use multi-head attention. We can think of the heads as the filters in a CNN: each one learns to pay attention to a specific group of words. One head can learn to identify short-range dependencies while others learn to identify long-range dependencies. This improves context-awareness: we can work out what a term refers to when it’s not obvious, for example with pronouns.
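
The structure of multi-head attention can be sketched in plain Python under several loud assumptions. The per-head projection matrices are random and untrained; they stand in for the learned weights that would let each head specialise. The final output projection of the standard architecture is omitted. The helper names are hypothetical. Each head projects the input into its own smaller query/key/value space, attends independently, and the head outputs are concatenated per position.

```python
import math
import random


def matmul(a, b):
    """(n x m) @ (m x p) with plain lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for one head."""
    d_k = len(k[0])
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def random_matrix(rows, cols, rng):
    # Untrained stand-in for a learned projection matrix.
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, num_heads, seed=0):
    """Run num_heads independent attention heads and concatenate their outputs."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    heads = []
    for _ in range(num_heads):
        w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
        heads.append(attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)))
    # Concatenate per position: [head_1 | head_2 | ...], back to d_model columns.
    return [sum((head[i] for head in heads), []) for i in range(len(x))]


x = [[1.0, 0.0, 0.5, 0.2], [0.3, 1.0, 0.0, 0.1], [0.0, 0.2, 1.0, 0.4]]
out = multi_head_attention(x, num_heads=2)
print(len(out), len(out[0]))
```

Because every head has its own projections, each one can produce a different attention pattern over the same words. In a trained model this is what allows some heads to track nearby words and others distant ones.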

The transformer architecture facilitates the creation of powerful models trained on massive datasets. Even though it is not feasible for everyone to train these models, we can now leverage transfer learning to use these pre-trained language models and fine-tune them for our specific tasks.

Transformer models have revolutionised the field. They have outperformed RNN-based architectures in a wide range of tasks, and they will continue to have a tremendous impact on NLP.

Translated from: https://towardsdatascience.com/transformer-models-how-did-it-all-start-2e5b385ddd93
