Decoding Strategies That You Need to Know for Response Generation

Introduction

Deep learning has been applied to many NLP tasks, such as translation, image captioning, and dialogue systems. In machine translation, a model reads text in the source language (input) and generates text in the target language (output). Similarly, in a dialogue system, a model generates a response given a context. This task is known as Natural Language Generation (NLG).

The model has two parts: an encoder and a decoder. The encoder reads the input text and returns a vector representing that input. The decoder then takes that vector and generates the corresponding text.

Text is commonly generated one token at a time. Without a suitable decoding technique, the generated response may be very generic and boring. In this article, we will explore the following strategies:

Greedy
Beam Search
Random Sampling
Temperature
Top-K Sampling
Nucleus Sampling

Decoding Strategies

At each timestep during decoding, we take the vector that carries information from one step to the next and apply the softmax function to it, converting it into a probability for each word in the vocabulary.

Equation 1: Softmax function, $P(x_i \mid x_{1:i-1}) = \frac{\exp(u_i)}{\sum_j \exp(u_j)}$. Here x_i is the token at timestep i, and u is the vector that holds the value of every token in the vocabulary.
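The softmax step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's own code: the function name `softmax` and the example scores are assumptions, and subtracting the maximum is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(u):
    """Turn a list of raw token scores into probabilities that sum to 1."""
    m = max(u)  # shift by the maximum so exp() cannot overflow
    exps = [math.exp(v - m) for v in u]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a three-token vocabulary.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(probs)
```

The token with the largest score receives the largest probability, and the ordering of the scores is preserved.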

Greedy Approach

This approach is the simplest. At each timestep, it chooses whichever token is the most probable.

Context:            Try this cake. I baked it myself.
Optimal Response:   This cake tastes great.
Generated Response: This is okay.

However, this approach may generate a suboptimal response, as shown in the example above. This happens because the training data commonly contains examples like “That is […]”. Therefore, if we pick the most probable token at each step, the model might output “is” instead of “cake”.
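A small sketch can make the greedy failure mode concrete. Everything here is an assumption made for illustration: `TOY_MODEL` is a hand-written table that maps the last token to next-token probabilities (a real decoder conditions on the whole history), and the numbers are invented so that the locally best token leads to a weaker full sentence.

```python
# Hypothetical toy "model": last token -> probabilities of the next token.
TOY_MODEL = {
    "<s>": {"This": 0.6, "Thank": 0.4},
    "This": {"is": 0.5, "cake": 0.4, "</s>": 0.1},
    "is": {"okay": 0.7, "great": 0.3},
    "okay": {"</s>": 1.0},
    "great": {"</s>": 1.0},
    "cake": {"tastes": 1.0},
    "tastes": {"great": 1.0},
    "Thank": {"you": 1.0},
    "you": {"</s>": 1.0},
}

def greedy_decode(model, start="<s>", end="</s>", max_len=10):
    """Pick the single most probable next token at every step."""
    tokens = []
    current = start
    for _ in range(max_len):
        probs = model[current]
        current = max(probs, key=probs.get)  # arg-max over the vocabulary
        if current == end:
            break
        tokens.append(current)
    return tokens

print(" ".join(greedy_decode(TOY_MODEL)))  # This is okay
```

In this toy table, “This is okay” has a total probability of 0.6 × 0.5 × 0.7 = 0.21, while “This cake tastes great” has 0.6 × 0.4 = 0.24, so the greedy choice of “is” over “cake” costs probability later in the sentence.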

Beam Search

Exhaustive search can solve the previous problem since it searches the whole space. However, it is computationally expensive. Suppose the vocabulary has 10,000 tokens; generating a sentence of 10 tokens would mean considering (10,000)¹⁰ = 10⁴⁰ possible sequences.

Beam search can cope with this problem. At each timestep, it extends every candidate with all possible tokens in the vocabulary, then keeps the top B candidates with the highest probability. Those B candidates move on to the next timestep, and the process repeats. In the end, only B candidates remain. The search space at each timestep is only (10,000) × B.
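The procedure above can be sketched as follows. This is a simplified illustration under stated assumptions: `TOY_MODEL` is a hypothetical hand-written table conditioned only on the last token, scores are summed log-probabilities, and a candidate that produces the end token is set aside as finished. Real implementations often add length normalization, which is omitted here.

```python
import math

# Hypothetical toy "model": last token -> probabilities of the next token.
TOY_MODEL = {
    "<s>": {"This": 0.6, "Thank": 0.4},
    "This": {"is": 0.5, "cake": 0.4, "</s>": 0.1},
    "is": {"okay": 0.7, "great": 0.3},
    "okay": {"</s>": 1.0},
    "great": {"</s>": 1.0},
    "cake": {"tastes": 1.0},
    "tastes": {"great": 1.0},
    "Thank": {"you": 1.0},
    "you": {"</s>": 1.0},
}

def beam_search(model, beam_width=2, start="<s>", end="</s>", max_len=10):
    """Keep the beam_width most probable partial sequences at every step."""
    beams = [([start], 0.0)]  # (tokens so far, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            # Extend each surviving candidate with every possible next token.
            for tok, p in model[tokens[-1]].items():
                candidates.append((tokens + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # include unfinished candidates if max_len was hit
    best_tokens, best_score = max(finished, key=lambda c: c[1])
    words = [t for t in best_tokens if t not in (start, end)]
    return words, math.exp(best_score)

words, prob = beam_search(TOY_MODEL, beam_width=2)
print(" ".join(words), round(prob, 2))  # Thank you 0.4
```

With `beam_width=1` the same function reduces to greedy decoding and returns “This is okay”. With `beam_width=2` it finds “Thank you”, which has the highest total probability in this toy table.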

Context:    Try this cake. I baked it myself.
Response A: That cake tastes great.
Response B: Thank you.

But sometimes beam search settles on a response that is even more probable overall, such as Response B. In this case, it makes perfect sense. But imagine a model that likes to play it safe and keeps generating “I don’t know” or “Thank you” for most contexts: that is a pretty bad bot.

Random Sampling

Alternatively, we can look into stochastic approaches to keep the response from being generic. We can use the probability that the softmax function assigns to each token to sample the next token.

Suppose we are generating the first token of a response to the context “I love watching movies”. The figure below shows the probability of each candidate for the first word.

Figure 3: Probability of each word. The x-axis is the token index; e.g., index 37 corresponds to the word “yeah”.

If we use the greedy approach, the token “i” will be chosen. With random sampling, however, the token “i” is chosen with a probability of around 0.2. At the same time, any token with a probability of 0.0001 can also be chosen; it is just very unlikely.
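As a rough sketch, random sampling draws the next token in proportion to its probability. The distribution below is hypothetical: only the values 0.2 for “i” and 0.0001 for a rare token echo the text, and the remaining words and numbers are invented so that the probabilities sum to 1.

```python
import random

# Hypothetical first-word distribution (values invented for illustration).
FIRST_WORD_PROBS = {
    "i": 0.2, "yeah": 0.18, "me": 0.16, "what": 0.16,
    "really": 0.15, "movies": 0.1499, "zebra": 0.0001,
}

def sample_token(probs, rng=random):
    """Draw one token with probability proportional to its weight."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_token(FIRST_WORD_PROBS, rng) for _ in range(10000)]
print(draws.count("i") / len(draws))  # close to 0.2
```

Over many draws, “i” appears about 20% of the time, whereas greedy decoding would pick it every time. The rare token can still show up, but only about once in 10,000 draws.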

Random Sampling with Temperature

Random sampling by itself could generate a very unlikely word by chance. Temperature is used to increase the probability of the probable tokens while reducing the probability of the improbable ones. Usually, the range is 0 < temp ≤ 1. Note that when temp = 1, there is no effect.

Equation 2: Random sampling with temperature, $P(x_i \mid x_{1:i-1}) = \frac{\exp(u_i / t)}{\sum_j \exp(u_j / t)}$. The temperature t scales the value of each token before it goes into the softmax function.

Figure 4: Random sampling vs. random sampling with temperature

In Figure 4, with temp = 0.5, the most probable words, such as “i”, “yeah”, and “me”, have a higher chance of being generated. At the same time, this lowers the probability of the less probable words, although it does not stop them from occurring.
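The effect of temperature can be checked with a short sketch. The function name `softmax_with_temperature` and the example scores are assumptions for illustration; the only operation added to a plain softmax is dividing each score by t first.

```python
import math

def softmax_with_temperature(u, t=1.0):
    """Softmax over scores divided by temperature t (t must be positive)."""
    if t <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / t for v in u]
    m = max(scaled)  # shift by the maximum for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a three-token vocabulary.
scores = [2.0, 1.0, 0.0]
p_plain = softmax_with_temperature(scores, t=1.0)  # same as plain softmax
p_sharp = softmax_with_temperature(scores, t=0.5)  # sharper distribution
print([round(p, 3) for p in p_plain])
print([round(p, 3) for p in p_sharp])
```

With t = 0.5 the top token gains probability and the bottom token loses some, but the bottom token's probability stays above zero, so it can still be sampled.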

Top-K Sampling

Top-K sampling ensures that the less probable words have no chance at all. Only the K most probable tokens are considered for generation.

Figure 5: Distributions of the three approaches: random sampling, random sampling with temperature, and top-K sampling

The tokens with indices between 50 and 80 have some small probabilities if we use random sampling with temperature 0.5 or 1.0. With top-K sampling (K = 10), those tokens have no chance of being generated. Note that top-K sampling can also be combined with temperature; since the idea is the same, we do not discuss it here.

This sampling technique has been adopted in many recent generation tasks, and it performs quite well. One limitation of this approach is that the number K must be fixed in advance. Suppose we choose K = 300, but at some decoding timestep the model is confident that only 10 words are highly probable. With top-K, we would still consider the other 290 less probable words.
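A minimal sketch of top-K sampling follows. The helper names and the tiny distribution are hypothetical, and rescaling the kept probabilities so they sum to 1 is an assumption here (it is the usual practice, but the text above does not spell it out for top-K).

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and rescale them to sum to 1."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

def top_k_sample(probs, k, rng=random):
    """Sample one token from the top-k filtered distribution."""
    filtered = top_k_filter(probs, k)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical four-token distribution.
PROBS = {"i": 0.4, "yeah": 0.3, "me": 0.2, "zebra": 0.1}
filtered = top_k_filter(PROBS, k=2)
print(filtered)

rng = random.Random(0)
draws = {top_k_sample(PROBS, 2, rng) for _ in range(1000)}
print(draws)  # only "i" and "yeah" can appear
```

Because K is fixed, the filter keeps exactly K tokens at every step, whether the model is confident or not. That is the limitation described above.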

Nucleus Sampling

Nucleus sampling is similar to top-K sampling. Instead of keeping a fixed number of K words, nucleus sampling keeps the smallest possible set of top words, V^(p), such that the sum of their probabilities is ≥ p. The probabilities of the tokens that are not in V^(p) are then set to 0, and the rest are rescaled so that they sum to 1.

Equation 3: Nucleus sampling, $\sum_{x \in V^{(p)}} P(x \mid x_{1:i-1}) \ge p$. V^(p) is the smallest possible set of tokens that satisfies this condition. P(x | …) is the probability of generating token x given the previously generated tokens x from 1 to i−1.

The intuition is that when the model is very certain about a few tokens, the set of candidate tokens is small; otherwise, there are more candidate tokens.

Certain → a few tokens have high probability = the sum of a few tokens is enough to exceed p.
Uncertain → many tokens have small probability = the sum of many tokens is needed to exceed p.

Figure 6: Distribution of Top-K and Nucleus Sampling

Comparing nucleus sampling (p = 0.5) with top-K sampling (K = 10), we can see that nucleus sampling does not consider the token “you” to be a candidate. This shows that, unlike top-K sampling, it can adapt to different cases and select a different number of tokens.
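The adaptive behaviour can be sketched as follows. The function name `nucleus_filter` and both example distributions are hypothetical; the logic follows the description in this section: sort tokens by probability, keep adding them until the running sum reaches p, then rescale the kept tokens to sum to 1.

```python
def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose probabilities sum to >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete; all other tokens get probability 0
    return {tok: prob / cumulative for tok, prob in nucleus}

# Hypothetical distributions: one confident step and one uncertain step.
CONFIDENT = {"i": 0.9, "yeah": 0.05, "me": 0.03, "you": 0.02}
UNCERTAIN = {"i": 0.3, "yeah": 0.25, "me": 0.25, "you": 0.2}

print(nucleus_filter(CONFIDENT, p=0.5))  # one token is enough
print(nucleus_filter(UNCERTAIN, p=0.5))  # two tokens are needed
```

With the same p, the confident distribution keeps one token and the uncertain one keeps two. Sampling then proceeds from the rescaled dictionary exactly as in plain random sampling.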

Summary

Greedy: Select the most probable token at each step
Beam Search: Select the most probable response
Random Sampling: Sample randomly based on probability
Temperature: Shrink or enlarge probabilities
Top-K Sampling: Sample from the K most probable tokens
Nucleus Sampling: Dynamically choose the number K (sort of)

Commonly, the top choices among researchers are beam search, top-K sampling (with temperature), and nucleus sampling.

Conclusion

We have gone through several different ways to decode a response. These techniques can be applied to different generation tasks, e.g., image captioning, translation, and story generation. Using a good model with a bad decoding strategy, or a bad model with a good decoding strategy, is not enough. A good balance between the two can make the generated text a lot more interesting.

Translated from: https://towardsdatascience.com/decoding-strategies-that-you-need-to-know-for-response-generation-ba95ee0faadc
