[Question] I was going through the TensorFlow API docs here. In the TensorFlow documentation, they use a keyword called logits. What is it? In a lot of methods in the API docs it is written like

tf.nn.softmax(logits, name=None)

If these logits are just Tensors, why keep a different name like logits?

Another thing is that there are two methods I could not differentiate. They were

tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.
[Answer]

Short version:

Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x + b) and y_true contains one-hot encoded true labels.

y_hat  = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded

If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.

Additionally, the total cross-entropy loss computed in this manner:

y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))

is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():

total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat))

Long version:

In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I've created a y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.
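As an aside, the logits themselves usually come from nothing more than a matrix multiply plus a bias. Below is a minimal TensorFlow 1.x-style sketch with hypothetical shapes (4 input features, 3 classes); the example that follows skips this step and builds y_hat directly as a constant so the numbers are easy to follow.

import tensorflow as tf

# Hypothetical shapes: a batch of instances with 4 features each, 3 classes.
X = tf.placeholder(tf.float32, shape=[None, 4])  # input features
W = tf.Variable(tf.zeros([4, 3]))                # weights
b = tf.Variable(tf.zeros([3]))                   # biases

# Raw, unnormalized class scores -- these are the logits.
y_hat = tf.matmul(X, W) + b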

import tensorflow as tf
import numpy as np

sess = tf.Session()

# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5,  1.5,  0.1],
#        [ 2.2,  1.3,  1.7]])

Note that the values are not normalized (i.e. the rows don't add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.

y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])
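If it helps to see the arithmetic, the same numbers can be reproduced in plain NumPy (reusing the np import from above). This is just a sketch of the textbook softmax formula, not TensorFlow's internal implementation:

logits = np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]])

# Softmax: exponentiate each logit, then normalize each row so it sums to 1.
# Subtracting the row maximum first is a standard trick to avoid overflow.
z = logits - logits.max(axis=1, keepdims=True)
manual_softmax = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
# manual_softmax matches sess.run(y_hat_softmax) above.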

It's important to fully understand what the softmax output is saying. Below I've shown a table that more clearly represents the output above. It can be seen that, for example, the probability of training instance 1 being "Class 2" is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.

                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182

So now we have class probabilities for each training instance, and we can take the argmax() of each row to generate a final classification. From above, we would predict that training instance 1 belongs to "Class 2" and training instance 2 belongs to "Class 1".
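For example (reusing the session and y_hat_softmax from above; class indices are zero-based, so index 1 means "Class 2"):

predictions = tf.argmax(y_hat_softmax, 1)
sess.run(predictions)
# array([1, 0])  -> "Class 2" for instance 1, "Class 1" for instance 2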

Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and columns are classes. Below I've created an example y_true one-hot array where the true label for training instance 1 is "Class 2" and the true label for training instance 2 is "Class 3".

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]])
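As an aside, labels often come as integer class IDs rather than one-hot vectors. Here is a small sketch of one way to build the same y_true from IDs (the IDs [1, 2] are just the zero-based class indices from this example; note that tf.one_hot returns float32 by default, so cast if you need to match y_hat's dtype):

class_ids = tf.constant([1, 2])  # "Class 2" and "Class 3", zero-indexed
sess.run(tf.one_hot(class_ids, depth=3))
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]], dtype=float32)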

Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use the cross-entropy loss to measure the error; for each instance it is the sum over classes of -y_true * log(y_hat_softmax).

We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1's highest probability was for "Class 2", which matches training instance 1 in y_true; however, the prediction for training instance 2 showed a highest probability for "Class 1", which does not match the true class "Class 3".

loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 ,  1.19967598])
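Because y_true is one-hot, each row's loss collapses to the negative log of the probability assigned to the true class, which is easy to check by hand (reusing the np import from above):

-np.log(0.61939586)  # instance 1, true class "Class 2" -> 0.479...
-np.log(0.30129182)  # instance 2, true class "Class 3" -> 1.200...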

What we really want is the total loss over all the training instances. So we can compute:

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944

Using softmax_cross_entropy_with_logits()

We can instead compute the total cross entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below.

loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat)
sess.run(loss_per_instance_2)
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat))
sess.run(total_loss_2)
# 0.83934333897877922

Note that total_loss_1 and total_loss_2 produce essentially equivalent results with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error because the softmax is done for you inside of softmax_cross_entropy_with_logits().
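To see why the fused version is friendlier numerically, note that for one-hot labels the cross-entropy can be computed straight from the logits without ever materializing the softmax, using log softmax(z) = z - logsumexp(z). A minimal NumPy sketch of that idea (it illustrates the math, not TensorFlow's actual implementation):

logits = np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]])
labels = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

# Log-softmax, computed with the row maximum subtracted for stability.
z = logits - logits.max(axis=1, keepdims=True)
log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))

loss_per_instance = -(labels * log_softmax).sum(axis=1)
# array([ 0.4790107 ,  1.19967598])
loss_per_instance.mean()
# 0.83934...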

From Stack Overflow: https://stackoverflow.com/questions/34240703/what-is-logits-softmax-and-softmax-cross-entropy-with-logits?noredirect=1&lq=1
