by Wolfgang Beyer

How to Build a Simple Image Recognition System with TensorFlow (Part 2)

This is the second part of my introduction to building an image recognition system with TensorFlow. In the first part we built a softmax classifier to label images from the CIFAR-10 dataset. We achieved an accuracy of around 25–30%. Since there are 10 different and equally likely categories, we'd expect an accuracy of 10% from labeling the images randomly. So we’re already a lot better than random, but there’s still plenty of room for improvement.

In this post, I’ll describe how to build a neural network that performs the same task. Let’s see by how much we can increase our prediction accuracy!

Neural Networks

Neural networks are very loosely based on how biological brains work. They consist of a number of artificial neurons which each process multiple incoming signals and return a single output signal. The output signal can then be used as an input signal for other neurons.

Let’s take a look at an individual neuron:

What happens in a single neuron is very similar to what happens in the softmax classifier. Again we have a vector of input values and a vector of weights. The weights are the neuron’s internal parameters. Both the input vector and the weights vector contain the same number of values, so we can use them to calculate a weighted sum.

So far, we’re doing exactly the same calculation as in the softmax classifier, but now comes a little twist: as long as the result of the weighted sum is a positive value, the neuron’s output is this value. But if the weighted sum is a negative value, we ignore that negative value and the neuron generates an output of 0 instead. This operation is called a Rectified Linear Unit (ReLU).

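Here's a minimal sketch of a single neuron in plain Python rather than TensorFlow. The function names relu and neuron_output are made up for this illustration and are not part of the code discussed later.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, replace negative ones with 0."""
    return max(0.0, x)


def neuron_output(inputs, weights):
    """Weighted sum of the inputs, followed by a ReLU (hypothetical helper)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return relu(weighted_sum)


# Weighted sum is 1.0, so the neuron passes it through unchanged.
print(neuron_output([1.0, 2.0], [0.5, 0.25]))
# Weighted sum is -1.5, so the neuron outputs 0.0 instead.
print(neuron_output([1.0, 2.0], [0.5, -1.0]))
```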

The reason for using a ReLU is that this creates a nonlinearity. The neuron’s output is now not strictly a linear combination (= weighted sum) of its inputs anymore. We’ll see why this is useful when we stop looking at individual neurons and instead look at the whole network.

The neurons in artificial neural networks are usually not connected randomly to each other. Most of the time they are arranged in layers:

The input image’s pixel values are the inputs for the network’s first layer of neurons. The output of the neurons in layer 1 is the input for neurons of layer 2 and so forth. This is the reason why having a nonlinearity is so important. Without the ReLU at each layer, we would only have a sequence of weighted sums. And stacked weighted sums can be merged into a single weighted sum, so the multiple layers would give us no improvement over a single layer network. Introducing the ReLU nonlinearity solves this problem as each additional layer really adds something to the network.

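To make the "stacked weighted sums can be merged" argument concrete, here's a small sketch in plain Python. The helper names (vecmat, matmul) and the toy weights are assumptions chosen for illustration. Biases are left out to keep it short.

```python
def vecmat(vector, matrix):
    """Multiply a row vector with a matrix of shape [len(vector), n_outputs]."""
    n_outputs = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(n_outputs)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [vecmat(row, b) for row in a]


def relu(vector):
    """Apply the ReLU to every value of a vector."""
    return [max(0.0, v) for v in vector]


w1 = [[1.0, -1.0], [-2.0, 1.0]]   # layer 1: 2 inputs -> 2 outputs
w2 = [[1.0], [1.0]]               # layer 2: 2 inputs -> 1 output
x = [1.0, 1.0]

# Without a nonlinearity, two layers behave like one layer with weights w1 * w2.
stacked = vecmat(vecmat(x, w1), w2)
merged = vecmat(x, matmul(w1, w2))
print(stacked, merged)

# With a ReLU in between, the result differs from the merged single layer.
with_relu = vecmat(relu(vecmat(x, w1)), w2)
print(with_relu)
```

Both stacked and merged come out as the same value, while with_relu does not, because the ReLU zeroes the negative hidden value before the second layer sees it.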

The output of the network’s final layer is what we are interested in: the scores for the image categories. In this network architecture each neuron is connected to all neurons of the previous layer, which is why this kind of network is called a fully connected network. As we shall see in Part 3 of this tutorial, that is not necessarily always the case.

And that’s already the end of my very brief part on the theory of neural networks. Let’s get started building one!

The Code

The full code for this example is available on GitHub. It requires TensorFlow and the CIFAR-10 dataset (see Part 1 on how to install the prerequisites).

If you’ve made your way through my previous blog post, you’ll see that the code for the neural network classifier is pretty similar to the code for the softmax classifier. But in addition to switching out the part of the code that defines the model, I’ve added a couple of small features to show some of the things TensorFlow can do:

  • Regularization: this is a very common technique to prevent overfitting of a model. It works by applying a counter-force during the optimization process which aims to keep the model simple.
  • Visualization of the model with TensorBoard: TensorBoard is included with TensorFlow and allows you to generate charts and graphs from your models and from data generated by your models. This helps with analyzing your models and is especially useful for debugging.
  • Checkpoints: this feature allows you to save the current state of your model for later use. Training a model can take quite a while, so it’s essential to not have to start from scratch each time you want to use it.

The code is split into two files this time: there’s two_layer_fc.py, which defines the model, and run_fc_model.py, which runs the model (in case you’re wondering: ‘fc’ stands for fully connected).

2-Layer Fully Connected Neural Network

Let’s look at the model itself first and deal with running and training it later. two_layer_fc.py contains the following functions:

  • inference() gets us from input data to class scores.

  • loss() calculates the loss value from class scores.

  • training() performs a single training step.

  • evaluation() calculates the accuracy of the network.

Generating Class Scores: inference()

inference() describes the forward pass through the network. How are the class scores calculated, starting from input images?

The images parameter is the TensorFlow placeholder containing the actual image data. The next three parameters describe the shape/size of the network. image_pixels is the number of pixels per input image, classes is the number of different output labels and hidden_units is the number of neurons in the first/hidden layer of our network.

Each neuron takes all values from the previous layer as input and generates a single output value. Each neuron in the hidden layer therefore has image_pixels inputs and the layer as a whole generates hidden_units outputs. These are then fed into the classes neurons of the output layer which generate classes output values, one score per class.

reg_constant is the regularization constant. TensorFlow allows us to add regularization to our network very easily by handling most of the calculations automatically. I’ll go into a bit more detail when we get to the loss function.

Since our neural network has 2 similar layers, we’ll define a separate scope for each. This allows us to reuse variable names in each scope. The biases variable is defined in the way we already know, by using tf.Variable().

The definition of the weights variable is a bit more involved. We use tf.get_variable(), which allows us to add regularization. weights is a matrix with dimensions of image_pixels by hidden_units (input vector size x output vector size). The initializer parameter describes the weight variable’s initial values.

Up to now, we’ve initialized our variables to 0, but this wouldn’t work here. Think about the neurons in a single layer. They all receive exactly the same input values. If they all had the same internal parameters as well, they would all make the same calculation and output the same value. To avoid this, we need to randomize their initial weights. We use an initialization scheme that usually works well: the weights are initialized to normally distributed values, values more than 2 standard deviations from the mean are dropped, and the standard deviation is set to the inverse of the square root of the number of input pixels. Luckily, TensorFlow handles all these details for us. We just need to specify that we want to use a truncated_normal_initializer, which does exactly what we want.

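To show what this initialization scheme amounts to, here's a rough sketch in plain Python. It is not TensorFlow's implementation. The function names, the seed parameter and the small number of output units are assumptions made for this example.

```python
import math
import random


def truncated_normal(stddev, rng):
    """Draw from a normal distribution with mean 0, redrawing any value
    that lies more than 2 standard deviations from the mean."""
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            return value


def init_weights(n_inputs, n_outputs, seed=0):
    """Weight matrix of shape [n_inputs, n_outputs] with
    stddev = 1 / sqrt(n_inputs) (hypothetical helper)."""
    rng = random.Random(seed)
    stddev = 1.0 / math.sqrt(n_inputs)
    return [[truncated_normal(stddev, rng) for _ in range(n_outputs)]
            for _ in range(n_inputs)]


# 3072 = 32 x 32 x 3 input values per CIFAR-10 image; 4 hidden units for brevity.
weights = init_weights(n_inputs=3072, n_outputs=4)
print(len(weights), len(weights[0]))
```

Because every weight is drawn independently, no two neurons start out with identical parameters, which is the point of randomizing in the first place.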

The final parameter for the weights variable is the regularizer. All we have to do at this point is to tell TensorFlow we want to use L2-regularization for the weights variable. I’ll cover regularization in the section on overfitting and regularization below.

To create the first layer’s output, we multiply the images matrix and the weights matrix with each other and add the bias variable. This is exactly the same as in the softmax classifier from the previous blog post. Then we apply the ReLU function, tf.nn.relu(), to arrive at the hidden layer’s output.

Layer 2 is very similar to layer 1. The number of inputs is hidden_units, the number of outputs is classes. Therefore the dimensions of the weights matrix are [hidden_units, classes]. Since this is the final layer of our network, there’s no need for a ReLU anymore. We arrive at the class scores (logits) by multiplying input (hidden) and weights with each other and adding bias.

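Putting the two layers together, the forward pass for a single image might look roughly like the following plain-Python sketch. The function name inference mirrors the one described above, but this is a simplified stand-in with a different signature, not the TensorFlow code from the repository. The dense helper and the toy weights are invented for this example.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: inputs x weights + biases.

    weights has len(inputs) rows and len(biases) columns.
    """
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def inference(image, w1, b1, w2, b2):
    """Forward pass for one image: hidden layer with ReLU, then a linear output layer."""
    hidden = [max(0.0, h) for h in dense(image, w1, b1)]
    logits = dense(hidden, w2, b2)   # no ReLU on the final layer
    return logits


# Toy sizes: image_pixels = 3, hidden_units = 2, classes = 2
w1 = [[1.0, -1.0], [0.0, 1.0], [1.0, 0.0]]   # shape [image_pixels, hidden_units]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0]]                # shape [hidden_units, classes]
b2 = [0.5, -0.5]

print(inference([1.0, 2.0, 3.0], w1, b1, w2, b2))   # one score per class
```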

The summary operation tf.histogram_summary() allows us to record the value of the logits variable for later analysis with TensorBoard. I’ll cover this later.

To sum it up, the inference() function as a whole takes in input images and returns class scores. That’s all a trained classifier needs to do, but in order to arrive at a trained classifier, we first need to measure how good those class scores are. That’s the job of the loss function.

Calculating the Loss: loss()

First we calculate the cross-entropy between logits (the model’s output) and labels (the correct labels from the training dataset). That was our whole loss function for the softmax classifier, but this time we want to use regularization, so we have to add another term to our loss.

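As a reminder of what the cross-entropy term measures, here's a small plain-Python sketch. The function names are chosen for this example, and averaging over the batch is an assumption about how the per-image values are combined.

```python
import math


def softmax(logits):
    """Turn class scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]   # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability that the model assigns to the correct class."""
    return -math.log(softmax(logits)[label])


def mean_cross_entropy(logits_batch, labels):
    """Average cross-entropy over a batch (hypothetical helper)."""
    losses = [cross_entropy(l, y) for l, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


# A confident, correct prediction gives a small loss, a confident wrong one a large loss.
print(cross_entropy([10.0, 0.0, 0.0], 0))
print(cross_entropy([10.0, 0.0, 0.0], 1))
```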

Let’s take a step back first and look at what we want to achieve by using regularization.

Overfitting and Regularization

When a statistical model captures the random noise in the data it was trained on instead of the true underlying relationship, this is called overfitting.

In the above image there are two different classes, represented by the blue and red circles. The green line is an overfitted classifier. It follows the training data perfectly, but it is also heavily dependent on it and is likely to handle unseen data worse than the black line, which represents a regularized model.

So our goal for regularization is to arrive at a simple model without any unnecessary complications. There are different ways to achieve this, and the option we are choosing is called L2-regularization. L2-regularization adds the sum of the squares of all the weights in the network to the loss function. This corresponds to a heavy penalty if the model is using big weights and a small penalty if the model is using small weights.

That’s why we used the regularizer parameter when defining the weights and assigned an l2_regularizer to it. This tells TensorFlow to keep track of the L2-regularization terms (and weight them by the parameter reg_constant) for this variable. All regularization terms are added to a collection called tf.GraphKeys.REGULARIZATION_LOSSES, which the loss function accesses. We then add the sum of all regularization losses to the previously calculated cross-entropy to arrive at the total loss of our model.

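The idea behind the combined loss can be sketched in a few lines of plain Python. The function names are hypothetical, and libraries differ in constant factors (for example an extra 1/2 in front of the sum of squares), so treat the exact scaling as an assumption.

```python
def l2_penalty(weight_matrices, reg_constant):
    """reg_constant times the sum of the squares of all weights."""
    sum_of_squares = sum(
        w * w for matrix in weight_matrices for row in matrix for w in row
    )
    return reg_constant * sum_of_squares


def total_loss(cross_entropy, weight_matrices, reg_constant):
    """Cross-entropy plus the regularization term."""
    return cross_entropy + l2_penalty(weight_matrices, reg_constant)


small_weights = [[[0.1, -0.1]]]
big_weights = [[[3.0, -4.0]]]

# Same cross-entropy, but the model with big weights pays a much higher penalty.
print(total_loss(1.0, small_weights, reg_constant=0.1))
print(total_loss(1.0, big_weights, reg_constant=0.1))
```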

Optimizing the Variables: training()

global_step is a scalar variable which keeps track of how many training iterations have already been performed. When repeatedly running the model in our training loop, we already know this value. It’s the iteration variable of the loop. The reason we’re adding this value directly to the TensorFlow graph is that we want to be able to take snapshots of the model. And these snapshots should include information about how many training steps have already been performed.

The definition of the gradient descent optimizer is simple. We provide the learning rate and tell the optimizer which variable it is supposed to minimize. In addition, the optimizer automatically increments the global_step parameter with every iteration.

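Stripped of everything TensorFlow does for us, a gradient descent step with a step counter boils down to something like the sketch below. The function training_step and the toy loss are made up for this illustration. In the real model, TensorFlow computes the gradients itself.

```python
def training_step(weight, gradient, learning_rate, global_step):
    """One gradient descent update; also returns the incremented step counter."""
    return weight - learning_rate * gradient, global_step + 1


# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
weight, global_step = 0.0, 0
for _ in range(200):
    gradient = 2.0 * (weight - 3.0)
    weight, global_step = training_step(weight, gradient, 0.1, global_step)

print(weight, global_step)   # weight ends up very close to 3 after 200 steps
```

Keeping the counter next to the weight, rather than only in the loop, is what lets a saved snapshot remember how far training had progressed.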

Measuring Performance: evaluation()

The calculation of the model’s accuracy is the same as in the softmax case: we compare the model’s predictions with true labels and calculate the frequency of how often the prediction is correct. We’re also interested in how the accuracy evolves over time, so we’re adding a summary operation which keeps track of the value of accuracy. We’ll cover this in the section about TensorBoard.

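In plain Python, the accuracy calculation could be sketched like this. The name evaluation mirrors the function described above, but the body is a simplified stand-in and not the TensorFlow version.

```python
def evaluation(logits_batch, labels):
    """Fraction of images whose highest-scoring class matches the true label."""
    correct = 0
    for logits, label in zip(logits_batch, labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        if predicted == label:
            correct += 1
    return correct / len(labels)


logits_batch = [[0.1, 0.9], [2.0, 1.0], [0.3, 0.2], [0.0, 1.0]]
labels = [1, 0, 1, 1]
print(evaluation(logits_batch, labels))   # 3 of the 4 predictions are correct
```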

To summarize what we have done so far, we have defined the behavior of a 2-layer artificial neural network using 4 functions: inference() constitutes the forward pass through the network and returns class scores. loss() compares the predicted class scores with the true labels and generates a loss value. training() performs a training step and optimizes the model’s internal parameters, and evaluation() measures the performance of our model.

Running the Neural Network

Now that the neural network is defined, let’s look at how run_fc_model.py runs, trains and evaluates the model.

After the obligatory imports we’re defining the model parameters as external flags. TensorFlow has its own module for command line parameters, which is a thin wrapper around Python’s argparse. We’re using it here for convenience, but you can just as well use argparse directly instead.

In the first couple of lines, the various command line parameters are being defined. The parameters for each flag are the flag’s name, its default value and a short description. Executing the file with the -h flag displays these descriptions.

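If you'd rather use argparse directly, the flag definitions could look roughly like this. The flag names and default values are taken from the parameter list printed in the Results section. The help texts and the build_parser function are assumptions for this sketch.

```python
import argparse


def build_parser():
    """Command line flags with a name, a default value and a short description."""
    parser = argparse.ArgumentParser(description="Train a 2-layer fully connected network.")
    parser.add_argument("--learning_rate", type=float, default=0.001,
                        help="Learning rate for the training.")
    parser.add_argument("--max_steps", type=int, default=2000,
                        help="Number of steps to run the trainer.")
    parser.add_argument("--hidden1", type=int, default=120,
                        help="Number of units in the hidden layer.")
    parser.add_argument("--batch_size", type=int, default=400,
                        help="Number of images per training batch.")
    parser.add_argument("--train_dir", type=str, default="tf_logs",
                        help="Directory to put the training logs.")
    parser.add_argument("--reg_constant", type=float, default=0.1,
                        help="Regularization constant.")
    return parser


# An empty list means "use the defaults"; in a real script call parse_args()
# without arguments so that the actual command line is read.
flags = build_parser().parse_args([])

print("Parameters:")
for name, value in sorted(vars(flags).items()):
    print("{} = {}".format(name, value))
```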

The second block of lines calls the function which actually parses the command line parameters. Then the values of all parameters are printed to the screen.

Here we define constants for the number of pixels per image (32 x 32 x 3) and the number of different image categories. Then we start measuring the runtime by creating a timer.

We want to log some info about the training process and use TensorBoard to display that info. TensorBoard requires the logs for each run to be in a separate directory, so we’re adding date and time info to the name of the log directory.

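One simple way to get a separate log directory per run is to append a timestamp to a base directory. The function name make_logdir and the exact timestamp format are assumptions for this sketch.

```python
import os
from datetime import datetime


def make_logdir(base_dir, now=None):
    """Return a per-run log directory such as tf_logs/20170102-030405."""
    if now is None:
        now = datetime.now()
    return os.path.join(base_dir, now.strftime("%Y%m%d-%H%M%S"))


print(make_logdir("tf_logs"))
```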

load_data() loads the CIFAR-10 data and returns a dictionary containing separate training and test datasets.

Generate the TensorFlow Graph

We’re defining TensorFlow placeholders. When performing the actual calculations, these will be filled with training/testing data.

The images_placeholder has dimensions of batch size x pixels per image. A batch size of ‘None’ allows us to run the graph with different batch sizes (the batch size for training the net can be set via a command line parameter, but for testing we’re passing the whole test set as a single batch).

The labels_placeholder is a vector of integer values containing the correct class label, one per image in the batch.

Here we’re referencing the functions we covered earlier in two_layer_fc.py.

  • inference() gets us from input data to class scores.

  • loss() calculates a loss value from class scores.

  • training() performs a single training step.

  • evaluation() calculates the accuracy of the network.

Defines a summary operation for TensorBoard (covered in the section on visualization with TensorBoard below).

Generates a saver object to save the model’s state at checkpoints (covered further below).

We start the TensorFlow session and immediately initialize all variables. Then we create a summary writer which we will use to periodically save log information to disk.

These lines are responsible for generating batches of input data. Let’s pretend we have 100 training images and a batch size of 10. In the softmax example we just picked 10 random images for each iteration. This means that after 10 iterations each image will have been picked once on average(!). But in fact some images will have been picked multiple times while some images haven’t been part of any batch so far. As long as you repeat this often enough, it’s not that terrible that randomness causes some images to be part of the training batches somewhat more often than others.

But this time we want to improve the sampling process. What we do is we first shuffle the 100 images of the training dataset. The first 10 images of the shuffled data are our first batch, the next 10 images are our second batch and so forth. After 10 batches we’re at the end of our dataset and the process starts again. We shuffle the data another time and run through it from front to back. This guarantees that no image is being picked more often than any other while still ensuring that the order in which the images are returned is random.

In order to achieve this, the gen_batch() function in data_helpers.py returns a Python generator, which returns the next batch each time it is evaluated. The details of how generators work are beyond the scope of this post. We’re using Python’s built-in zip() function to generate a list of tuples of the form [(image1, label1), (image2, label2), ...], which is then passed to our generator function.

next(batches) returns the next batch of data. Since it’s still in the form of [(imageA, labelA), (imageB, labelB), ...], we need to unzip it first to separate images from labels, before filling feed_dict, the dictionary containing the TensorFlow placeholders, with a single batch of training data.

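Here's a sketch of how such a shuffling batch generator could be written. It follows the description above but is not the repository's actual gen_batch(). The parameter names, the seed argument and the placeholder image names are assumptions.

```python
import random


def gen_batch(data, batch_size, num_iter, seed=None):
    """Yield num_iter batches, reshuffling the data before each pass through it."""
    rng = random.Random(seed)
    data = list(data)
    index = len(data)                  # forces a shuffle before the first batch
    for _ in range(num_iter):
        if index + batch_size > len(data):
            rng.shuffle(data)          # start a new pass in a fresh random order
            index = 0
        yield data[index:index + batch_size]
        index += batch_size


images = ["image{}".format(i) for i in range(100)]   # stand-ins for real image data
labels = [i % 10 for i in range(100)]

batches = gen_batch(zip(images, labels), batch_size=10, num_iter=20, seed=0)

batch = next(batches)                     # [(image, label), (image, label), ...]
image_batch, label_batch = zip(*batch)    # "unzip" into images and labels
print(len(image_batch), len(label_batch))
```

With 100 images and a batch size of 10, the first 10 batches contain every image exactly once, and the 11th batch starts a newly shuffled pass.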

Every 100 iterations the model’s current accuracy is evaluated and printed to the screen. In addition, the summary operation is run and its results are added to the summary_writer, which is responsible for writing the summaries to disk. From there they can be read and displayed by TensorBoard (see the section on visualization with TensorBoard below).

This line runs the train_step operation (defined previously to call two_layer_fc.training(), which contains the actual instructions for the optimization of the variables).

When training a model takes a longer period of time, there is an easy way to save a snapshot of your progress. This allows you to come back later and restore the model in exactly the same state. All you need to do is to create a tf.train.Saver object (we did that earlier) and then call its save() method every time you want to take a snapshot.

Restoring a model is just as easy: just call the saver’s restore() method. There is a working code example showing how to do this in the file restore_model.py in the GitHub repository.

After the training is finished, the final model is evaluated on the test set (remember, the test set contains data that the model has not seen so far, allowing us to judge how well the model is able to generalize to new data).

Results

Let’s run the model with the default parameters via “python run_fc_model.py”. My output looks like this:

    Parameters:
    batch_size = 400
    hidden1 = 120
    learning_rate = 0.001
    max_steps = 2000
    reg_constant = 0.1
    train_dir = tf_logs

    Step 0, training accuracy 0.09
    Step 100, training accuracy 0.2675
    Step 200, training accuracy 0.3925
    Step 300, training accuracy 0.41
    Step 400, training accuracy 0.4075
    Step 500, training accuracy 0.44
    Step 600, training accuracy 0.455
    Step 700, training accuracy 0.44
    Step 800, training accuracy 0.48
    Step 900, training accuracy 0.51
    Saved checkpoint
    Step 1000, training accuracy 0.4425
    Step 1100, training accuracy 0.5075
    Step 1200, training accuracy 0.4925
    Step 1300, training accuracy 0.5025
    Step 1400, training accuracy 0.5775
    Step 1500, training accuracy 0.515
    Step 1600, training accuracy 0.4925
    Step 1700, training accuracy 0.56
    Step 1800, training accuracy 0.5375
    Step 1900, training accuracy 0.51
    Saved checkpoint
    Test accuracy 0.4633
    Total time: 97.54s

We can see that the training accuracy starts at a level we would expect from guessing randomly (10 classes -> 10% chance of picking the correct one). Over roughly the first 1000 iterations the accuracy increases to around 50% and fluctuates around that value for the next 1000 iterations. The test accuracy of 46% is not much lower than the training accuracy. This indicates that our model is not significantly overfitted. The performance of the softmax classifier was around 30%, so 46% is a relative improvement of about 50%. Not bad!

Visualization with TensorBoard

TensorBoard allows you to visualize different aspects of your TensorFlow graphs and is very useful for debugging and improving your networks. Let’s look at the TensorBoard-related lines of code spread throughout the codebase.

In two_layer_fc.py we find the following:

Each of these three lines creates a summary operation. By defining a summary operation you tell TensorFlow that you are interested in collecting summary information from certain tensors (logits, loss and accuracy in our case). The other parameter for the summary operation is just a label you want to attach to the summary.

There are different kinds of summary operations. We’re using scalar_summary to record information about scalar (non-vector) values and histogram_summary to collect info about a distribution of multiple values (more info about the various summary operations can be found in the TensorFlow docs).

In run_fc_model.py the following lines are relevant for the TensorBoard visualization:

An operation in TensorFlow doesn’t run by itself; you need to either call it directly or call another operation which depends on it. Since we don’t want to call each summary operation individually each time we want to collect summary information, we’re using tf.merge_all_summaries to create a single operation which runs all our summaries.

During the initialization of the TensorFlow session we’re creating a summary writer. The summary writer is responsible for actually writing summary data to disk. In its constructor we supply logdir, the directory where we want the logs to be written. The optional graph argument tells TensorBoard to render a display of the whole TensorFlow graph.

Every 100 iterations we execute the merged summary operation and feed the results to the summary writer which writes them to disk.

To view the results we run TensorBoard via “tensorboard --logdir=tf_logs” and open localhost:6006 in a web browser. In the “Events”-tab we can see how the network’s loss decreases and how its accuracy increases over time.

The “Graphs”-tab shows a visualization of the TensorFlow graph we have defined. You can interactively rearrange it until you’re satisfied with how it looks. I think the following image shows the structure of our network pretty well.

In the “Distribution”- and “Histograms”-tabs you can explore the results of the tf.histogram_summary operation we attached to logits, but I won’t go into further details here. More info can be found in the relevant section of the official TensorFlow documentation.

Further Improvements

Maybe you’re thinking that training the softmax classifier took a lot less computation time than training the neural network. While that’s true, even if we kept training the softmax classifier for as long as it took the neural network to train, it wouldn’t reach the same performance. The longer you train a model, the smaller the additional gains get, and after a certain point the performance improvement is minuscule. We’ve reached this point with the neural network too. Additional training time would not improve the accuracy significantly anymore. There’s something else we could do though:

The default parameter values were chosen to work reasonably well, but there is some room for improvement left. By varying parameters such as the number of neurons in the hidden layer or the learning rate, we should be able to improve the model’s accuracy some more. A test accuracy greater than 50% should definitely be possible with this model after some further optimization, although I would be very surprised if this model could be tuned to reach 65% or more. But there’s another type of network architecture for which such an accuracy is easily doable: convolutional neural networks. These are a class of neural networks which are not fully connected. Instead they try to make sense of local features in their input, which is very useful for analyzing images. It intuitively makes a lot of sense to take spatial information into account when looking at images. In part 3 of this series we will see the principles of how convolutional neural networks work and build one ourselves.

Stay tuned for part 3 on convolutional neural networks and thanks a lot for reading! I’m happy about any feedback you might have!

You can also check out other articles I’ve written on my blog.

Translated from: https://www.freecodecamp.org/news/how-to-build-a-simple-image-recognition-system-with-tensorflow-part-2-c83348b33bce/
