A Summary of the TensorFlow APIs Used in MNIST Handwritten Digit Recognition

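Note: This article summarizes the relevant TensorFlow 1.12 APIs through a CNN implementation of the MNIST example. The code comes from the book "Learning TensorFlow"; the API descriptions were looked up in the official TensorFlow API documentation.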

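Author: SixAbs

Abstract

This article summarizes the APIs used in "MNIST handwritten digit recognition", a classic introductory deep learning example. The goal is to get familiar, as a beginner, with the relevant TensorFlow APIs and with basic TensorFlow usage. To get straight to the point, it skips a detailed discussion of CNNs: it first gives the TensorFlow code for MNIST handwritten digit recognition, then goes through the APIs in that code one by one, explaining and expanding on each.

Keywords: TensorFlow; API r1.12; MNIST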

Contents

1 TensorFlow code for MNIST
2 API summary
- 2.1 Truncated normal distribution
- 2.2 Variable
- 2.3 tf.constant() constants
- 2.4 tf.placeholder() placeholders
- 2.5 tf.reshape()
- 2.6 tf.nn.conv2d() 2-D convolution
- 2.7 tf.nn.max_pool() max pooling
- 2.8 tf.nn.relu() rectified linear unit
- 2.9 tf.nn.dropout()
- Others (to be summarized later)

1 TensorFlow code for MNIST

"""
CNN handwritten digit recognition
"""
import os

# Suppress TensorFlow INFO and WARNING logs; set before importing tensorflow.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data


# Define the helper functions first
def weight_variable(shape):
    # Weights start from a truncated normal distribution
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    # Biases start from a small positive constant
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


def conv_layer(input, shape):
    W = weight_variable(shape)
    b = bias_variable([shape[3]])
    return tf.nn.relu(conv2d(input, W) + b)


def full_layer(input, size):
    in_size = int(input.get_shape()[1])
    W = weight_variable([in_size, size])
    b = bias_variable([size])
    return tf.matmul(input, W) + b


# Inputs: flattened 28x28 images and one-hot labels
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

# Two convolution + pooling stages
x_image = tf.reshape(x, [-1, 28, 28, 1])
conv1 = conv_layer(x_image, shape=[5, 5, 1, 32])
conv1_pool = max_pool_2x2(conv1)

conv2 = conv_layer(conv1_pool, shape=[5, 5, 32, 64])
conv2_pool = max_pool_2x2(conv2)

# Fully connected layer with dropout, then the 10-class output layer
conv2_flat = tf.reshape(conv2_pool, [-1, 7*7*64])
full_1 = tf.nn.relu(full_layer(conv2_flat, 1024))

keep_prob = tf.placeholder(tf.float32)
full1_drop = tf.nn.dropout(full_1, keep_prob=keep_prob)

y_conv = full_layer(full1_drop, 10)

# Write the graph built so far for TensorBoard. The original opened an extra,
# unused tf.Session() just to reach sess.graph; the default graph is the same object.
tf.summary.FileWriter('log/', tf.get_default_graph())

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

STEPS = 1001

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for i in range(STEPS):
        batch = mnist.train.next_batch(50)

        # Report training accuracy every 100 steps, with dropout disabled
        if i % 100 == 0:
            train_accuracy = sess.run(accuracy, feed_dict={x: batch[0],
                                                           y_: batch[1],
                                                           keep_prob: 1.0})
            print("step {}, training accuracy {}".format(i, train_accuracy))

        sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    # Evaluate on the 10,000 test images in 10 batches of 1,000
    X = mnist.test.images.reshape(10, 1000, 784)
    Y = mnist.test.labels.reshape(10, 1000, 10)
    test_accuracy = np.mean([sess.run(accuracy,
                                      feed_dict={x: X[i], y_: Y[i], keep_prob: 1.0})
                             for i in range(10)])

print("test accuracy: {}".format(test_accuracy))

2 API summary

2.1 Truncated normal distribution

tf.random.truncated_normal(
    shape,             # A 1-D integer Tensor or Python array. The shape of the output tensor.
    mean=0.0,          # A 0-D Tensor or Python value of type dtype. The mean of the truncated normal distribution.
    stddev=1.0,        # A 0-D Tensor or Python value of type dtype. The standard deviation of the normal distribution, before truncation.
    dtype=tf.float32,  # The type of the output.
    seed=None,         # A Python integer. Used to create a random seed for the distribution. See tf.set_random_seed for behavior.
    name=None          # A name for the operation (optional).
)

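The truncated normal distribution is commonly used to generate initial values for trainable parameters such as weights and biases. The generated values follow a normal distribution, except that values more than two standard deviations from the mean are dropped and re-picked. The example code calls this function through its alias tf.truncated_normal.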
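To make the truncation rule concrete, here is a minimal pure-Python sketch of it. It is not TensorFlow's implementation; it only assumes the documented behavior that samples more than two standard deviations from the mean are dropped and re-picked. The helper name truncated_normal_samples is hypothetical.

```python
import random


def truncated_normal_samples(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n samples, re-picking any that fall outside mean +/- 2*stddev."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        # Keep the sample only if it lies within two standard deviations.
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples


# Same stddev as weight_variable() in the MNIST script.
weights = truncated_normal_samples(1000, stddev=0.1, seed=0)
print(len(weights), max(abs(w) for w in weights) <= 0.2)
```

With stddev=0.1, every initial weight ends up in [-0.2, 0.2], which avoids the occasional large outlier a plain normal distribution would produce.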

2.2 Variable

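A Variable is a special kind of tensor whose value can be a tensor of any type and shape. Unlike other tensors, a variable exists outside the context of a single session.run call. In other words, variables hold persistent tensors, and during training they are used to store and update parameters. In addition, every variable must be explicitly initialized before any op that uses it is run. Variable is a Python class, and its constructor takes many parameters: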

__init__(
    initial_value=None,
    trainable=True,
    collections=None,
    validate_shape=True,
    caching_device=None,
    name=None,
    variable_def=None,
    dtype=None,
    expected_shape=None,
    import_scope=None,
    constraint=None,
    use_resource=None,
    synchronization=tf.VariableSynchronization.AUTO,
    aggregation=tf.VariableAggregation.NONE
)

Here initial_value is the initial value passed in. The official documentation describes it as follows:

initial_value: A Tensor, or Python object convertible to a Tensor, which is the initial value for the Variable. The initial value must have a shape specified unless validate_shape is set to False. Can also be a callable with no argument that returns the initial value when called. In that case, dtype must be specified. (Note that initializer functions from init_ops.py must first be bound to a shape before being used here.)
In short: pass a tensor (or anything convertible to one) with a known shape, or pass a zero-argument callable together with an explicit dtype.

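Variables are created with tf.Variable(). Before they are used they must be initialized, which allocates memory for them and sets their initial values. To do this, pass tf.global_variables_initializer() to sess.run():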

init = tf.global_variables_initializer()
sess.run(init)

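Like other tensor objects, Variables are only computed when the model is run. To reuse the same variable instead of creating a duplicate, which is also more efficient, we can call tf.get_variable().
For more details, see the official API documentation.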

2.3 tf.constant() constants

tf.constant(
    value,              # A constant value (or list) of output type dtype.
    dtype=None,         # The type of the elements of the resulting tensor.
    shape=None,         # Optional dimensions of resulting tensor.
    name='Const',       # Optional name for the tensor.
    verify_shape=False  # Boolean that enables verification of a shape of values.
)

From the signature, value is the only required argument.
Comparison with tf.fill():

tf.constant differs from tf.fill in a few ways:

tf.constant supports arbitrary constants, not just uniform scalar Tensors like tf.fill.

tf.constant creates a Const node in the computation graph with the exact value at graph construction time. On the other hand, tf.fill creates an Op in the graph that is expanded at runtime.

Because tf.constant only embeds constant values in the graph, it does not support dynamic shapes based on other runtime Tensors, whereas tf.fill does.

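2.4 tf.placeholder() placeholders

TensorFlow provides a built-in structure for feeding input values, called a placeholder. A placeholder can be thought of as an empty variable that will be filled with data later. We first use placeholders to build the graph, and the input data is only supplied when the graph is executed.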

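tf.placeholder(
    dtype,       # Data type of the placeholder.
    shape=None,  # Shape of the input. A dimension given as None can take any size.
                 # For example, in x = tf.placeholder(tf.float32, shape=[None, 784])
                 # the None dimension is usually the number of samples.
    name=None    # Name of the operation.
)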

2.5 tf.reshape()

tf.reshape(  # Returns the given tensor reshaped to the specified shape.
    tensor,
    shape,
    name=None
)

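Note that the shape argument may contain a single -1, which marks a dimension to be inferred: its size is the total number of elements in the tensor divided by the product of the other dimensions. For example: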

a = tf.placeholder(tf.float32, shape=[1, 24])
print(a.get_shape())
b = tf.reshape(a, [-1, 3, 4])
print(b.get_shape())
# out =
# (1, 24)
# (2, 3, 4)

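The first dimension of b is set to $-1$, so it is inferred (an error is raised if the division is not exact): after the reshape it is $(1\times 24)/(3\times 4) = 2$. As a special case, shape=[-1] flattens the tensor to one dimension.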
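The inference rule for -1 can be written out as a few lines of plain Python. This is only a sketch of the arithmetic described above, not how TensorFlow implements it; the function name resolve_reshape is hypothetical.

```python
from functools import reduce
from operator import mul


def resolve_reshape(num_elements, shape):
    """Return shape with a single -1 replaced by the inferred dimension size."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    # Product of all explicitly given dimensions.
    known = reduce(mul, (d for d in shape if d != -1), 1)
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError("number of elements is not divisible by the known dimensions")
    return [num_elements // known if d == -1 else d for d in shape]


print(resolve_reshape(24, [-1, 3, 4]))             # [2, 3, 4]
print(resolve_reshape(50 * 784, [-1, 28, 28, 1]))  # [50, 28, 28, 1]
print(resolve_reshape(24, [-1]))                   # [24]
```

The second call mirrors x_image = tf.reshape(x, [-1, 28, 28, 1]) in the MNIST script for a batch of 50 images: the batch dimension is recovered from the total element count.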

2.6 tf.nn.conv2d() 2-D convolution

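tf.nn.conv2d(
    input,                    # Input tensor; with the default data_format its layout is [batch, in_height, in_width, in_channels].
                              # The dtype must be half, bfloat16, float32, or float64.
    filter,                   # Filter (kernel): [filter_height, filter_width, in_channels, out_channels].
    strides,                  # Stride of the sliding window for each dimension: [batch, height, width, channel].
    padding,                  # 'SAME' or 'VALID'. 'SAME' zero-pads the borders; with a stride of 1 the
                              # output feature map has the same height and width as the input.
    use_cudnn_on_gpu=True,    # bool: whether to use cuDNN acceleration on GPU. Defaults to True.
    data_format='NHWC',       # Layout of the input and output data. The default is [batch, height, width, channels].
    dilations=[1, 1, 1, 1],   # One entry per dimension, in data_format order. If set to k > 1, k-1 cells are skipped
                              # between filter elements on that dimension. This gives a dilated (atrous) filter:
                              # a larger receptive field with the same number of parameters.
    name=None                 # Name of the operation.
)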
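To see why the MNIST script flattens to 7*7*64, the spatial output size of each layer can be computed by hand. The sketch below uses TensorFlow's documented formulas for 'SAME' and 'VALID' padding (dilation of 1 assumed); the helper name output_size is hypothetical.

```python
import math


def output_size(in_size, filter_size, stride, padding):
    """Output height/width of a convolution or pooling op along one dimension."""
    if padding == 'SAME':
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        # No padding: the window must fit entirely inside the input.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


size = 28                                # x_image is 28x28
size = output_size(size, 5, 1, 'SAME')   # conv1: 5x5 filter, stride 1 -> 28
size = output_size(size, 2, 2, 'SAME')   # pool1: 2x2 window, stride 2 -> 14
size = output_size(size, 5, 1, 'SAME')   # conv2: 5x5 filter, stride 1 -> 14
size = output_size(size, 2, 2, 'SAME')   # pool2: 2x2 window, stride 2 -> 7
flat = size * size * 64                  # conv2 has 64 output channels
print(size, flat)                        # 7 3136
```

Had the convolutions used 'VALID' padding, the first layer alone would already shrink the image to 24x24, and the flattened size in tf.reshape would have to change accordingly.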

2.7 tf.nn.max_pool() max pooling

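tf.nn.max_pool(
    value,               # A 4-D tensor in data_format layout, usually the feature map produced by a convolution.
    ksize,               # Size of the pooling window as a length-4 vector, usually [1, height, width, 1],
                         # because we do not want to pool over the batch and channel dimensions.
    strides,             # As with convolution, the stride of the window on each dimension, usually [1, stride, stride, 1].
    padding,             # As with convolution, either 'VALID' or 'SAME'.
    data_format='NHWC',
    name=None
)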
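As an illustration of what ksize=[1, 2, 2, 1] with strides=[1, 2, 2, 1] computes on one channel of one image, here is a pure-Python sketch on a nested list. It is not TensorFlow code. The function name max_pool_2x2_sketch is hypothetical, and clipping the window at the border is an assumption meant to mimic 'SAME' padding for odd-sized inputs.

```python
def max_pool_2x2_sketch(feature_map):
    """2x2 max pooling with stride 2 over a 2-D list (a single channel)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows, 2):
        pooled_row = []
        for c in range(0, cols, 2):
            # Collect the window, clipped at the bottom and right borders.
            window = [feature_map[i][j]
                      for i in range(r, min(r + 2, rows))
                      for j in range(c, min(c + 2, cols))]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
print(max_pool_2x2_sketch(feature_map))  # [[6, 8], [14, 16]]
```

Each 2x2 block is reduced to its largest value, so height and width are halved while the strongest activations are kept.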

2.8 tf.nn.relu() rectified linear unit

tf.nn.relu(
    features,  # A tensor.
    name=None  # Name of the operation.
)
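ReLU computes max(features, 0) element-wise. A minimal pure-Python sketch of that rule on a flat list (the function name relu is only illustrative):

```python
def relu(values):
    """Element-wise max(v, 0): negative inputs become 0, others pass through."""
    return [max(0.0, v) for v in values]


print(relu([-2.0, -0.5, 0.0, 1.5]))  # [0.0, 0.0, 0.0, 1.5]
```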

2.9 tf.nn.dropout()

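tf.nn.dropout(
    x,                 # A floating-point tensor.
    keep_prob,         # A float: the probability that each element is kept. In the example, keep_prob is
                       # created as a placeholder, keep_prob = tf.placeholder(tf.float32), and its concrete
                       # value is supplied at run time through feed_dict, e.g. keep_prob: 0.5.
    noise_shape=None,  # A 1-D int32 tensor giving the shape of the randomly generated keep/drop flags.
    seed=None,         # An integer used as the random seed.
    name=None
)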
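The following pure-Python sketch shows the rule that tf.nn.dropout documents for this TensorFlow version: each element is kept with probability keep_prob and scaled by 1 / keep_prob, otherwise it is set to 0, so the expected sum is unchanged. It is an illustration, not TensorFlow's implementation, and the function name dropout_sketch is hypothetical.

```python
import random


def dropout_sketch(values, keep_prob, seed=None):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob; else 0."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]

# keep_prob=1.0 disables dropout, as in the accuracy evaluations of the script.
print(dropout_sketch(activations, keep_prob=1.0))  # [1.0, 2.0, 3.0, 4.0]

# keep_prob=0.5, as in training: every output is either 0 or twice the input.
dropped = dropout_sketch(activations, keep_prob=0.5, seed=0)
print(dropped)
```

This is why the MNIST script feeds keep_prob: 0.5 to train_step but keep_prob: 1.0 when measuring accuracy.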

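Others (to be summarized later)

tf.matmul(a, b)  # Matrix product of a and b
tf.Session()  # Launches the graph in a session
tf.summary.FileWriter()  # Writes Summary protocol buffers to event files
tf.reduce_mean()  # By default reduce_mean(x) averages over all elements; use the axis argument to choose the dimensions to reduce
tf.equal()  # Element-wise equality test; returns a boolean tensor
tf.argmax()  # Returns the index of the largest value along an axis (not the value itself)
tf.cast()  # Casts a tensor to a new type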
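The accuracy computation in the MNIST script chains four of these ops: tf.argmax, tf.equal, tf.cast, and tf.reduce_mean. The pure-Python sketch below walks through the same steps on a tiny hand-made batch; the function names and the sample values are illustrative only.

```python
def argmax(row):
    """Index of the largest value in a row (what tf.argmax(..., 1) gives per row)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, one_hot_labels):
    predicted = [argmax(row) for row in logits]            # tf.argmax(y_conv, 1)
    actual = [argmax(row) for row in one_hot_labels]       # tf.argmax(y_, 1)
    correct = [p == a for p, a in zip(predicted, actual)]  # tf.equal
    as_float = [float(c) for c in correct]                 # tf.cast(..., tf.float32)
    return sum(as_float) / len(as_float)                   # tf.reduce_mean


logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
print(accuracy(logits, labels))  # two of the three predictions match
```

Because argmax returns an index rather than a value, comparing the two index lists is what turns raw scores and one-hot labels into a per-sample right/wrong flag.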