About the CIFAR-10 dataset

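(1) The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

(2) The dataset is divided into five training batches and one test batch, each containing 10,000 images. The test batch contains exactly 1,000 randomly selected images from each class.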

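(3) Each batch file of the Python version has two parts. The first is the feature data, stored as a [10000, 3072] uint8 matrix; each row is one 32x32 3-channel image laid out as [3, 32, 32] (1024 red values, then 1024 green, then 1024 blue). The second is the labels, stored as a list of 10000 numbers, each in the range 0-9 and giving the class of the corresponding image. The Python version also provides a list named "label_names" (in the batches.meta file) that maps each number to a class name, e.g. label_names[0] == "airplane".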
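
The row layout described in (3) can be sketched with NumPy. This is a minimal illustration on synthetic data, not the real batch files: the 4-row matrix, the sample labels, and the helper name rows_to_images are assumptions made for the example; the class names are the ten CIFAR-10 classes.

```python
import numpy as np

# Synthetic stand-in for one Python-version batch: 4 rows instead of 10000.
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(4, 3072), dtype=np.uint8)
labels = [3, 8, 0, 6]  # made-up labels in the range 0-9
label_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]


def rows_to_images(rows):
    """Turn [N, 3072] uint8 rows into [N, 32, 32, 3] (height, width, channel) images."""
    # Each row holds the red plane, then the green plane, then the blue plane,
    # and every plane is stored row by row.
    channel_major = rows.reshape(-1, 3, 32, 32)
    return channel_major.transpose(0, 2, 3, 1)


images = rows_to_images(data)
print(images.shape)
print(label_names[labels[0]])
```

The transpose to height, width, channel order is the same step that read_cifar10() in cifar10_input.py performs with tf.transpose.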

Downloading the CIFAR-10 dataset

Official site: http://www.cs.toronto.edu/~kriz/cifar.html

Training on the CIFAR-10 dataset

The code that finally ran, with some explanations:

1. cifar10_input.py

"""Routine for decoding the CIFAR-10 binary file format."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

# Process images of this size. Note that this differs from the original CIFAR
# image size of 32 x 32. If one alters this number, then the entire model
# architecture will change and any model would need to be retrained.
# The original images are 32 x 32, but the informative part usually sits near
# the center of the image, so this constant is the size of the cropped image.
IMAGE_SIZE = 24

# Global constants describing the CIFAR-10 data set.
NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000


def read_cifar10(filename_queue):
  """Reads and parses examples from CIFAR10 data files.

  Recommendation: if you want N-way read parallelism, call this function
  N times.  This will give you N independent Readers reading different
  files & positions within those files, which will give better mixing of
  examples.

  Args:
    filename_queue: A queue of strings with the filenames to read from.

  Returns:
    An object representing a single example, with the following fields:
      height: number of rows in the result (32)
      width: number of columns in the result (32)
      depth: number of color channels in the result (3)
      key: a scalar string Tensor describing the filename & record number
        for this example.
      label: an int32 Tensor with the label in the range 0..9.
      uint8image: a [height, width, depth] uint8 Tensor with the image data
  """

  # An empty class used as a plain record, similar to a struct in C.
  class CIFAR10Record(object):
    pass
  result = CIFAR10Record()

  # Dimensions of the images in the CIFAR-10 dataset.
  # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
  # input format.
  label_bytes = 1  # 2 for CIFAR-100
  result.height = 32
  result.width = 32
  result.depth = 3
  # Number of bytes occupied by one image.
  image_bytes = result.height * result.width * result.depth
  # Every record consists of a label followed by the image, with a
  # fixed number of bytes for each.
  record_bytes = label_bytes + image_bytes

  # Read a record, getting filenames from the filename_queue.  No
  # header or footer in the CIFAR-10 format, so we leave header_bytes
  # and footer_bytes at their default of 0.
  # The reader returns a fixed number of bytes from the file on each read.
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  # read() returns a (key, value) pair; both are string tensors. When one
  # file has been read completely, the next filename is dequeued.
  result.key, value = reader.read(filename_queue)

  # Convert from a string to a vector of uint8 that is record_bytes long.
  # Decoding works like reading a binary file: each byte of the string becomes
  # one number in [0, 255], which is why the output type must be uint8.
  record_bytes = tf.decode_raw(value, tf.uint8)

  # The first bytes represent the label, which we convert from uint8->int32.
  # record_bytes holds both the label and the image, so the 1-D tensor has to
  # be sliced. The two list arguments of strided_slice are the begin and end
  # indices of the slice (not a start and a length).
  result.label = tf.cast(
      tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32)

  # The remaining bytes after the label represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(
      tf.strided_slice(record_bytes, [label_bytes],
                       [label_bytes + image_bytes]),
      [result.depth, result.height, result.width])
  # Convert from [depth, height, width] to [height, width, depth].
  result.uint8image = tf.transpose(depth_major, [1, 2, 0])

  return result


# Builds a batch of images and labels, optionally shuffled.
def _generate_image_and_label_batch(image, label, min_queue_examples,
                                    batch_size, shuffle):
  """Construct a queued batch of images and labels.

  Args:
    image: 3-D Tensor of [height, width, 3] of type.float32.
    label: 1-D Tensor of type.int32
    min_queue_examples: int32, minimum number of samples to retain
      in the queue that provides of batches of examples.
    batch_size: Number of images per batch.
    shuffle: boolean indicating whether to use a shuffling queue.

  Returns:
    images: Images. 4D tensor of [batch_size, height, width, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  # Create a queue that shuffles the examples, and then
  # read 'batch_size' images + labels from the example queue.
  # Number of preprocessing threads.
  num_preprocess_threads = 16
  if shuffle:
    images, label_batch = tf.train.shuffle_batch(
        [image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size,
        min_after_dequeue=min_queue_examples)
  else:
    # tf.train.batch(tensors, batch_size, num_threads=1, capacity=32,
    #                enqueue_many=False, shapes=None, dynamic_pad=False,
    #                allow_smaller_final_batch=False, shared_name=None,
    #                name=None)
    # This is implemented with a queue; a QueueRunner for that queue is added
    # to the graph's QUEUE_RUNNERS collection automatically.
    # With the default enqueue_many=False, each input tensor is one example of
    # shape [x, y, z] and the output is a batch of such examples.
    # capacity: the maximum number of elements allowed in the queue.
    images, label_batch = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size)

  # Display the training images in the visualizer, so that the inputs can be
  # checked by eye.
  tf.summary.image('images', images)

  return images, tf.reshape(label_batch, [batch_size])


# Builds the input for CIFAR training.
# data_dir: path to the data
# batch_size: number of images per batch
def distorted_inputs(data_dir, batch_size):
  """Construct distorted input for CIFAR training using the Reader ops.

  Args:
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
               for i in xrange(1, 6)]
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  read_input = read_cifar10(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for training the network. Note the many random
  # distortions applied to the image.

  # Randomly crop a [height, width] section of the image.
  distorted_image = tf.random_crop(reshaped_image, [height, width, 3])

  # Randomly flip the image horizontally.
  distorted_image = tf.image.random_flip_left_right(distorted_image)

  # Because these operations are not commutative, consider randomizing
  # the order their operation.
  # NOTE: since per_image_standardization zeros the mean and makes
  # the stddev unit, this likely has no effect see tensorflow#1458.
  distorted_image = tf.image.random_brightness(distorted_image,
                                               max_delta=63)
  distorted_image = tf.image.random_contrast(distorted_image,
                                             lower=0.2, upper=1.8)

  # Subtract off the mean and divide by the variance of the pixels.
  float_image = tf.image.per_image_standardization(distorted_image)

  # Set the shapes of tensors.
  float_image.set_shape([height, width, 3])
  read_input.label.set_shape([1])

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                           min_fraction_of_examples_in_queue)
  print('Filling queue with %d CIFAR images before starting to train. '
        'This will take a few minutes.' % min_queue_examples)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size,
                                         shuffle=True)


# Builds the input for CIFAR evaluation.
# eval_data: whether to use the evaluation set or the training set
# data_dir: path to the data
# batch_size: number of images per batch
def inputs(eval_data, data_dir, batch_size):
  """Construct input for CIFAR evaluation using the Reader ops.

  Args:
    eval_data: bool, indicating if one should use the train or eval data set.
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  if not eval_data:
    filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                 for i in xrange(1, 6)]
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
  else:
    filenames = [os.path.join(data_dir, 'test_batch.bin')]
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  # def string_input_producer(string_tensor,
  #                           num_epochs=None,
  #                           shuffle=True,
  #                           seed=None,
  #                           capacity=32,
  #                           shared_name=None,
  #                           name=None,
  #                           cancel_op=None):
  # As the signature shows, the filenames are shuffled by default.
  # string_input_producer returns a queue of strings, and a QueueRunner for it
  # is added to the graph's QUEUE_RUNNERS collection.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  # read_cifar10 reads one image record from the filename queue.
  read_input = read_cifar10(filename_queue)
  # Convert the image in the record to float32.
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for evaluation.
  # Crop the central [height, width] of the image (24 x 24).
  resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
                                                         height, width)

  # Subtract off the mean and divide by the variance of the pixels.
  float_image = tf.image.per_image_standardization(resized_image)

  # Set the shapes of tensors.
  float_image.set_shape([height, width, 3])
  read_input.label.set_shape([1])

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(num_examples_per_epoch *
                           min_fraction_of_examples_in_queue)

  # Generate a batch of images and labels by building up a queue of examples.
  # Starting from the single-record tensors above, the batch is assembled by
  # multiple threads.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size,
                                         shuffle=False)
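
The record layout that read_cifar10() decodes (1 label byte followed by 3072 image bytes in depth, height, width order) can be checked without TensorFlow. The following is a minimal NumPy sketch on two fabricated records; the helper name parse_records and the fake bytes are assumptions made for illustration and are not part of the tutorial code.

```python
import numpy as np

LABEL_BYTES = 1
HEIGHT, WIDTH, DEPTH = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * DEPTH       # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES   # 3073


def parse_records(raw):
    """Split concatenated binary records into int32 labels and HWC uint8 images."""
    buf = np.frombuffer(raw, dtype=np.uint8)
    if buf.size % RECORD_BYTES != 0:
        raise ValueError("input length is not a multiple of %d bytes" % RECORD_BYTES)
    records = buf.reshape(-1, RECORD_BYTES)
    # The first byte of each record is the label.
    labels = records[:, 0].astype(np.int32)
    # The remaining bytes are the image in [depth, height, width] order.
    depth_major = records[:, LABEL_BYTES:].reshape(-1, DEPTH, HEIGHT, WIDTH)
    # Reorder to [height, width, depth], as tf.transpose(depth_major, [1, 2, 0]) does.
    images = depth_major.transpose(0, 2, 3, 1)
    return labels, images


# Two fabricated records: label 7 with an all-black image, label 2 with an all-white one.
fake = bytes([7]) + bytes(IMAGE_BYTES) + bytes([2]) + bytes([255]) * IMAGE_BYTES
labels, images = parse_records(fake)
print(labels, images.shape)
```

Reading a real file such as data_batch_1.bin in binary mode and passing its contents to parse_records should work the same way, assuming the file follows the binary format described on the dataset page.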

2. cifar10.py

"""Builds the CIFAR-10 network.

Summary of available functions:

 # Compute input images and labels for training. If you would like to run
 # evaluations, use inputs() instead.
 inputs, labels = distorted_inputs()

 # Compute inference on the model inputs to make a prediction.
 predictions = inference(inputs)

 # Compute the total loss of the prediction with respect to the labels.
 loss = loss(predictions, labels)

 # Create a graph to run one step of training with respect to the loss.
 train_op = train(loss, global_step)
"""
# pylint: disable=missing-docstring
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import re
import sys
import tarfile

from six.moves import urllib
import tensorflow as tf

import cifar10_input

parser = argparse.ArgumentParser()

# Basic model parameters.
# Note: with argparse, type=bool turns any non-empty string into True.
parser.add_argument('--batch_size', type=int, default=128,
                    help='Number of images to process in a batch.')

parser.add_argument('--data_dir', type=str,
                    default='D:/QQ文件/cifar10-test1/cifar10_data/',
                    help='Path to the CIFAR-10 data directory.')

parser.add_argument('--use_fp16', type=bool, default=False,
                    help='Train the model using fp16.')

# parse_known_args() is used instead of parse_args() because this module is
# imported before cifar10_train.py / cifar10_eval.py add their own arguments;
# parse_args() here would reject those arguments on the command line.
FLAGS, _ = parser.parse_known_args()

# Global constants describing the CIFAR-10 data set.
IMAGE_SIZE = cifar10_input.IMAGE_SIZE
NUM_CLASSES = cifar10_input.NUM_CLASSES
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

# Constants describing the training process.
MOVING_AVERAGE_DECAY = 0.9999  # The decay to use for the moving average.
# Epochs after which learning rate decays. The decay is a step function; this
# constant controls the decay period (the width of each step).
NUM_EPOCHS_PER_DECAY = 350.0
LEARNING_RATE_DECAY_FACTOR = 0.1  # Learning rate decay factor.
INITIAL_LEARNING_RATE = 0.1  # Initial learning rate.

# If a model is trained with multiple GPUs, prefix all Op names with tower_name
# to differentiate the operations. Note that this prefix is removed from the
# names of the summaries when visualizing a model.
TOWER_NAME = 'tower'

DATA_URL = 'https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'


def _activation_summary(x):
  """Helper to create summaries for activations.

  Creates a summary that provides a histogram of activations.
  Creates a summary that measures the sparsity of activations.

  Args:
    x: Tensor
  Returns:
    nothing
  """
  # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
  # session. This helps the clarity of presentation on tensorboard.
  tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name)
  tf.summary.histogram(tensor_name + '/activations', x)
  tf.summary.scalar(tensor_name + '/sparsity',
                    tf.nn.zero_fraction(x))


def _variable_on_cpu(name, shape, initializer):
  """Helper to create a Variable stored on CPU memory.

  Args:
    name: name of the variable
    shape: list of ints
    initializer: initializer for Variable

  Returns:
    Variable Tensor
  """
  # tf.device is a context manager that selects the device for new ops.
  with tf.device('/cpu:0'):
    dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
    var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype)
  return var


def _variable_with_weight_decay(name, shape, stddev, wd):
  """Helper to create an initialized Variable with weight decay.

  Note that the Variable is initialized with a truncated normal distribution.
  A weight decay is added only if one is specified.

  Args:
    name: name of the variable
    shape: list of ints
    stddev: standard deviation of a truncated Gaussian
    wd: add L2Loss weight decay multiplied by this float. If None, weight
        decay is not added for this Variable.

  Returns:
    Variable Tensor
  """
  dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
  var = _variable_on_cpu(
      name,
      shape,
      tf.truncated_normal_initializer(stddev=stddev, dtype=dtype))
  if wd is not None:
    weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
    tf.add_to_collection('losses', weight_decay)
  return var


def distorted_inputs():
  """Construct distorted input for CIFAR training using the Reader ops.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.

  Raises:
    ValueError: If no data_dir
  """
  if not FLAGS.data_dir:
    raise ValueError('Please supply a data_dir')
  data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
  images, labels = cifar10_input.distorted_inputs(data_dir=data_dir,
                                                  batch_size=FLAGS.batch_size)
  if FLAGS.use_fp16:
    images = tf.cast(images, tf.float16)
    labels = tf.cast(labels, tf.float16)
  return images, labels


def inputs(eval_data):
  """Construct input for CIFAR evaluation using the Reader ops.

  Args:
    eval_data: bool, indicating if one should use the train or eval data set.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.

  Raises:
    ValueError: If no data_dir
  """
  if not FLAGS.data_dir:
    raise ValueError('Please supply a data_dir')
  data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
  images, labels = cifar10_input.inputs(eval_data=eval_data,
                                        data_dir=data_dir,
                                        batch_size=FLAGS.batch_size)
  if FLAGS.use_fp16:
    images = tf.cast(images, tf.float16)
    labels = tf.cast(labels, tf.float16)
  return images, labels


# Build the network. The weights of the first convolutional layer get no L2
# regularization, so wd is set to 0 for its kernel; its biases start at 0.
# The output of conv1 goes through ReLU and is summarized by
# _activation_summary(). Then comes the first pooling layer: its 3x3 window is
# larger than its stride of 2, so neighboring windows overlap, which is meant
# to enrich the pooled features. Finally an LRN layer is added.
def inference(images):
  """Build the CIFAR-10 model.

  Args:
    images: Images returned from distorted_inputs() or inputs().

  Returns:
    Logits.
  """
  # We instantiate all variables using tf.get_variable() instead of
  # tf.Variable() in order to share variables across multiple GPU training runs.
  # If we only ran this model on a single GPU, we could simplify this function
  # by replacing all instances of tf.get_variable() with tf.Variable().
  #
  # conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 3, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv1)

  # pool1
  pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='SAME', name='pool1')
  # norm1: local response normalization (LRN).
  # LRN imitates the "lateral inhibition" mechanism of biological neural
  # systems: it creates competition among neighboring activations, so that
  # relatively large responses become relatively larger and weaker ones are
  # suppressed, which is meant to improve generalization. LRN is most useful
  # with unbounded activations such as ReLU, because it picks out the larger
  # responses among nearby kernels; it is less suited to bounded activations
  # such as sigmoid, which already squash large values.
  norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm1')

  # conv2
  # Apart from the input shape, the second convolutional layer differs from
  # the first in two ways: the biases are initialized to 0.1, and the order of
  # max pooling and LRN is swapped (LRN first, then max pooling).
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 64, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv2)

  # norm2
  norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm2')
  # pool2
  pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1], padding='SAME', name='pool2')

  # local3
  # The third layer is fully connected. The output of the convolutional part
  # is first flattened: tf.reshape turns every example into a 1-D vector and
  # get_shape gives the length of the flattened vector. Then the weights and
  # biases are initialized. To limit overfitting in the fully connected layer,
  # wd is set to a non-zero value (0.004) so that its weights are constrained
  # by L2 regularization. ReLU is again used as the nonlinearity.
  # The fourth (fully connected) layer is built the same way.
  with tf.variable_scope('local3') as scope:
    # Move everything into depth so we can perform a single matrix multiply.
    reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
    dim = reshape.get_shape()[1].value
    weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
    _activation_summary(local3)

  # local4
  with tf.variable_scope('local4') as scope:
    weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
    _activation_summary(local4)

  # linear layer(WX + b),
  # We don't apply softmax here because
  # tf.nn.sparse_softmax_cross_entropy_with_logits accepts the unscaled logits
  # and performs the softmax internally for efficiency.
  # The final softmax_linear layer creates its weights and biases without L2
  # regularization. Unlike earlier examples, this model does not output the
  # softmax result itself: the softmax is computed as part of the loss, which
  # takes the linear output (logits) of softmax_linear together with labels.
  with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1 / 192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)

  return softmax_linear


# Loss function.
# The logits passed in are the unscaled outputs of inference(); the softmax
# that turns them into per-class probabilities is applied inside the loss op.
def loss(logits, labels):
  """Add L2Loss to all the trainable variables.

  Add summary for "Loss" and "Loss/avg".
  Args:
    logits: Logits from inference().
    labels: Labels from distorted_inputs or inputs(). 1-D tensor
            of shape [batch_size]

  Returns:
    Loss tensor of type float.
  """
  # Calculate the average cross entropy loss across the batch.
  labels = tf.cast(labels, tf.int64)
  # In CIFAR-10, labels has shape [batch_size] and each label is a number from
  # 0 to 9 standing for one of the 10 classes. The classes are mutually
  # exclusive: an image can be a dog or a truck, but not both.
  # A dense softmax cross entropy would need one-hot encoded labels, which is
  # tedious to build; sparse_softmax_cross_entropy_with_logits accepts the
  # integer class labels directly, so one call is enough:
  cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
      labels=labels, logits=logits, name='cross_entropy_per_example')
  cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
  # The cross entropy loss is added to the 'losses' collection with
  # tf.add_to_collection. Finally tf.add_n sums every loss in the collection
  # to give the total loss, which contains the cross entropy loss plus the L2
  # weight-decay terms (non-zero only for the two fully connected layers).
  tf.add_to_collection('losses', cross_entropy_mean)

  # The total loss is defined as the cross entropy loss plus all of the weight
  # decay terms (L2 loss).
  return tf.add_n(tf.get_collection('losses'), name='total_loss')


def _add_loss_summaries(total_loss):
  """Add summaries for losses in CIFAR-10 model.

  Generates moving average for all losses and associated summaries for
  visualizing the performance of the network.

  Args:
    total_loss: Total loss from loss().
  Returns:
    loss_averages_op: op for generating moving averages of losses.
  """
  # Compute the moving average of all individual losses and the total loss.
  # Create a new exponential moving average object.
  loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
  # Fetch everything stored under the key 'losses': the cross entropy loss and
  # the regularization losses.
  losses = tf.get_collection('losses')
  # apply() creates shadow copies of the given tensors and adds ops that keep
  # a moving average of each one in its shadow copy. The average is computed
  # with exponential decay; a shadow variable starts at the same value as the
  # variable it tracks and is updated as
  # shadow_variable = decay * shadow_variable + (1 - decay) * variable.
  loss_averages_op = loss_averages.apply(losses + [total_loss])

  # Attach a scalar summary to all individual losses and the total loss; do the
  # same for the averaged version of the losses.
  for l in losses + [total_loss]:
    # Name each loss as '(raw)' and name the moving average version of the loss
    # as the original loss name.
    tf.summary.scalar(l.op.name + ' (raw)', l)
    tf.summary.scalar(l.op.name, loss_averages.average(l))

  return loss_averages_op


def train(total_loss, global_step):
  """Train CIFAR-10 model.

  Create an optimizer and apply to all trainable variables. Add moving
  average for all trainable variables.

  Args:
    total_loss: Total loss from loss().
    global_step: Integer Variable counting the number of training steps
      processed.
  Returns:
    train_op: op for training.
  """
  # Variables that affect learning rate.
  num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
  decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)

  # Decay the learning rate exponentially based on the number of steps.
  # First define the learning rate, make it decay with the step count, and
  # record it as a summary:
  lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                  global_step,
                                  decay_steps,
                                  LEARNING_RATE_DECAY_FACTOR,
                                  staircase=True)
  tf.summary.scalar('learning_rate', lr)

  # Generate moving averages of all losses and associated summaries.
  # Maintaining moving averages of the trained parameters is useful: using the
  # averaged parameters at evaluation time often gives better accuracy than
  # the final trained values themselves.
  loss_averages_op = _add_loss_summaries(total_loss)

  # Compute gradients.
  # tf.control_dependencies is a context manager that controls execution
  # order: the ops in the list run before the ops created inside the context.
  with tf.control_dependencies([loss_averages_op]):
    opt = tf.train.GradientDescentOptimizer(lr)  # Gradient descent optimizer.
    grads = opt.compute_gradients(total_loss)  # (gradient, variable) pairs.

  # Apply gradients. This returns the op for one gradient update step.
  apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

  # Add histograms for trainable variables.
  for var in tf.trainable_variables():
    tf.summary.histogram(var.op.name, var)

  # Add histograms for gradients.
  for grad, var in grads:
    if grad is not None:
      tf.summary.histogram(var.op.name + '/gradients', grad)

  # Track the moving averages of all trainable variables.
  # Passing global_step lets the decay rate be adjusted dynamically. The op
  # returned at the end, which also updates the moving averages of the model
  # parameters, is the train op.
  variable_averages = tf.train.ExponentialMovingAverage(
      MOVING_AVERAGE_DECAY, global_step)
  variables_averages_op = variable_averages.apply(tf.trainable_variables())

  with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
    train_op = tf.no_op(name='train')

  return train_op


def maybe_download_and_extract():
  """Download and extract the tarball from Alex's website."""
  dest_directory = FLAGS.data_dir
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = DATA_URL.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  print(filepath)
  if not os.path.exists(filepath):
    def _progress(count, block_size, total_size):
      sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
          float(count * block_size) / float(total_size) * 100.0))
      sys.stdout.flush()
    filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
    print()
    statinfo = os.stat(filepath)
    print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
  extracted_dir_path = os.path.join(dest_directory, 'cifar-10-batches-bin')
  if not os.path.exists(extracted_dir_path):
    tarfile.open(filepath, 'r:gz').extractall(dest_directory)
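
The learning-rate schedule built in train() can be reproduced in plain Python to see when the decay kicks in. This is a sketch under the assumption that tf.train.exponential_decay with staircase=True computes learning_rate * decay_rate ** floor(global_step / decay_steps); the helper names decay_steps_for and staircase_learning_rate are made up for the example, and the constants are the ones from cifar10.py.

```python
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EPOCHS_PER_DECAY = 350.0
LEARNING_RATE_DECAY_FACTOR = 0.1
INITIAL_LEARNING_RATE = 0.1


def decay_steps_for(batch_size):
    """Number of training steps between two learning-rate drops."""
    num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / batch_size
    return int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)


def staircase_learning_rate(global_step, batch_size=128):
    """Learning rate at a given step: constant within a stair, scaled by the factor per stair."""
    stairs = global_step // decay_steps_for(batch_size)
    return INITIAL_LEARNING_RATE * LEARNING_RATE_DECAY_FACTOR ** stairs


print(decay_steps_for(128))
for step in (0, 100000, 200000, 300000):
    print(step, staircase_learning_rate(step))
```

With the default batch size of 128 the first drop only happens after 136718 steps, so a short training run never leaves the initial learning rate of 0.1.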

3. cifar10_train.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import time

import tensorflow as tf

import cifar10

parser = cifar10.parser

parser.add_argument('--train_dir', type=str, default='cifar10_train/',
                    help='Directory where to write event logs and checkpoint.')

parser.add_argument('--max_steps', type=int, default=1000000,
                    help='Number of batches to run.')

parser.add_argument('--log_device_placement', type=bool, default=False,
                    help='Whether to log device placement.')

parser.add_argument('--log_frequency', type=int, default=10,
                    help='How often to log results to the console.')


def train():
  """Train CIFAR-10 for a number of steps."""
  # Make a new graph the default graph.
  with tf.Graph().as_default():
    # The global step must not be trainable (as in
    # global_step = tf.Variable(0, trainable=False)), so that the moving
    # average update applied to trainable variables during training does not
    # touch it. get_or_create_global_step() creates such a variable.
    global_step = tf.train.get_or_create_global_step()

    # Get images and labels for CIFAR-10.
    # Force input pipeline to CPU:0 to avoid operations sometimes ending up on
    # GPU and resulting in a slow down.
    with tf.device('/cpu:0'):
      images, labels = cifar10.distorted_inputs()

    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)

    # Calculate loss.
    loss = cifar10.loss(logits, labels)

    # Build a Graph that trains the model with one batch of examples and
    # updates the model parameters.
    train_op = cifar10.train(loss, global_step)

    class _LoggerHook(tf.train.SessionRunHook):
      """Logs loss and runtime."""

      def begin(self):
        self._step = -1
        self._start_time = time.time()

      def before_run(self, run_context):
        self._step += 1
        return tf.train.SessionRunArgs(loss)  # Asks for loss value.

      def after_run(self, run_context, run_values):
        if self._step % FLAGS.log_frequency == 0:
          current_time = time.time()
          duration = current_time - self._start_time
          self._start_time = current_time

          loss_value = run_values.results
          examples_per_sec = FLAGS.log_frequency * FLAGS.batch_size / duration
          sec_per_batch = float(duration / FLAGS.log_frequency)

          format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                        'sec/batch)')
          print(format_str % (datetime.now(), self._step, loss_value,
                              examples_per_sec, sec_per_batch))

    with tf.train.MonitoredTrainingSession(
        checkpoint_dir=FLAGS.train_dir,
        hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),
               tf.train.NanTensorHook(loss),
               _LoggerHook()],
        config=tf.ConfigProto(
            log_device_placement=FLAGS.log_device_placement)) as mon_sess:
      while not mon_sess.should_stop():
        mon_sess.run(train_op)


def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
  tf.gfile.MakeDirs(FLAGS.train_dir)
  train()


if __name__ == '__main__':
  FLAGS = parser.parse_args()
  tf.app.run()

# tensorboard  --logdir=D:\tmp\cifar10_train
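
Each run of train_op in the loop above also updates the moving averages that cifar10.train() sets up. The update rule quoted in the cifar10.py comments can be sketched in plain Python. The function names below are made up for illustration, and the num_updates adjustment min(decay, (1 + num_updates) / (10 + num_updates)) follows the TensorFlow 1.x documentation of tf.train.ExponentialMovingAverage as I understand it, so treat it as an assumption rather than a statement about the exact implementation.

```python
def ema_update(shadow, value, decay):
    """One moving-average step: shadow = decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value


def effective_decay(decay, num_updates):
    """Decay actually used when a step counter is passed (assumed formula, see lead-in)."""
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


MOVING_AVERAGE_DECAY = 0.9999

# Track a made-up parameter that drifts from 0.0 towards 1.0 during training.
variable = 0.0
shadow = variable  # The shadow copy starts at the variable's initial value.
for step in range(1000):
    variable += 0.001
    shadow = ema_update(shadow, variable,
                        effective_decay(MOVING_AVERAGE_DECAY, step))

print(variable, shadow)
```

Early in training the effective decay is small, so the shadow value follows the variable closely; as the step count grows it approaches 0.9999 and the shadow value changes much more slowly than the variable itself.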

4. cifar10_eval.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import math
import time

import numpy as np
import tensorflow as tf

import cifar10

parser = cifar10.parser

parser.add_argument('--eval_dir', type=str, default='/cifar10_eval',
                    help='Directory where to write event logs.')

parser.add_argument('--eval_data', type=str, default='test',
                    help='Either `test` or `train_eval`.')

parser.add_argument('--checkpoint_dir', type=str, default='/cifar10_train',
                    help='Directory where to read model checkpoints.')

parser.add_argument('--eval_interval_secs', type=int, default=60 * 5,
                    help='How often to run the eval.')

parser.add_argument('--num_examples', type=int, default=10000,
                    help='Number of examples to run.')

parser.add_argument('--run_once', type=bool, default=False,
                    help='Whether to run eval only once.')


# cifar10_train.py periodically saves all model parameters to checkpoint
# files, but it does not evaluate the model.
# cifar10_eval.py uses those checkpoint files to measure predictive
# performance on a separate part of the data set.
# It rebuilds the model with inference() and tests it on all 10,000 images of
# the CIFAR-10 evaluation set.
# The reported metric is precision @ 1: how often the highest-confidence
# prediction matches the true label of the image.
# To monitor how the model improves during training, the evaluation script
# runs periodically on the latest checkpoint file produced by
# cifar10_train.py.
def eval_once(saver, summary_writer, top_k_op, summary_op):
  """Run Eval once.

  Args:
    saver: Saver.
    summary_writer: Summary writer.
    top_k_op: Top K op.
    summary_op: Summary op.
  """
  with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
      # Restores from checkpoint
      saver.restore(sess, ckpt.model_checkpoint_path)
      # Assuming model_checkpoint_path looks something like:
      #   /my-favorite-path/cifar10_train/model.ckpt-0,
      # extract global_step from it.
      global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
    else:
      print('No checkpoint file found')
      return

    # Start the queue runners.
    coord = tf.train.Coordinator()
    try:
      threads = []
      for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
        threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
                                         start=True))

      num_iter = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size))
      true_count = 0  # Counts the number of correct predictions.
      total_sample_count = num_iter * FLAGS.batch_size
      step = 0
      while step < num_iter and not coord.should_stop():
        predictions = sess.run([top_k_op])
        true_count += np.sum(predictions)
        step += 1

      # Compute precision @ 1.
      precision = true_count / total_sample_count
      print('%s: precision @ 1 = %.3f' % (datetime.now(), precision))

      summary = tf.Summary()
      summary.ParseFromString(sess.run(summary_op))
      summary.value.add(tag='Precision @ 1', simple_value=precision)
      summary_writer.add_summary(summary, global_step)
    except Exception as e:  # pylint: disable=broad-except
      coord.request_stop(e)

    coord.request_stop()
    coord.join(threads, stop_grace_period_secs=10)


def evaluate():
  """Eval CIFAR-10 for a number of steps."""
  with tf.Graph().as_default() as g:
    # Get images and labels for CIFAR-10.
    eval_data = FLAGS.eval_data == 'test'
    images, labels = cifar10.inputs(eval_data=eval_data)

    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)

    # Calculate predictions.
    top_k_op = tf.nn.in_top_k(logits, labels, 1)

    # Restore the moving average version of the learned variables for eval.
    variable_averages = tf.train.ExponentialMovingAverage(
        cifar10.MOVING_AVERAGE_DECAY)
    variables_to_restore = variable_averages.variables_to_restore()
    saver = tf.train.Saver(variables_to_restore)

    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.summary.merge_all()

    summary_writer = tf.summary.FileWriter(FLAGS.eval_dir, g)

    while True:
      eval_once(saver, summary_writer, top_k_op, summary_op)
      if FLAGS.run_once:
        break
      time.sleep(FLAGS.eval_interval_secs)


def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if tf.gfile.Exists(FLAGS.eval_dir):
    tf.gfile.DeleteRecursively(FLAGS.eval_dir)
  tf.gfile.MakeDirs(FLAGS.eval_dir)
  evaluate()


if __name__ == '__main__':
  FLAGS = parser.parse_args()
  tf.app.run()

# The training script computes a moving average for every learned variable.
# The evaluation script replaces all learned model parameters with the
# corresponding moving averages, which can improve model performance at
# evaluation time.

# tensorboard  --logdir=D:\tmp\cifar10_train
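
The precision @ 1 figure printed by eval_once() can be illustrated with NumPy on a tiny hand-made batch. This is only a sketch: the helper names and the example logits are invented, and argmax breaks ties by taking the first maximum, which may differ from how tf.nn.in_top_k treats tied scores.

```python
import math

import numpy as np


def in_top_1(logits, labels):
    """Boolean vector: True where the highest-scoring class equals the label."""
    return np.argmax(logits, axis=1) == labels


def precision_at_1(batches):
    """Fraction of examples whose top prediction is correct, summed over batches."""
    true_count = 0
    total_sample_count = 0
    for logits, labels in batches:
        true_count += int(np.sum(in_top_1(logits, labels)))
        total_sample_count += len(labels)
    return true_count / total_sample_count


# A made-up batch of 4 examples with 3 classes.
logits = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.2, 0.1],
                   [0.0, 0.5, 0.4],
                   [1.0, 0.9, 2.5]])
labels = np.array([1, 0, 2, 2])
print(precision_at_1([(logits, labels)]))

# The evaluation loop rounds the number of batches up, so with the defaults
# (10000 examples, batch size 128) it scores 79 * 128 = 10112 samples.
num_iter = int(math.ceil(10000 / 128))
print(num_iter, num_iter * 128)
```

Because the batch count is rounded up, total_sample_count in eval_once() is slightly larger than num_examples whenever the batch size does not divide it evenly.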

Results

Output of cifar10_train.py:

TensorBoard visualization:

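Problems encountered and solutions

1. Problem 1: PyCharm cannot find the data files
Error message: ValueError: Failed to find file: cifar10_data/cifar-10-batches-bin\data_batc
Cause:
According to the blog posts listed in the references, the code predefines the CIFAR-10 storage path, and the default is /tmp/cifar10_train, so the files cannot be found at run time.

Solution:
(1) Changed the default path on line 14 of cifar10_train.py to 'cifar10_train/'.

(2) Changed the default path on line 36 of cifar10.py to the absolute path where the CIFAR-10 data is stored, e.g. 'D:/QQ文件/cifar10test1/cifar10_data/'.
(After that the code ran.)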

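2. Problem 2: the TensorBoard URL cannot be opened
Running tensorboard --logdir cifar10_train/ on the command line shows the training progress, where --logdir cifar10_train/ is the directory in which the training logs are saved:

The URL it printed would not open in Chrome. Solution: append " --host=127.0.0.1" to the original command, so that the returned address becomes the full 127.0.0.1:6006; entering that address in the browser opens the page normally. The full command is: tensorboard --logdir cifar10_train/ --host=127.0.0.1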

References

https://blog.csdn.net/barry_j/article/details/79252438
https://www.cnblogs.com/gangzhucoll/p/12778246.html

https://blog.csdn.net/wang_kmin/article/details/81637816

https://www.cnblogs.com/YouXiangLiThon/articles/7246169.html

https://blog.csdn.net/qq_30377909/article/details/89946818
