The cats_VS_dogs classification problem

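This is a classic example, so let me walk through it. If you spot any mistakes, please point them out in the comments.

Both the train_set and the test_set can be downloaded from https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/data

If you would rather skip my walkthrough, you can go straight to the source code at https://github.com/ZZZstudent/catsVSdogs1

Once the dataset paths are adjusted to your machine, the code runs as is. Of course, typing it out yourself and working through it step by step is the best way to understand it. I am writing down my own understanding here as notes for myself, and I welcome corrections from more experienced readers.

First, we need an overall picture of the whole project:

(1) The dataset [data]

(2) The folder that holds the code [other] (the __pycache__ folder inside it is generated automatically and can be ignored; just create the three files, input.py and so on)

(3) The network is fairly complex, so we use TensorBoard to inspect what gets built in the end, which needs a log folder [logs]

So I created a project with these three folders: data, other, and logs.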
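As a rough sketch of that layout, the snippet below builds the same folder skeleton inside a temporary directory. The folder names come from the list above; placing train.py next to input.py and model.py is my assumption, since this post only mentions that file in passing.

```python
import tempfile
from pathlib import Path

def make_project_skeleton(root):
    """Create the data/other/logs folders and empty source files under root."""
    root = Path(root)
    for folder in ("data/train", "other", "logs"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    # input.py and model.py are covered in this post; train.py is only
    # mentioned, so its presence in `other` is an assumption.
    for name in ("input.py", "model.py", "train.py"):
        (root / "other" / name).touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))

with tempfile.TemporaryDirectory() as tmp:
    layout = make_project_skeleton(tmp)
    print("\n".join(layout))
```

The logs folder starts out empty; TensorBoard event files would only appear there once training writes summaries.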

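Part 1: input.py, which feeds in the training samples

The imports needed for loading the training set:

import tensorflow as tf  # TensorFlow 1.x API
import numpy as np
import os  # file and directory operations

The one worth pointing out is import os, which handles operations on the files in a directory. We need it because the images are read from a folder.

Now let's see what the cats-vs-dogs image dataset actually looks like.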

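#%%
img_width = 208
img_height = 208

As the cat and dog pictures show, the collected images vary in size: some are wider, some are taller. That makes batch processing awkward later on, so we bring every image to the same size, [208,208]. The two constants above set that size.
Next, let's start working on the training files.

Working with the training files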

#%%
train_dir = 'D:/Anaconda3/projects/catsVSdogs/data/train/'

def get_files(file_dir):
    '''Return the image paths and their labels.
    Args:
        file_dir: file directory (must end with '/')
    Returns:
        list of images and labels
    '''
    cats = []
    label_cats = []
    dogs = []
    label_dogs = []
    for file in os.listdir(file_dir):  # names of all files in this directory
        name = file.split(sep='.')
        if name[0] == 'cat':
            cats.append(file_dir + file)
            label_cats.append(0)
        else:
            dogs.append(file_dir + file)
            label_dogs.append(1)
    print('there are %d cats\nthere are %d dogs' % (len(cats), len(dogs)))

    image_list = np.hstack((cats, dogs))  # concatenate the cat and dog paths
    label_list = np.hstack((label_cats, label_dogs))  # concatenate the labels the same way

    temp = np.array([image_list, label_list])
    temp = temp.transpose()
    np.random.shuffle(temp)  # shuffle paths and labels together

    image_list = list(temp[:, 0])
    label_list = list(temp[:, 1])
    label_list = [int(i) for i in label_list]
    return image_list, label_list

The code above takes the train_set directory and splits the samples into two classes, cat and dog, by file name (the name is split on '.'). It tags them with 0 or 1: label=0 for cats, label=1 for dogs. It then shuffles the data, builds image_list and label_list, and returns both.
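To check the labeling logic without touching the disk, here is a minimal sketch in plain Python. The helper names label_from_name and shuffle_together are mine, and the file names are made-up examples in the Kaggle naming style (cat.0.jpg, dog.0.jpg).

```python
import random

def label_from_name(filename):
    """Map 'cat.<id>.jpg' to 0 and anything else to 1, as get_files does."""
    return 0 if filename.split(".")[0] == "cat" else 1

def shuffle_together(paths, labels, seed=None):
    """Shuffle paths and labels with the same permutation."""
    pairs = list(zip(paths, labels))
    random.Random(seed).shuffle(pairs)
    shuffled_paths = [p for p, _ in pairs]
    shuffled_labels = [l for _, l in pairs]
    return shuffled_paths, shuffled_labels

# Hypothetical file names in the Kaggle naming style.
names = ["cat.0.jpg", "cat.1.jpg", "dog.0.jpg", "dog.1.jpg", "dog.2.jpg"]
labels = [label_from_name(n) for n in names]
paths, shuffled = shuffle_together(names, labels, seed=42)
for p, l in zip(paths, shuffled):
    print(l, p)
```

Shuffling (path, label) pairs keeps the labels as integers throughout, whereas get_files stacks both into one string array and converts the labels back with int() afterwards. Both approaches keep each label attached to its image.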

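#%%
def get_batch(image, label, image_W, image_H, batch_size, capacity):
    '''
    Args:
        image: list type, the image_list returned by get_files
        label: list type, the label_list returned by get_files
        image_W: image width; the images differ in size, so this is the target width
        image_H: image height, the target height
        batch_size: batch size, the number of images per batch
        capacity: the maximum number of elements in the queue
    Returns:
        image_batch: 4D tensor [batch_size, height, width, 3], dtype=tf.float32 (RGB images, so 3 channels)
        label_batch: 1D tensor [batch_size], dtype=tf.int32
    '''
    # Inside a function, assigning to a name that also exists globally makes it a
    # local variable, and reading it before that assignment raises an
    # undefined-variable error. To rebind the global itself, declare it with global.
    global label_batch
    image = tf.cast(image, tf.string)  # convert the data types
    label = tf.cast(label, tf.int32)

    # make an input queue
    input_queue = tf.train.slice_input_producer([image, label])

    label = input_queue[1]
    image_contents = tf.read_file(input_queue[0])
    image = tf.image.decode_jpeg(image_contents, channels=3)  # decode the JPEG

    ######################################
    # data augmentation should go here
    ######################################

    # crop or pad to the target size; the arguments are (target_height, target_width)
    image = tf.image.resize_image_with_crop_or_pad(image, image_H, image_W)
    # standardize each image: subtract the mean, divide by the standard deviation
    image = tf.image.per_image_standardization(image)
    image_batch, label_batch = tf.train.batch([image, label],
                                              batch_size=batch_size,
                                              num_threads=64,
                                              capacity=capacity)  # max elements held in the queue
    # image_batch, label_batch = tf.train.shuffle_batch([image, label],
    #                                                   batch_size=BATCH_SIZE,
    #                                                   num_threads=64,
    #                                                   capacity=CAPACITY,
    #                                                   min_after_dequeue=CAPACITY-1)
    label_batch = tf.reshape(label_batch, [batch_size])
    return image_batch, label_batch

During training, the image samples are not fed in one at a time but batch by batch, which raises the question of how to assemble a batch.

It is worth asking at this point: why split the training set into batches at all?

The answer: when the loss function is non-convex, training on the whole set at once, even if it is computationally feasible, can get stuck in a local optimum. Training on batches amounts to sampling from the full dataset, which deliberately injects sampling noise into the gradient. That noise lets the optimizer "try another road when one is blocked", so it is more likely to reach a better optimum, possibly the global one.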
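The "sampling noise" can be made concrete with a tiny sketch in plain Python. It is a toy one-parameter regression, not the network from this post, and the data and helper names are made up for illustration. Each mini-batch gives a slightly different gradient, while their average equals the full-batch gradient when the batches are equal-sized.

```python
import random

def grad_mse(w, samples):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)

def iter_batches(samples, batch_size, rng):
    """Shuffle once, then yield consecutive slices of batch_size samples."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

rng = random.Random(0)
# Toy data: y = 3x plus a little noise (hypothetical, for illustration only).
data = [(x / 10.0, 3.0 * x / 10.0 + rng.uniform(-0.5, 0.5)) for x in range(1, 41)]

w = 0.0
full_grad = grad_mse(w, data)
batch_grads = [grad_mse(w, batch) for batch in iter_batches(data, 10, rng)]
mean_of_batches = sum(batch_grads) / len(batch_grads)

print("full-batch gradient :", round(full_grad, 4))
print("mini-batch gradients:", [round(g, 4) for g in batch_grads])
print("their average       :", round(mean_of_batches, 4))
```

The spread among the mini-batch gradients is the noise described above: each step follows a slightly perturbed direction, yet on average the steps still point along the full gradient.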

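The parameters of get_batch(image,label,image_W,image_H,batch_size,capacity) are explained in its docstring, so I will leave those to the reader. What I want to focus on is cast(x, dtype, name=None), the function that converts data types. Here is an example:

a = tf.Variable([1,0,0,1,1])
b = tf.cast(a,dtype=tf.bool)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(b))
#[ True False False  True  True]

This converts the variable a, which holds [1,0,0,1,1], to the bool type, giving [ True False False True True].
In our code, image = tf.cast(image,tf.string) turns image, the Python list of file paths returned by get_files, into a string tensor, which is the form tf.read_file expects. Note that these strings are the paths of the image files, not the words cat or dog. Likewise, label = tf.cast(label,tf.int32) turns the 0/1 labels into an int32 tensor (as the example shows, 0 and 1 could also be cast to the bool values False and True).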

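Next, let's look at the functions that build the queue; they also explain why the images had to be resized above. Start with what the official documentation says about tf.train.slice_input_producer().

Look at the last requirement: every tensor in tensor_list must have the same size in the first dimension. That is why image and label must have the same length, one label per path. The reason every image has to be brought to the same size is a different one: tf.train.batch needs tensors with fully defined shapes in order to stack them into a batch. In the end, the function returns image_batch and label_batch.

That completes the batches of training images and labels. How many samples go into each batch depends on the values passed in later, which is handled in train.py.

We now have the file inputs and the batching, so before worrying about what comes next, we can check whether this input part works and whether the data really is processed in batches.

In fact, get_files can already be run on its own as a first step, to check that the file path is correct, that the cats and dogs are separated properly, and to print how many cats and dogs there are. The code below checks all of this in one go.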

#%% TEST those two functions
import matplotlib.pyplot as plt

BATCH_SIZE = 10
CAPACITY = 256  # maximum number of images held in the queue
IMG_W = 208
IMG_H = 208

train_dir = 'D:/Anaconda3/projects/catsVSdogs/data/train/'

image_list, label_list = get_files(train_dir)
image_batch, label_batch = get_batch(image_list, label_list, IMG_W, IMG_H, BATCH_SIZE, CAPACITY)

with tf.Session() as sess:
    i = 0
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    try:
        while not coord.should_stop() and i < 1:
            img, label = sess.run([image_batch, label_batch])

            # just test one batch
            for j in np.arange(BATCH_SIZE):
                print('label: %d' % label[j])
                plt.imshow(img[j, :, :, :])  # 4D data: pick image j, keep all other axes
                plt.show()
            i += 1
    except tf.errors.OutOfRangeError:
        print('done!')
    finally:
        coord.request_stop()
    coord.join(threads)

Here is the output. With batch_size=10 there are quite a few images, so I only show some of them in the screenshot. You may notice that the pixels look a bit odd. First, a word on the tf.image.resize_image_with_crop_or_pad call in get_batch, which crops the image to the target width and height. It crops outward from the center of the image, so if a picture is larger than the target size, only part of the central region survives. A dog may be left with just its torso and no head, and training on images like that is bound to hurt the results. It is worth trying image = tf.image.resize_images(image, [image_H, image_W], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) instead, which scales the image rather than cropping it.
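To get a feel for how much a center crop throws away, here is a small arithmetic sketch in plain Python. It mirrors the idea of centered cropping and padding along each axis; it is not TensorFlow's implementation, and the image sizes are hypothetical.

```python
def center_crop_or_pad_1d(size, target):
    """Return (crop_start, kept, pad_before, pad_after) along one axis.

    Centered cropping when size > target, centered zero padding when size < target.
    """
    if size >= target:
        crop_start = (size - target) // 2
        return crop_start, target, 0, 0
    pad_total = target - size
    pad_before = pad_total // 2
    return 0, size, pad_before, pad_total - pad_before

def kept_fraction(height, width, target_h, target_w):
    """Fraction of the original pixels that survive center crop-or-pad."""
    _, kept_h, _, _ = center_crop_or_pad_1d(height, target_h)
    _, kept_w, _, _ = center_crop_or_pad_1d(width, target_w)
    return (kept_h * kept_w) / (height * width)

# A hypothetical 375x500 photo (height x width) brought to 208x208.
print(center_crop_or_pad_1d(500, 208))   # width axis: crop 146 px from each side
print(center_crop_or_pad_1d(375, 208))   # height axis
print(round(kept_fraction(375, 500, 208, 208), 3))
# A hypothetical small 100x150 image is padded instead of cropped.
print(center_crop_or_pad_1d(100, 208))
```

For a photo of that size, fewer than a quarter of the pixels survive the crop, which is why scaling the whole image down can be the better choice.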

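In the video, the resized image then goes through per_image_standardization, but with this step the displayed images come out garbled: each channel looks normal on its own, yet the three channels together look wrong. The reason is that standardized pixels are floats centered on zero, so many are negative or greater than 1, while plt.imshow expects float RGB values in [0, 1] and clips everything else. Removing the standardization step makes the display normal again; the standardized values themselves are still fine as network input.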
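The effect can be reproduced with a few numbers in plain Python. This is a sketch of the standardization formula (subtract the mean, divide by the standard deviation, with the divisor floored at 1/sqrt(N)), followed by a min-max rescale for display only. The pixel values and the rescale_for_display helper are my own illustration.

```python
import math

def per_image_standardize(pixels):
    """Standardize a flat list of pixel values to zero mean and unit variance.

    The divisor is floored at 1/sqrt(N) so a uniform image does not divide by zero.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    adjusted_std = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_std for p in pixels]

def rescale_for_display(values):
    """Min-max rescale into [0, 1], the range plt.imshow expects for float RGB."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical 2x2 RGB image flattened to 12 values in the 0..255 range.
raw = [12, 40, 200, 255, 90, 30, 0, 128, 64, 220, 180, 75]
standardized = per_image_standardize(raw)
displayable = rescale_for_display(standardized)

print("min/max after standardization:", round(min(standardized), 3), round(max(standardized), 3))
print("min/max after rescaling      :", min(displayable), max(displayable))
```

So one option is to keep standardization in the training pipeline and rescale a copy of the batch only when plotting it.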

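The output reports 12500 cats and 12500 dogs.

#################################

End of input.py

##################################

Part 2: model.py, which implements our neural network

The training samples have now been read in and split into batches, but no real processing or computation has been applied to them yet; the original images simply exist in a different form. In model.py we therefore build the model that processes them. Let's get started.

As usual, we begin by importing the library:

import tensorflow as tf

It was already used above, so I will not go over it again. What follows is the real core of the model.

Let's go straight to the code.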

#%%
def inference(images, batch_size, n_classes):
    '''Build the model.
    Args:
        images: image batch, 4D tensor, tf.float32, [batch_size, height, width, channels]
        batch_size: number of images per batch
        n_classes: number of classes; 2 here, a binary classification problem
    Returns:
        output tensor with the computed logits, float, [batch_size, n_classes]
        (the output of the last fully connected layer)
    '''
    # conv1, shape = [kernel size, kernel size, channels, kernel numbers]
    with tf.variable_scope('conv1') as scope:
        weights = tf.get_variable('weights',
                                  shape=[3, 3, 3, 16],
                                  dtype=tf.float32,
                                  initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))  # mean defaults to 0
        biases = tf.get_variable('biases',
                                 shape=[16],  # must match the kernel numbers in weights
                                 dtype=tf.float32,
                                 initializer=tf.constant_initializer(0.1))  # initial value 0.1, which matters
        conv = tf.nn.conv2d(images, weights, strides=[1, 1, 1, 1], padding='SAME')  # 2D convolution
        pre_activation = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(pre_activation, name=scope.name)  # activation function

    # pool1 and norm1
    with tf.variable_scope('pooling_lrn') as scope:
        pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],  # strides set how far the window slides
                               padding='SAME', name='pooling1')
        norm1 = tf.nn.lrn(pool1, depth_radius=4, bias=1.0, alpha=0.001 / 9.0,
                          beta=0.75, name='norm1')  # following the CIFAR-10 example

    # conv2
    with tf.variable_scope('conv2') as scope:
        weights = tf.get_variable('weights',
                                  shape=[3, 3, 16, 16],
                                  dtype=tf.float32,
                                  initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))
        biases = tf.get_variable('biases',
                                 shape=[16],
                                 dtype=tf.float32,
                                 initializer=tf.constant_initializer(0.1))
        conv = tf.nn.conv2d(norm1, weights, strides=[1, 1, 1, 1], padding='SAME')
        pre_activation = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(pre_activation, name='conv2')

    # pool2 and norm2
    with tf.variable_scope('pooling2_lrn') as scope:
        norm2 = tf.nn.lrn(conv2, depth_radius=4, bias=1.0, alpha=0.001 / 9.0,
                          beta=0.75, name='norm2')
        pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], strides=[1, 1, 1, 1],
                               padding='SAME', name='pooling2')

    # local3, fully connected layer
    with tf.variable_scope('local3') as scope:
        reshape = tf.reshape(pool2, shape=[batch_size, -1])
        dim = reshape.get_shape()[1].value
        weights = tf.get_variable('weights',
                                  shape=[dim, 128],
                                  dtype=tf.float32,
                                  initializer=tf.truncated_normal_initializer(stddev=0.005, dtype=tf.float32))
        biases = tf.get_variable('biases',
                                 shape=[128],  # 128 neurons
                                 dtype=tf.float32,
                                 initializer=tf.constant_initializer(0.1))
        local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)  # matrix multiplication

    # local4, fully connected layer
    with tf.variable_scope('local4') as scope:
        weights = tf.get_variable('weights',
                                  shape=[128, 128],
                                  dtype=tf.float32,
                                  initializer=tf.truncated_normal_initializer(stddev=0.005, dtype=tf.float32))
        biases = tf.get_variable('biases',
                                 shape=[128],
                                 dtype=tf.float32,
                                 initializer=tf.constant_initializer(0.1))
        local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name='local4')

    # softmax
    with tf.variable_scope('softmax_linear') as scope:
        weights = tf.get_variable('softmax_linear',
                                  shape=[128, n_classes],  # n_classes=2
                                  dtype=tf.float32,
                                  initializer=tf.truncated_normal_initializer(stddev=0.005, dtype=tf.float32))
        biases = tf.get_variable('biases',
                                 shape=[n_classes],
                                 dtype=tf.float32,
                                 initializer=tf.constant_initializer(0.1))
        softmax_linear = tf.add(tf.matmul(local4, weights), biases, name='softmax_linear')  # no activation function here
    return softmax_linear

As the annotations show, this is a conventional convolutional neural network (CNN) with the structure conv1 -> maxpool -> norm1 -> conv2 -> norm2 -> maxpool -> local3 (fully connected) -> local4 (fully connected) -> softmax classification.

Below I pull out parts of the code and go through the parameters in detail.

#conv1
with tf.variable_scope('conv1') as scope:
    weights = tf.get_variable('weights',
                              shape=[3, 3, 3, 16],  # shape = [kernel size, kernel size, channels, kernel numbers]
                              dtype=tf.float32,
                              initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))  # mean defaults to 0
    biases = tf.get_variable('biases',
                             shape=[16],  # must match the kernel numbers in weights
                             dtype=tf.float32,
                             initializer=tf.constant_initializer(0.1))  # initial value 0.1, which matters
    conv = tf.nn.conv2d(images, weights, strides=[1, 1, 1, 1], padding='SAME')  # 2D convolution
    pre_activation = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(pre_activation, name=scope.name)  # activation function

with tf.variable_scope('conv1') as scope: opens a variable scope named conv1. Variables created inside it get that prefix (for example conv1/weights), and the ops are grouped under one node in the TensorBoard graph. It only serves as a name, so I will not go into detail.

In weights, shape=[3,3,3,16] corresponds to [kernel size,kernel size,channels,kernel numbers]: the kernel size is 3*3, channel=3, and there are 16 kernels.
biases is the bias term, needed for computing wx+b later, so its shape is [16] (one per kernel), initialized to 0.1.

In the convolution conv, conv2d takes the stride strides=[1,1,1,1], laid out as [1, stride_height, stride_width, 1]. padding is either VALID (no padding, so the output shrinks) or SAME (zero padding, so with stride 1 the output size is unchanged).

Then come two operations: the convolution output conv is added to biases, and the result goes through the relu activation function.

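#pool1 and norm1
with tf.variable_scope('pooling_lrn') as scope:
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],  # strides set how far the window slides
                           padding='SAME', name='pooling1')
    norm1 = tf.nn.lrn(pool1, depth_radius=4, bias=1.0, alpha=0.001 / 9.0,
                      beta=0.75, name='norm1')  # following the CIFAR-10 example

tf.nn.max_pool(value, ksize, strides, padding, name=None)
It takes four parameters, much like convolution:
First parameter, value: the input to pool. A pooling layer usually follows a convolution layer, so the input is typically a feature map, again with a shape of the form [batch, height, width, channels].
Second parameter, ksize: the size of the pooling window, a four-element vector, usually [1, height, width, 1]. We do not want to pool over batch or channels, so those two dimensions are set to 1.
Third parameter, strides: as with convolution, the step the window slides along each dimension, usually [1, stride, stride, 1].
Fourth parameter, padding: as with convolution, either 'VALID' or 'SAME'.
It returns a Tensor of the same type, whose shape is still of the form [batch, height, width, channels].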
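The stride and padding settings decide how large each feature map is. As a sketch, the plain-Python snippet below applies the usual output-size rules (SAME: ceil(in / stride); VALID: ceil((in - kernel + 1) / stride)) to the layers of inference() and follows a 208x208x3 image through the network. The out_size helper and the layers table are my own bookkeeping, and the LRN layers are left out because they do not change the shape.

```python
import math

def out_size(in_size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling window along one axis."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# (name, kernel, stride, padding, output channels or None to keep channels);
# the values are copied from inference().
layers = [
    ("conv1", 3, 1, "SAME", 16),
    ("pool1", 3, 2, "SAME", None),
    ("conv2", 3, 1, "SAME", 16),
    ("pool2", 3, 1, "SAME", None),
]

size, channels = 208, 3
for name, kernel, stride, padding, out_channels in layers:
    size = out_size(size, kernel, stride, padding)
    if out_channels is not None:
        channels = out_channels
    print("%-5s -> %d x %d x %d" % (name, size, size, channels))

flat_dim = size * size * channels          # input width of local3 after reshape
local3_params = flat_dim * 128 + 128       # weights plus biases
print("flattened dim:", flat_dim)
print("local3 parameters:", local3_params)
```

Because pool2 uses stride 1, the feature map stays at 104x104, so almost all of the model's parameters sit in local3. Giving pool2 a stride of 2 would be one way to shrink that layer, though that is a change to the model, not something the original code does.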

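The norm layer performs local response normalization (LRN): each activation is divided by a term that grows with the squared activations of the neighboring channels at the same position.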
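Here is a minimal plain-Python sketch of that formula, applied to the channel values at a single pixel: output[d] = input[d] / (bias + alpha * sum of squares over channels d - depth_radius .. d + depth_radius) ** beta. The lrn_channels helper and the activation values are made up for illustration.

```python
def lrn_channels(values, depth_radius, bias, alpha, beta):
    """Apply LRN across the channel values at one spatial position."""
    out = []
    n = len(values)
    for d, v in enumerate(values):
        lo = max(0, d - depth_radius)
        hi = min(n, d + depth_radius + 1)
        sqr_sum = sum(x * x for x in values[lo:hi])
        out.append(v / (bias + alpha * sqr_sum) ** beta)
    return out

# Easy-to-check toy settings (hypothetical, not the ones used in the model).
print(lrn_channels([1.0, 2.0, 3.0], depth_radius=1, bias=1.0, alpha=1.0, beta=1.0))

# The settings used for norm1 and norm2 in the model, on made-up activations.
acts = [0.5, 1.5, 0.0, 2.0, 4.0, 0.2, 0.1, 3.0]
normed = lrn_channels(acts, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
print([round(v, 4) for v in normed])
```

With alpha as small as 0.001/9.0, the divisor stays close to 1 unless neighboring activations are large, so the layer acts as a mild damping of strong responses.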

What follows is the fully connected part; read through the code to understand it.

We also define the loss function:

#%%
def losses(logits, labels):
    '''Compute loss from logits and labels.
    Args:
        logits: logits tensor, float, [batch_size, n_classes]
        labels: label tensor, tf.int32, [batch_size]
    Returns:
        loss tensor of float type
    '''
    with tf.variable_scope('loss') as scope:
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=logits, labels=labels, name='xentropy_per_example')
        loss = tf.reduce_mean(cross_entropy, name='loss')
        tf.summary.scalar(scope.name + '/loss', loss)
    return loss
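To see what this loss computes, here is a plain-Python sketch of sparse softmax cross-entropy for one example, -log(softmax(logits)[label]), averaged over a batch. The logits are made-up numbers, and the helper names are mine.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def mean_loss(batch_logits, batch_labels):
    """Average the per-example losses, like tf.reduce_mean in losses()."""
    per_example = [sparse_softmax_cross_entropy(z, y)
                   for z, y in zip(batch_logits, batch_labels)]
    return sum(per_example) / len(per_example)

# Made-up logits for a batch of three images; label 0 = cat, 1 = dog.
batch_logits = [[2.0, -1.0], [0.0, 0.0], [-3.0, 3.0]]
batch_labels = [0, 1, 1]
print(round(mean_loss(batch_logits, batch_labels), 4))
```

An undecided prediction such as [0.0, 0.0] costs log(2), about 0.693, whichever the label is, and the loss falls toward 0 as the logit of the correct class pulls ahead.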
Define the training function
#%% training and optimization
def training(loss, learning_rate):
    '''Training ops. The op returned by this function must be passed to a
    'sess.run()' call to cause the model to train.
    Args:
        loss: loss tensor, from losses()
        learning_rate: learning rate for the optimizer
    Returns:
        train_op: the op for training
    '''
    with tf.name_scope('optimizer'):
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        global_step = tf.Variable(0, name='global_step', trainable=False)
        train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op
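For intuition about what the optimizer does on each step, here is a sketch of the textbook Adam update for a single scalar parameter, in plain Python. It follows the published update rule, not TensorFlow's internal code, and the toy loss f(x) = x^2 and the hyperparameter values are chosen only for illustration.

```python
import math

def adam_step(x, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter x at step t (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

# Minimize the toy loss f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
history = [x]
for global_step in range(1, 4):
    x, m, v = adam_step(x, 2.0 * x, m, v, global_step)
    history.append(x)
print([round(h, 4) for h in history])
```

The first step moves x by almost exactly lr regardless of the gradient's magnitude, because the bias-corrected ratio m_hat / sqrt(v_hat) is then 1 in absolute value. The loop counter plays the role that global_step plays in training().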
#%%
def evaluation(logits, labels):
    """Evaluate the quality of the logits at predicting the label.
    Args:
        logits: logits tensor, float - [batch_size, NUM_CLASSES].
        labels: labels tensor, int32 - [batch_size], with values in the range [0, NUM_CLASSES)
    Returns:
        a scalar float16 tensor with the fraction of examples in the batch
        that were predicted correctly.
    """
    with tf.variable_scope('accuracy') as scope:
        correct = tf.nn.in_top_k(logits, labels, 1)
        correct = tf.cast(correct, tf.float16)
        accuracy = tf.reduce_mean(correct)
        tf.summary.scalar(scope.name + '/accuracy', accuracy)
    return accuracy
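With k=1, in_top_k simply asks whether the largest logit belongs to the labeled class, and the mean of those hits is the batch accuracy. A plain-Python sketch of the same idea follows; the logits are made up, and ties between logits are not treated specially here.

```python
def top1_accuracy(batch_logits, batch_labels):
    """Fraction of examples whose largest logit sits at the labeled class."""
    correct = 0
    for logits, label in zip(batch_logits, batch_labels):
        predicted = max(range(len(logits)), key=lambda c: logits[c])
        correct += int(predicted == label)
    return correct / len(batch_labels)

batch_logits = [[2.0, -1.0], [0.3, 0.9], [1.5, 0.2], [-0.4, 0.1]]
batch_labels = [0, 1, 1, 1]
print(top1_accuracy(batch_logits, batch_labels))  # 3 of 4 correct
```

The value is a fraction between 0 and 1, so it needs to be multiplied by 100 to be shown as a percentage.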

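That wraps up these two parts. What remains is the part that actually launches training. For now, here is a screenshot of the accuracy on my machine after 13000 training steps (the value should have been multiplied by 100, which I forgot to do); it has already reached 94%.

Finally, my thanks to an instructor on Youku for his lecture series. You can find it by searching on Youku directly, so I will not attach a link here.
The next post will cover the remaining part.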
