This assignment is quite considerate: the IPython notebook includes a short tutorial covering the basics of TensorFlow, plus links to guides for the commonly used APIs, nice! Even if you can't easily reach the official docs, here is an API lookup tool I recommend: DevDocs. Its offline data is a lifesaver for someone like me with no internet at home. Whenever an unfamiliar API comes up while doing the assignment, just look it up.

Assignment code: my_github

Without further ado, on to the assignment.

First, the simple_model function that the assignment has already written for us.

Two things worth noting: why the flattened vector has 5408 entries (the computation is written out in the comments), and how one_hot encoding is used.

# clear old variables
tf.reset_default_graph()

# setup input (e.g. the data that changes every batch)
# The first dim is None, and gets set automatically based on batch size fed in
X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

def simple_model(X, y):
    # define our weights (e.g. init_two_layer_convnet)
    # setup variables
    # the conv kernel is 7*7*3 and there are 32 filters
    # output size: (32 - 7 + 1) / stride per spatial dim
    # with stride = 2: 13 * 13 * 32 = 5408
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 3, 32])
    bconv1 = tf.get_variable("bconv1", shape=[32])
    W1 = tf.get_variable("W1", shape=[5408, 10])
    b1 = tf.get_variable("b1", shape=[10])

    # define our graph (e.g. two_layer_convnet)
    a1 = tf.nn.conv2d(X, Wconv1, strides=[1, 2, 2, 1], padding='VALID') + bconv1
    h1 = tf.nn.relu(a1)
    # 13*13*32 --> 1*5408
    h1_flat = tf.reshape(h1, [-1, 5408])
    y_out = tf.matmul(h1_flat, W1) + b1
    return y_out

y_out = simple_model(X, y)

# define our loss
"""
tf.one_hot(indices, depth)
converts the input into a one-hot tensor
labels: [0 1 2 3 4]
after one_hot:
[[1 0 0 0 0]
 [0 1 0 0 0]
 [0 0 1 0 0]
 [0 0 0 1 0]
 [0 0 0 0 1]]
"""
total_loss = tf.losses.hinge_loss(tf.one_hot(y, 10), logits=y_out)
mean_loss = tf.reduce_mean(total_loss)

# define our optimizer
optimizer = tf.train.AdamOptimizer(5e-4)  # select optimizer and set learning rate
train_step = optimizer.minimize(mean_loss)
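To spell out the 5408 from the comment: a VALID 7*7 convolution with stride 2 over a 32*32 input produces ceil((32 - 7 + 1) / 2) = 13 positions along each spatial dimension, and with 32 filters the flattened activation has 13 * 13 * 32 = 5408 entries, which is why W1 has shape [5408, 10].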

Next, a walkthrough of run_model.

def run_model(session, predict, loss_val, Xd, yd,
              epochs=1, batch_size=64, print_every=100,
              training=None, plot_losses=False):
    # have tensorflow compute accuracy
    # tf.argmax returns the index of the largest value in each row of predict
    correct_prediction = tf.equal(tf.argmax(predict, 1), y)
    # tf.cast converts the input to the given dtype
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # shuffle indicies
    train_indicies = np.arange(Xd.shape[0])
    np.random.shuffle(train_indicies)

    training_now = training is not None

    # setting up variables we want to compute (and optimizing)
    # if we have a training function, add that to things we compute
    variables = [mean_loss, correct_prediction, accuracy]
    if training_now:
        variables[-1] = training

    # counter
    iter_cnt = 0
    for e in range(epochs):
        # keep track of losses and accuracy
        correct = 0
        losses = []
        # make sure we iterate over the dataset once
        # math.ceil returns the smallest integer not less than its argument
        for i in range(int(math.ceil(Xd.shape[0] / batch_size))):
            # generate indicies for the batch
            start_idx = (i * batch_size) % Xd.shape[0]
            idx = train_indicies[start_idx:start_idx + batch_size]

            # create a feed dictionary for this batch
            feed_dict = {X: Xd[idx, :],
                         y: yd[idx],
                         is_training: training_now}
            # get batch size
            # in the last iteration the actual batch size may be smaller than the configured batch_size
            actual_batch_size = yd[idx].shape[0]

            # have tensorflow compute loss and correct predictions
            # and (if given) perform a training step
            # variables = [mean_loss, correct_prediction, accuracy]
            # mean_loss = tf.reduce_mean(total_loss)
            # correct_prediction = tf.equal(tf.argmax(predict,1), y)
            # accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
            loss, corr, _ = session.run(variables, feed_dict=feed_dict)

            # aggregate performance stats
            losses.append(loss * actual_batch_size)
            correct += np.sum(corr)

            # print every now and then
            if training_now and (iter_cnt % print_every) == 0:
                print("Iteration {0}: with minibatch training loss = {1:.3g} and accuracy of {2:.2g}"
                      .format(iter_cnt, loss, np.sum(corr) / actual_batch_size))
            iter_cnt += 1
        total_correct = correct / Xd.shape[0]
        total_loss = np.sum(losses) / Xd.shape[0]
        print("Epoch {2}, Overall loss = {0:.3g} and accuracy of {1:.3g}"
              .format(total_loss, total_correct, e + 1))
        if plot_losses:
            plt.plot(losses)
            plt.grid(True)
            plt.title('Epoch {} Loss'.format(e + 1))
            plt.xlabel('minibatch number')
            plt.ylabel('minibatch loss')
            plt.show()
    return total_loss, total_correct

with tf.Session() as sess:
    with tf.device("/gpu:0"):  # "/cpu:0" or "/gpu:0"
        sess.run(tf.global_variables_initializer())
        print('Training')
        run_model(sess, y_out, mean_loss, X_train, y_train, 1, 64, 100, train_step, True)
        print('Validation')
        run_model(sess, y_out, mean_loss, X_val, y_val, 1, 64)

The part you have to write yourself can follow simple_model pretty closely; it just uses a few different APIs as the instructions require.

cross_entropy: cross-entropy is almost always used together with softmax, so TF simply combines the two into a single API;

RMSProp: the common optimizers are all implemented under tf.train.

mean_loss = None
optimizer = None
total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.one_hot(y, 10), logits=y_out))
mean_loss = tf.reduce_mean(total_loss)
optimizer = tf.train.RMSPropOptimizer(1e-3)
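As a side note, if you would rather not build the one-hot tensor yourself, TF also has a sparse variant of this loss that takes the integer class labels directly; a minimal sketch using the same y and y_out as above:

# sparse variant: labels are integer class ids, so no tf.one_hot is needed
mean_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=y_out))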

Batch_Normalization: in TF this needs an extra dependency. The reason is that tf.layers.batch_normalization registers the ops that update its moving mean and variance in the tf.GraphKeys.UPDATE_OPS collection; those ops are not run automatically, so the train step has to be wrapped in a control dependency on them:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = optimizer.minimize(mean_loss)
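If you are curious what that collection actually contains, a quick throwaway check after the graph is built:

# the collection holds the assign ops that update batch norm's moving mean/variance
for update_op in tf.get_collection(tf.GraphKeys.UPDATE_OPS):
    print(update_op.name)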

The next task is to train a network on CIFAR-10 that reaches >= 70% accuracy on the validation set.

The first hint in the assignment points out one weakness of the network above: the 7*7 filter size. Naturally I replaced it with three stacked 3*3 convolutions; anyone who has made it this far knows why (the stack covers the same receptive field with fewer parameters and more nonlinearity), so I won't belabor it. The second hint is to add an SVM on top of the fully connected layers. Another idea worth noting is Global Average Pooling: average-pool each input feature map down to 1*1, then combine the results into a single vector and feed it to the classifier. The main problem it addresses is the parameter count of the affine layer; global average pooling itself has no parameters, so it is also cheaper to compute (a minimal sketch follows). Since adding a separate classifier would also mean modifying run_model, I skipped it in the end. These ideas, along with the powerful ResNet and DenseNet, are all worth a try if you have time.
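A minimal sketch of global average pooling on an NHWC feature map (the name feature_map here is hypothetical):

# average over the spatial dimensions: [N, H, W, C] -> [N, C]
# every channel collapses to a single number, so this step adds no parameters
gap = tf.reduce_mean(feature_map, axis=[1, 2])
# gap can then go straight into the final affine classifier, e.g. tf.matmul(gap, W) + b

And here is the network I ended up with: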

def my_model(X, y, is_training):
    conv11 = tf.nn.conv2d(X, Wconv11, strides=[1, 1, 1, 1], padding='SAME') + bconv11
    h11 = tf.nn.relu(conv11)
    conv12 = tf.nn.conv2d(h11, Wconv12, strides=[1, 1, 1, 1], padding='SAME') + bconv12
    h12 = tf.nn.relu(conv12)
    conv13 = tf.nn.conv2d(h12, Wconv13, strides=[1, 1, 1, 1], padding='SAME') + bconv13
    h_batch1 = tf.layers.batch_normalization(conv13, training=is_training)
    h1 = tf.nn.relu(h_batch1)
    h_pool1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    print("h_pool1", h_pool1.shape)

    conv21 = tf.nn.conv2d(h_pool1, Wconv21, strides=[1, 1, 1, 1], padding='SAME') + bconv21
    h21 = tf.nn.relu(conv21)
    conv22 = tf.nn.conv2d(h21, Wconv22, strides=[1, 1, 1, 1], padding='SAME') + bconv22
    h22 = tf.nn.relu(conv22)
    conv23 = tf.nn.conv2d(h22, Wconv23, strides=[1, 1, 1, 1], padding='SAME') + bconv23
    h_batch23 = tf.layers.batch_normalization(conv23, training=is_training)
    h2 = tf.nn.relu(h_batch23)
    h_pool2 = tf.nn.max_pool(h2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='VALID')
    print("h_pool2", h_pool2.shape)

    conv31 = tf.nn.conv2d(h_pool2, Wconv31, strides=[1, 1, 1, 1], padding='SAME') + bconv31
    h31 = tf.nn.relu(conv31)
    conv32 = tf.nn.conv2d(h31, Wconv32, strides=[1, 1, 1, 1], padding='SAME') + bconv32
    h32 = tf.nn.relu(conv32)
    conv33 = tf.nn.conv2d(h32, Wconv33, strides=[1, 1, 1, 1], padding='SAME') + bconv33
    h_batch3 = tf.layers.batch_normalization(conv33, training=is_training)
    h3 = tf.nn.relu(h_batch3)
    print("h3", h3.shape)
    h_pool3 = tf.nn.max_pool(h3, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='VALID')
    print("h_pool3", h_pool3.shape)

    conv_flat = tf.reshape(h_pool3, [-1, 80])
    affine = tf.matmul(conv_flat, W1) + b1
    h_batch4 = tf.layers.batch_normalization(affine, training=is_training)
    h4 = tf.nn.relu(h_batch4)
    y_out = tf.matmul(h4, W2) + b2
    return y_out
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)
global_step = tf.Variable(0, trainable=False)  # not learned; only drives the learning-rate schedule

Wconv11 = tf.get_variable("Wconv11", shape=[3, 3, 3, 30])
bconv11 = tf.get_variable("bconv11", [30])
Wconv12 = tf.get_variable("Wconv12", shape=[3, 3, 30, 30])
bconv12 = tf.get_variable("bconv12", [30])
Wconv13 = tf.get_variable("Wconv13", shape=[3, 3, 30, 30])
bconv13 = tf.get_variable("bconv13", [30])
# 16*16*30
Wconv21 = tf.get_variable("Wconv21", shape=[3, 3, 30, 50])
bconv21 = tf.get_variable("bconv21", [50])
Wconv22 = tf.get_variable("Wconv22", shape=[3, 3, 50, 50])
bconv22 = tf.get_variable("bconv22", [50])
Wconv23 = tf.get_variable("Wconv23", shape=[3, 3, 50, 50])
bconv23 = tf.get_variable("bconv23", [50])
# 4*4*50
Wconv31 = tf.get_variable("Wconv31", shape=[3, 3, 50, 80])
bconv31 = tf.get_variable("bconv31", [80])
Wconv32 = tf.get_variable("Wconv32", shape=[3, 3, 80, 80])
bconv32 = tf.get_variable("bconv32", [80])
Wconv33 = tf.get_variable("Wconv33", shape=[3, 3, 80, 80])
bconv33 = tf.get_variable("bconv33", [80])
# 1*1*80
W1 = tf.get_variable("W1", shape=[80, 512])
b1 = tf.get_variable("b1", shape=[512])
W2 = tf.get_variable("W2", shape=[512, 10])
b2 = tf.get_variable("b2", shape=[10])
y_out = my_model(X,y,is_training)
mean_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf.one_hot(y,10),logits=y_out))
# L2 regularization on every weight matrix
reg_weights = [Wconv11, Wconv12, Wconv13,
               Wconv21, Wconv22, Wconv23,
               Wconv31, Wconv32, Wconv33,
               W1, W2]
mean_loss += 0.001 * tf.add_n([tf.nn.l2_loss(w) for w in reg_weights])
# tf.train.exponential_decay(): exponentially decayed learning rate; here it is multiplied by 0.9 every 100 iterations
# decayed_learning_rate = learning_rate * decay_rate^(global_step/decay_steps)
learning_rate = tf.train.exponential_decay(1e-3, global_step, 100, 0.9, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate)

# batch normalization in tensorflow requires this extra dependency
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    # pass global_step so that minimize() increments it on every step; otherwise the
    # exponential decay above never advances and the learning rate stays at 1e-3
    train_step = optimizer.minimize(mean_loss, global_step=global_step)
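To get a feel for the schedule this produces: with staircase=True the learning rate stays at 1e-3 for the first 100 training steps, drops to 1e-3 * 0.9 = 9e-4 for the next 100, and after 1000 steps has decayed to 1e-3 * 0.9^10 ≈ 3.5e-4.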

In the end I ran 10 epochs; judging from the printed log, more iterations would give higher accuracy, but that isn't very interesting. Trying other architectures might yield a pleasant surprise.

One last question: is it always better to have more convolution filters? I don't think so, and if the extra filters only add redundant features, recognition may actually get worse. Redundant features add no new information about the data, yet they distort the classifier's confidence, and that in turn degrades the model. Take logistic regression as an extreme example: if every feature is duplicated, the log-odds double, so the model's classification confidence is inflated even though nothing new was learned.
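A tiny numerical sketch of that logistic-regression argument (the weights and the input below are made up purely for illustration):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# hypothetical toy model: 3 features, one input example
w = np.array([0.8, -0.5, 0.3])
x = np.array([1.0, 2.0, -1.0])

z = w.dot(x)                                                # original log-odds
z_dup = np.concatenate([w, w]).dot(np.concatenate([x, x]))  # every feature duplicated

print(z, z_dup)                    # the log-odds double: z_dup == 2 * z
print(sigmoid(z), sigmoid(z_dup))  # the prediction moves toward 0/1 although no information was added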
