1. Which of the following are true? (Check all that apply.)
A. $X$ is a matrix in which each row is one training example.

B. $a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer.

C. $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.

D. $a^{[2](12)}$ denotes the activation vector of the 12th layer on the 2nd training example.

E. $a^{[2]}$ denotes the activation vector of the 2nd layer.

F. $X$ is a matrix in which each column is one training example.

G. $a^{[2]}_4$ is the activation output of the 2nd layer for the 4th training example.

Answer: B C E F

2. The tanh activation is not always better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data, making learning complex for the next layer. True/False?

Answer: False
As seen in lecture, the output of tanh is between -1 and 1; it thus centers the data, which makes learning simpler (not more complex) for the next layer.

3. Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \le l \le L$?

A. $Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$
$A^{[l+1]} = g^{[l]}(Z^{[l]})$

B. $Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$
$A^{[l+1]} = g^{[l+1]}(Z^{[l]})$

C. $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
$A^{[l]} = g^{[l]}(Z^{[l]})$

D. $Z^{[l]} = W^{[l-1]} A^{[l]} + b^{[l-1]}$
$A^{[l]} = g^{[l]}(Z^{[l]})$

Answer: C
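
A minimal numpy sketch of option C's vectorized forward step for one layer. The layer sizes, the ReLU choice of g, and the variable names are illustrative assumptions, not part of the quiz.

import numpy as np

def forward_layer(A_prev, W, b, g):
    # Vectorized forward step (option C): Z[l] = W[l] A[l-1] + b[l],  A[l] = g[l](Z[l])
    Z = W @ A_prev + b            # shapes: (n_l, n_{l-1}) @ (n_{l-1}, m) + (n_l, 1)
    A = g(Z)                      # elementwise activation
    return Z, A

relu = lambda z: np.maximum(0, z)          # example choice of g (assumption)
A0 = np.random.randn(3, 5)                 # 3 input features, 5 examples (assumption)
W1 = np.random.randn(4, 3) * 0.01
b1 = np.zeros((4, 1))
Z1, A1 = forward_layer(A0, W1, b1, relu)   # A1.shape == (4, 5)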

4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?
A. sigmoid

B. ReLU

C. tanh

D. Leaky ReLU
Answer: A

5. Consider the following code:
A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)
What will B.shape be? (If you're not sure, feel free to run this in Python to find out.)
A. (4, 1)

B. (4, )

C. (1, 3)

D. (, 3)
Answer: A
Yes, we use keepdims = True to make sure that B.shape is (4, 1) and not (4, ). It makes our code more robust.
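
A quick check you can run directly, showing how keepdims changes the shape of the result:

import numpy as np

A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)   # sum across columns, keep the row dimension
print(B.shape)                         # (4, 1)

C = np.sum(A, axis=1)                  # same sum without keepdims
print(C.shape)                         # (4,) -- a rank-1 array, easier to misuse in broadcasting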

6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
A. Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished "symmetry breaking" as described in lecture.

B. Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have "broken symmetry".

C. The first hidden layer's neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.

D. Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.
Answer: D
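
A small sketch illustrating the symmetry problem. The 2-4-1 network, the tanh/sigmoid activations, the toy data, and the learning rate are assumptions made only to keep the example short.

import numpy as np

np.random.seed(0)
X = np.random.randn(2, 10)                     # 2 features, 10 examples (assumption)
Y = (np.random.rand(1, 10) > 0.5) * 1.0

W1 = np.zeros((4, 2)); b1 = np.zeros((4, 1))   # all-zero initialization
W2 = np.zeros((1, 4)); b2 = np.zeros((1, 1))

for _ in range(3):                             # a few gradient-descent iterations
    Z1 = W1 @ X + b1;  A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2; A2 = 1 / (1 + np.exp(-Z2))
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m;  db1 = dZ1.sum(axis=1, keepdims=True) / m
    W1 -= 0.1 * dW1; b1 -= 0.1 * db1
    W2 -= 0.1 * dW2; b2 -= 0.1 * db2

print(W1)   # every row stays identical: the hidden units never stop computing the same thing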

7. Logistic regression's weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to "break symmetry". True/False?
Answer: False
Logistic regression doesn't have a hidden layer. If you initialize the weights to zeros, the first example x fed into logistic regression will produce an output of zero, but the derivatives of logistic regression depend on the input x (because there's no hidden layer), which is not zero. So at the second iteration, the weight values follow x's distribution and are different from each other, provided x is not a constant vector.
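
A minimal sketch of zero-initialized logistic regression to illustrate the point; the toy data, labeling rule, and learning rate are assumptions.

import numpy as np

np.random.seed(1)
X = np.random.randn(3, 20)                  # 3 features, 20 examples (assumption)
Y = (X[0:1, :] + X[1:2, :] > 0) * 1.0       # an arbitrary labeling rule (assumption)

w = np.zeros((3, 1)); b = 0.0               # all-zero initialization
for _ in range(100):
    A = 1 / (1 + np.exp(-(w.T @ X + b)))    # forward pass
    dZ = A - Y
    dw = X @ dZ.T / X.shape[1]              # the gradient depends on the input x ...
    db = dZ.mean()
    w -= 0.5 * dw; b -= 0.5 * db            # ... so the weights become distinct after the first update

print(w.ravel())                            # the components differ: no symmetry problem here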

8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(…,…)*1000. What will happen?
A. This will cause the inputs of the tanh to also be very large, causing the units to be "highly activated" and thus speed up learning compared to if the weights had to start from small values.

B. This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set $\alpha$ to be very small to prevent divergence; this will slow down learning.

C. This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.

D. It doesn't matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.

Answer: C
Yes. tanh becomes flat for large values; this makes its gradient close to zero, which slows down the optimization algorithm.
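
A short sketch of why very large weights hurt: for large |z|, tanh saturates and its derivative 1 - tanh(z)^2 is essentially zero. The *1000 scale mirrors the question; the layer size and input are illustrative assumptions.

import numpy as np

np.random.seed(2)
x = np.random.randn(100, 1)                    # 100 input features, one example (assumption)
W_small = np.random.randn(4, 100) * 0.01
W_large = np.random.randn(4, 100) * 1000       # the initialization from the question

for name, W in [("small", W_small), ("large", W_large)]:
    z = W @ x
    grad = 1 - np.tanh(z) ** 2                 # derivative of tanh at the pre-activations
    print(name, "mean tanh'(z):", grad.mean())
# With the *1000 initialization the gradient is ~0, so gradient descent barely moves.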

9. Consider the following 1-hidden-layer neural network (the figure shows 2 input features, a hidden layer with 4 units, and a single output unit):

Which of the following statements are True? (Check all that apply).
A. $W^{[1]}$ will have shape (4, 2)

B. $W^{[2]}$ will have shape (1, 4)

C. $W^{[2]}$ will have shape (4, 1)

D. $b^{[2]}$ will have shape (4, 1)

E. $W^{[1]}$ will have shape (2, 4)

F. $b^{[1]}$ will have shape (2, 1)

G. $b^{[1]}$ will have shape (4, 1)

H. $b^{[2]}$ will have shape (1, 1)
Answer: A B G H

10. In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$?
A. $Z^{[1]}$ and $A^{[1]}$ are (1, 4)

B. $Z^{[1]}$ and $A^{[1]}$ are (4, 1)

C. $Z^{[1]}$ and $A^{[1]}$ are (4, 2)

D. $Z^{[1]}$ and $A^{[1]}$ are (4, m)
Answer: D
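
A sketch that checks the shapes from questions 9 and 10 numerically; the choice m = 7 examples and the random data are assumptions.

import numpy as np

n_x, n_h, n_y, m = 2, 4, 1, 7                  # 2 inputs, 4 hidden units, 1 output; m = 7 is an assumption
X = np.random.randn(n_x, m)

W1 = np.random.randn(n_h, n_x) * 0.01; b1 = np.zeros((n_h, 1))   # (4, 2) and (4, 1)
W2 = np.random.randn(n_y, n_h) * 0.01; b2 = np.zeros((n_y, 1))   # (1, 4) and (1, 1)

Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
print(Z1.shape, A1.shape)                      # both (4, m) -> (4, 7)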
