Detection Algorithms

You are building a 3-class object classification and localization algorithm. The classes are: pedestrian (c=1), car (c=2), motorcycle (c=3). What should y be for the image below? Remember that “?” means “don’t care”, which means that the neural network loss function won’t care what the neural network gives for that component of the output. Recall y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3].

y=[1,?,?,?,?,1,?,?]
y=[1,0.66,0.5,0.75,0.16,1,0,0]
y=[1,0.66,0.5,0.16,0.75,1,0,0]
y=[1,0.66,0.5,0.75,0.16,0,0,0]
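
To make the “don’t care” idea concrete, here is a minimal sketch that assumes a plain squared-error loss. The function name, the `DONT_CARE` marker and the example vectors are illustrative assumptions, not values taken from the quiz image.

```python
DONT_CARE = None  # stands in for the "?" entries of the label vector


def localization_loss(y_true, y_pred):
    """Squared-error loss that skips the "don't care" components.

    Assumed layout of both vectors:
    [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3].
    """
    total = 0.0
    for target, predicted in zip(y_true, y_pred):
        if target is DONT_CARE:
            continue  # the network output here does not affect the loss
        total += (predicted - target) ** 2
    return total


# Hypothetical image with no object: only p_c is penalised.
y_background = [0] + [DONT_CARE] * 7
y_hat = [0.1, 0.4, 0.6, 0.3, 0.2, 0.5, 0.3, 0.2]
loss_background = localization_loss(y_background, y_hat)

# Hypothetical image with an object: every component is compared.
y_object = [1, 0.5, 0.5, 0.3, 0.2, 0, 1, 0]
loss_object = localization_loss(y_object, y_hat)
```

Whatever the network outputs for the seven “?” slots of `y_background`, `loss_background` stays the same, which is what “the loss function won’t care” means.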

You are working on a factory automation task. Your system will see a can of soft drink coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft-drink can always appears the same size in the image. There is at most one soft-drink can in each image. Here are some typical images in your training set:

To solve this task it is necessary to divide the task into two: 1. Construct a system to detect if a can is present or not. 2. Construct a system that calculates the bounding box of the can when present. Which one of the following do you agree with the most?

We can approach the task as an image classification with a localization problem.
An end-to-end solution is always superior to a two-step system.
The two-step system is always a better option compared to an end-to-end solution.
We can’t solve the task as an image classification with a localization problem since all the bounding boxes have the same dimensions.

If you build a neural network that inputs a picture of a person’s face and outputs N landmarks on the face (assume the input image always contains exactly one face), how many output units will the network have?

2N
N
N^2
3N
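
As a small illustration, assume each landmark is an (x, y) coordinate pair, as in the lectures. The sketch below flattens N such pairs into the list of values the network would regress. The function name and the coordinates are hypothetical.

```python
def flatten_landmarks(landmarks):
    """Turn N (x, y) landmark pairs into one flat list of regression targets."""
    flat = []
    for x, y in landmarks:
        flat.extend([x, y])
    return flat


# Hypothetical face with N = 3 landmarks in normalised image coordinates.
landmarks = [(0.30, 0.40), (0.70, 0.40), (0.50, 0.65)]
targets = flatten_landmarks(landmarks)
```

Under that assumption the output layer needs one unit per entry of `targets`, that is, two per landmark.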

When training one of the object detection systems described in the lectures, you need a training set that contains many pictures of the object(s) you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself.

False
True

Explanation: You need bounding boxes in the training set. Your loss function should try to match the predictions for the bounding boxes to the true bounding boxes from the training set.

What is the IoU between the red box and the blue box in the following figure? Assume that all the squares have the same measurements.

1/8
1/2
1/7
1/4

Explanation: IoU = intersection / union = (2 * 2) / (4 * 4 + 4 * 4 - 2 * 2) = 4 / 28 = 1 / 7
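
The same arithmetic can be checked with a short sketch. The figure is not reproduced here, so the corner coordinates below are assumed: two 4 x 4 boxes placed so that they share a 2 x 2 region.

```python
from fractions import Fraction


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return Fraction(intersection, union)


# Assumed placement: 4 x 4 boxes overlapping in a 2 x 2 square.
red = (0, 0, 4, 4)
blue = (2, 2, 6, 6)
result = iou(red, blue)
```

`Fraction` keeps the result exact, so `result` can be compared directly with 1/7. The sketch expects integer coordinates and boxes with non-zero area.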

Suppose you run non-max suppression on the predicted boxes below. The parameters you use for non-max suppression are that boxes with probability ≤ 0.4 are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?
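
The procedure can be sketched as below for a single class, using the two thresholds from the question. The predicted boxes from the figure are not available here, so the detections are made up for illustration. With several classes, the lectures run non-max suppression independently per class.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.4, iou_threshold=0.5):
    """detections: list of (score, box). Returns the detections that survive."""
    # Step 1: discard boxes whose probability is at or below the threshold.
    candidates = [d for d in detections if d[0] > score_threshold]
    # Step 2: visit the remaining boxes from most to least confident.
    candidates.sort(key=lambda d: d[0], reverse=True)
    kept = []
    for score, box in candidates:
        # Keep a box only if its IoU with every kept box is below the threshold.
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


# Hypothetical detections (score, (x1, y1, x2, y2)) for one class.
detections = [
    (0.9, (0, 0, 10, 10)),
    (0.6, (1, 1, 11, 11)),    # overlaps the 0.9 box heavily
    (0.3, (20, 20, 30, 30)),  # below the probability threshold
    (0.7, (20, 0, 30, 10)),   # far from the others
]
survivors = non_max_suppression(detections)
```

In this made-up example the 0.3 box is dropped by the probability filter and the 0.6 box is suppressed by the 0.9 box, leaving two boxes.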

If we use anchor boxes in YOLO we no longer need the coordinates of the bounding box b_x, b_y, b_h, b_w since they are given by the cell position of the grid and the anchor box selection. True/False?

False
True

Explanation: We use the grid and anchor boxes to improve the algorithm's ability to localize and detect objects, for example two different objects that intersect, but we still use the bounding box coordinates.
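
A rough sketch of how the label tensor is laid out shows why the coordinates stay. It assumes the encoding used in the lectures: each anchor slot in each grid cell holds its own (p_c, b_x, b_y, b_h, b_w, classes) vector, with the midpoint measured inside the cell and the height and width measured in cell units. The function names are hypothetical.

```python
def yolo_target_shape(grid_size, num_anchors, num_classes):
    """Shape of the label tensor: one box vector per anchor per grid cell."""
    per_anchor = 1 + 4 + num_classes  # p_c, four box coordinates, class scores
    return (grid_size, grid_size, num_anchors, per_anchor)


def encode_object(x_center, y_center, height, width, grid_size):
    """Map a box in image-normalised coordinates (0..1) to its grid cell.

    Returns (row, col, (b_x, b_y, b_h, b_w)) with the midpoint expressed
    relative to the cell and the size expressed in cell units.
    """
    col = min(int(x_center * grid_size), grid_size - 1)
    row = min(int(y_center * grid_size), grid_size - 1)
    b_x = x_center * grid_size - col
    b_y = y_center * grid_size - row
    return row, col, (b_x, b_y, height * grid_size, width * grid_size)


shape = yolo_target_shape(grid_size=3, num_anchors=2, num_classes=3)
row, col, box = encode_object(0.5, 0.5, 0.2, 0.4, grid_size=3)
```

The cell index only says where the midpoint falls, and the anchor only selects a slot. The exact position and size still have to be stored in `box`.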

Semantic segmentation can only be applied to classify pixels of images in a binary way as 1 or 0, according to whether they belong to a certain class or not. True/False?

False
True

Explanation: The same ideas used for multi-class classification can be applied to semantic segmentation.

Using the concept of Transpose Convolution, fill in the values of X, Y and Z below. (padding = 1, stride = 2)
Input: 2x2

Filter: 3x3

Result: 6x6

X = 3, Y = 0, Z = 4
X = 10, Y = 0, Z = 6
X = 4, Y = 3, Z = 2
X = 10, Y = 0, Z = 0
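
The mechanics of a transpose convolution with padding = 1 and stride = 2 can be sketched as follows. The figure is missing, so the input and filter values are illustrative stand-ins, not the ones from the question. The sketch also assumes that the 6x6 result is the padded canvas, that is, a 4x4 output plus a border of 1 that gets discarded.

```python
def transpose_conv2d(inp, kernel, stride=2, padding=1, out_size=4):
    """Transpose convolution on square 2-D lists.

    Each input entry scales the whole kernel; the scaled copy is added into a
    padded canvas at stride-spaced offsets, then the padding border is cropped.
    """
    canvas_size = out_size + 2 * padding
    canvas = [[0] * canvas_size for _ in range(canvas_size)]
    k = len(kernel)
    for i, row in enumerate(inp):
        for j, value in enumerate(row):
            for di in range(k):
                for dj in range(k):
                    r, c = i * stride + di, j * stride + dj
                    if r < canvas_size and c < canvas_size:
                        # Overlapping contributions are summed.
                        canvas[r][c] += value * kernel[di][dj]
    # Drop the padding border to obtain the out_size x out_size result.
    end = padding + out_size
    return [r[padding:end] for r in canvas[padding:end]]


# Illustrative values only (assumed, not read from the figure).
inp = [[2, 1],
       [3, 2]]
kernel = [[1, 2, 1],
          [2, 0, 1],
          [0, 2, 1]]
output = transpose_conv2d(inp, kernel)
```

Cells where two or more scaled copies of the filter overlap receive the sum of the contributions, which is the step that usually decides entries such as X, Y and Z.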

When using the U-Net architecture with an input h × w × c, where c denotes the number of channels, the output will always have the shape h × w × c. True/False?

False
True

Explanation: The output of the U-Net architecture can be h × w × k, where k is the number of classes. The number of channels doesn’t have to match between input and output.
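
As a minimal sketch of what an h × w × k output means, the snippet below takes per-pixel scores for k classes and picks the highest-scoring class for each pixel. The scores are made up, and the helper name is hypothetical.

```python
def class_map(scores):
    """scores: h x w x k nested lists of per-class scores.

    Returns an h x w grid holding, for each pixel, the index of its
    highest-scoring class.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Hypothetical 2 x 2 output with k = 3 classes per pixel.
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]],
]
labels = class_map(scores)
```

The spatial size of `labels` matches the input, while the last dimension of `scores` is set by the number of classes rather than by the number of input channels.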
