Step by Step

Convolution

  • Fill in zero_pad

    X_pad = np.pad(X, ((0,0), (pad,pad), (pad,pad), (0,0)), 'constant', constant_values = 0)  # the second argument specifies, for each dimension, how much padding to add on each side
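
    • A quick sanity check (a rough sketch; the array sizes below are made up): padding grows only the height and width dimensions, by 2*pad each.

      import numpy as np

      np.random.seed(1)
      x = np.random.randn(4, 3, 3, 2)   # 4 examples, 3x3 images, 2 channels
      x_pad = np.pad(x, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant', constant_values=0)
      print(x.shape)      # (4, 3, 3, 2)
      print(x_pad.shape)  # (4, 7, 7, 2)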
    
  • Fill in conv_single_step

    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice and W. Add bias.
    s = np.multiply(a_slice_prev , W)
    # Sum over all entries of the volume s
    Z = np.sum(s) + float(b)
    ### END CODE HERE ###
    
    • Multiply the filter element-wise with the slice to be convolved, sum the products, then add a bias.
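    • A tiny numeric sketch of that sentence (made-up 2x2x1 numbers):

      import numpy as np

      a_slice = np.array([[[1.], [2.]],
                          [[3.], [4.]]])   # a 2x2x1 slice of the input
      W = np.array([[[1.], [0.]],
                    [[0.], [1.]]])         # a 2x2x1 filter
      b = np.array([[[0.5]]])
      Z = np.sum(np.multiply(a_slice, W)) + float(b)
      print(Z)  # 1*1 + 2*0 + 3*0 + 4*1 + 0.5 = 5.5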
  • Fill in conv_forward

    • Extract the slice of data to be convolved, according to the filter size and stride
    • Convolve it and store the result
    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape
    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    # compute the size of the convolution output matrix
    n_H = int((n_H_prev - f + 2 * pad) / stride + 1)
    n_W = int((n_W_prev - f + 2 * pad) / stride + 1)
    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):                                 # loop over all training examples
        a_prev_pad = A_prev_pad[i]                     # select the ith (padded) training example
        for h in range(n_H):                           # loop over vertical axis of the output volume
            for w in range(n_W):                       # loop over horizontal axis of the output volume
                for c in range(n_C):                   # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # select the slice to be convolved
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # perform one step of convolution
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:,:,:,c], b[:,:,:,c])
    ### END CODE HERE ###
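
    • A quick check of the output-size formula used above (a sketch with made-up sizes):

      # n_H = floor((n_H_prev - f + 2*pad) / stride) + 1
      n_H_prev, f, pad, stride = 4, 2, 2, 2
      n_H = int((n_H_prev - f + 2 * pad) / stride + 1)
      print(n_H)  # (4 - 2 + 4) / 2 + 1 = 4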

Pooling

  • Fill in pool_forward

    • The slice is extracted in the same way as for convolution: first take out the data to be pooled
    • Then pool it and store the result
    ### START CODE HERE ###
    for i in range(m):                           # loop over the training examples
        for h in range(n_H):                     # loop on the vertical axis of the output volume
            for w in range(n_W):                 # loop on the horizontal axis of the output volume
                for c in range(n_C):             # loop over the channels of the output volume
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.average(a_prev_slice)
    ### END CODE HERE ###
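
    • A minimal sketch of how the two modes differ on the same window (made-up values):

      import numpy as np

      window = np.array([[1., 3.],
                         [2., 9.]])
      print(np.max(window))      # 9.0  -> what "max" mode stores in A[i, h, w, c]
      print(np.average(window))  # 3.75 -> what "average" mode stores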
    

Convolution Backpropagation

Backpropagation Derivation

  • First compute the gradient of the loss with respect to the convolution layer's input. Let the layer input be $A$ and the layer output be $Z$.

    • $dA += \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} W_c \times dZ_{hw}$

      • Here $W_c$ is a convolution kernel, i.e. the matrix that is multiplied element-wise with the input data.
      • $dZ_{hw}$ is the gradient of the loss with respect to entry $(h, w)$ of the convolution output matrix.
        • Note that the loss is a scalar and $Z_{hw}$ is also a scalar, so $dZ_{hw}$, the derivative of a scalar with respect to a scalar, is a scalar as well.
      • For the data in the yellow region of the figure, $dA_{yellow} = W_c \times dZ_{00}$.
      • Traversing the convolution output matrix on the right, every position contributes a piece of $dA$; the total gradient is the sum of the contributions from all positions.
  • Then compute $dW_c$.

    • $dW_c += \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$

      • Notice that the gradient with respect to $W$ simply replaces the $W_c$ in the gradient for $A$ with $a_{slice}$.
      • This is easy to understand: for the yellow region in the figure above, $Z_{hw} = \mathrm{sum}(W_c \times a_{slice}) + b$, and because the product is element-wise, the derivative of $Z_{hw}$ with respect to $W_c$ is $a_{slice}$, and its derivative with respect to $a_{slice}$ is $W_c$.
  • Finally compute $db$.

    • $db = \sum_h \sum_w dZ_{hw}$
    • The difference from the cases above is that the derivative of $Z_{hw}$ with respect to $b$ is 1.
  • Fill in conv_backward

    • Given the derivation above, this function is straightforward to fill in
    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape
    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros(A_prev.shape)
    dW = np.zeros(W.shape)
    db = np.zeros(b.shape)
    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)

    for i in range(m):                         # loop over the training examples
        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]
        for h in range(n_H):                   # loop over vertical axis of the output volume
            for w in range(n_W):               # loop over horizontal axis of the output volume
                for c in range(n_C):           # loop over the channels of the output volume
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Update gradients for the window and the filter's parameters using the formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
                    dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
                    db[:,:,:,c] += dZ[i, h, w, c]
        # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
        dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad, :]
    ### END CODE HERE ###
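
    • One way to convince yourself the formulas are right is a finite-difference check. This is only a sketch; it assumes conv_forward returns (Z, cache) and conv_backward takes (dZ, cache) and returns (dA, dW, db), as in this assignment. If the cost is simply np.sum(Z), then dZ is all ones.

      import numpy as np

      np.random.seed(1)
      A_prev = np.random.randn(2, 4, 4, 3)
      W = np.random.randn(2, 2, 3, 8)
      b = np.random.randn(1, 1, 1, 8)
      hparameters = {"pad": 1, "stride": 1}

      Z, cache = conv_forward(A_prev, W, b, hparameters)
      dA, dW, db = conv_backward(np.ones(Z.shape), cache)   # cost = sum(Z)  =>  dZ = 1 everywhere

      # numerical gradient of cost = sum(Z) with respect to one weight entry
      eps = 1e-6
      W_plus = W.copy()
      W_plus[0, 0, 0, 0] += eps
      Z_plus, _ = conv_forward(A_prev, W_plus, b, hparameters)
      num_grad = (np.sum(Z_plus) - np.sum(Z)) / eps
      print(num_grad, dW[0, 0, 0, 0])   # the two numbers should be very close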
    

Pooling Backpropagation

Max Pooling

As the figure above shows, in max pooling only the maximum value matters, so the gradient of the pooling output is caused entirely by the maximum entry of the pooling input. We can therefore apply a mask to the incoming gradient so that only the position of the maximum value is affected by it.

For a single pooling window, the relation between the pooling output $Z$ and the pooling input $A$ is $Z = \max(A)$.

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """
    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###
    return mask
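
For example (made-up values):

import numpy as np

x = np.array([[1., 3.],
              [2., 9.]])
print(create_mask_from_window(x))
# [[False False]
#  [False  True]]   -- True only where the max (9) sits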

We apply the mask when computing the gradient:

if mode == "max":
    # the slice of the input that produced this output value (the blue region in the top-left of the figure above)
    a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
    # compute the mask for this slice: 1 at the position of the 7, 0 everywhere else
    mask = create_mask_from_window(a_prev_slice)
    # the gradient of the pooling output is caused by the 7, so we multiply the incoming gradient
    # by the mask element-wise so that only the 7 contributes
    dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += mask * dA[i, h, w, c]

Average Pooling

For the top-left region above, let the pooling output be $Z$ and the input be $A$. Then $Z = \frac{1}{4}\mathrm{sum}(A) = \frac{1}{4}(a_1 + a_2 + a_3 + a_4)$, i.e. each of the four positions in $A$ affects $Z$ by the same amount. For example, if the 2 in the top-left corner becomes 6, the output becomes 5; if instead the 3 becomes 7, the output is again 5. So a change in $Z$ is produced jointly by the elements of $A$, each with equal weight. From the same formula, $dZ = \frac{1}{4}(da_1 + da_2 + da_3 + da_4)$.

In other words, the gradient of the pooling output (i.e. its change) is shared equally among the elements of the pooling input.

Here we write a function that distributes the influence equally and returns this mask.

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """
    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape
    # Compute the value to distribute on the matrix (≈1 line)
    average = dz / (n_H * n_W)
    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.zeros(shape) + average
    ### END CODE HERE ###
    return a
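
For example, distributing a gradient of 2 over a 2x2 window puts 0.5 in every position:

print(distribute_value(2, (2, 2)))
# [[0.5 0.5]
#  [0.5 0.5]]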

We then apply this mask in the gradient computation:

elif mode == "average":
    # the gradient flowing from this output value (the gradient of the 4 on the right of the figure above)
    da = dA[i, h, w, c]
    # the size of the window over which the gradient is shared
    shape = [f, f]
    # the change in the output 4 is shared equally by the four elements in the top-left window
    dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, shape)

Application

TensorFlow requires you to create placeholders for the input data that will be fed into the model when a session is run. We now implement the function that creates these placeholders. Since we train on mini-batch chunks of data, the number of examples fed in may vary, so we use None for that dimension. The input X has shape [None, n_H0, n_W0, n_C0] and the corresponding Y has shape [None, n_y].

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes

    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """
    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(tf.float32, [None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, [None, n_y])
    ### END CODE HERE ###

    return X, Y

Fill in initialize_parameters

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [4, 4, 3, 8]
                        W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """

    tf.set_random_seed(1)                              # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer = tf.contrib.layers.xavier_initializer(seed = 0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer = tf.contrib.layers.xavier_initializer(seed = 0))
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "W2": W2}

    return parameters

Fill in forward_propagation

  • tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME')  # given an input X and a set of filters W1, convolves W1 over X; the strides argument [1,s,s,1] gives the step taken in each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev)

  • tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME')  # given an input A, slides an (f, f) window over it with stride (s, s) and takes the maximum in each window

  • tf.nn.relu(Z1)  # computes the element-wise ReLU activation of Z1

  • tf.contrib.layers.flatten(P)  # given an input P, flattens each example into a 1-D vector and returns a tensor of shape (batch_size, k)

  • tf.contrib.layers.fully_connected(F, num_outputs)  # given the flattened input F, returns the output of a fully connected layer; this layer initializes its own weights and trains them along with the rest of the model, so we do not need to initialize them ourselves

# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']

    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize = [1,8,8,1], strides = [1,8,8,1], padding = 'SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize = [1,4,4,1], strides = [1,4,4,1], padding = 'SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn = None)
    ### END CODE HERE ###

    return Z3

Fill in compute_cost

tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y)  # computes the softmax cross-entropy loss; it applies the softmax activation and computes the loss in a single call
tf.reduce_mean()  # computes the mean of the per-example losses
# GRADED FUNCTION: compute_cost

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """
    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y))
    ### END CODE HERE ###

    return cost
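
To tie these pieces together, here is a rough smoke test (a sketch, not part of the graded notebook; it uses the TF 1.x API assumed throughout this assignment, and the input shapes and random data are arbitrary):

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    sess.run(tf.global_variables_initializer())
    c = sess.run(cost, {X: np.random.randn(2, 64, 64, 3), Y: np.random.randn(2, 6)})
    print("cost =", c)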

Build and train the full model

  • Create placeholders for X and Y
  • Initialize the parameters
  • Forward propagation
  • Compute the cost
  • Create an optimizer
  • Train with mini-batches
# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009,
          num_epochs = 100, minibatch_size = 64, print_cost = True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)                             # to keep results consistent (tensorflow seed)
    seed = 3                                          # to keep results consistent (numpy seed)
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []                                        # To keep track of the cost

    # Create Placeholders of the correct shape
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###

    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###

    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    ### END CODE HERE ###

    # Initialize all the variables globally
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the optimizer and the cost, the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _, temp_cost = sess.run([optimizer, cost], feed_dict = {X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###

                minibatch_cost += temp_cost / num_minibatches

            # Print the cost every 5 epochs
            if print_cost == True and epoch % 5 == 0:
                print ("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Calculate the correct predictions
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print(accuracy)
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)

        return train_accuracy, test_accuracy, parameters

Tensor("Mean_1:0", shape=(), dtype=float32)

Train Accuracy: 0.86851853

Test Accuracy: 0.73333335

Finally, if this helped, please give it a like.

Residual Networks

Deep networks can fit very complex functions, but they easily run into the vanishing-gradient problem.

Residual Networks add 'shortcut' connections so that the gradient can propagate directly back to earlier layers.
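
The core of the idea, as a minimal Keras sketch (tiny_residual_block is a hypothetical helper, not one of the blocks used below): the output of a couple of layers is added back to their input before the final activation, so the gradient can flow straight through the Add node to earlier layers.

from keras.layers import Conv2D, BatchNormalization, Activation, Add

def tiny_residual_block(X):
    # main path: the input passes through a couple of layers
    # (assumes X already has 4 channels so shapes match at the Add -- the "identity block" case)
    X_shortcut = X
    X = Conv2D(filters=4, kernel_size=(3, 3), padding='same')(X)
    X = BatchNormalization(axis=3)(X)
    # shortcut: add the original input back before the ReLU
    X = Add()([X, X_shortcut])
    return Activation('relu')(X)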

The Residual Network in this exercise is built mainly from two kinds of blocks:

  • identity block

    • the standard block in ResNets, used when the input and the output have the same dimensions
    • no convolution is performed on the 'shortcut' path

  • Here, following the design in the exercise, we fill in the identity_block function

    # GRADED FUNCTION: identity_block

    def identity_block(X, f, filters, stage, block):
        """
        Implementation of the identity block as defined in Figure 3

        Arguments:
        X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
        f -- integer, specifying the shape of the middle CONV's window for the main path
        filters -- python list of integers, defining the number of filters in the CONV layers of the main path
        stage -- integer, used to name the layers, depending on their position in the network
        block -- string/character, used to name the layers, depending on their position in the network

        Returns:
        X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
        """

        # defining name basis
        conv_name_base = 'res' + str(stage) + block + '_branch'
        bn_name_base = 'bn' + str(stage) + block + '_branch'

        # Retrieve Filters
        F1, F2, F3 = filters

        # Save the input value. You'll need this later to add back to the main path.
        X_shortcut = X

        # First component of main path
        X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
        X = Activation('relu')(X)

        ### START CODE HERE ###

        # Second component of main path (≈3 lines)
        X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'SAME', name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
        X = Activation('relu')(X)

        # Third component of main path (≈2 lines)
        X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'VALID', name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)

        # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
        X = Add()([X, X_shortcut])
        X = Activation('relu')(X)

        ### END CODE HERE ###

        return X
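
    • A quick shape check (a rough sketch; the tensor sizes are arbitrary): since nothing on the shortcut path changes the shape, the block's output shape equals its input shape.

      from keras.layers import Input
      from keras.models import Model

      X_in = Input((4, 4, 6))    # 6 input channels, matching F3
      X_out = identity_block(X_in, f=2, filters=[2, 4, 6], stage=1, block='a')
      print(Model(inputs=X_in, outputs=X_out).output_shape)   # (None, 4, 4, 6) -- unchanged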
    
  • convolutional block

    • used when the input and output dimensions differ

    • the only difference from the identity block above is that a convolution is also performed on the 'shortcut' path

  • Here, following the design in the exercise, we fill in the convolutional_block function

    # GRADED FUNCTION: convolutional_block

    def convolutional_block(X, f, filters, stage, block, s = 2):
        """
        Implementation of the convolutional block as defined in Figure 4

        Arguments:
        X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
        f -- integer, specifying the shape of the middle CONV's window for the main path
        filters -- python list of integers, defining the number of filters in the CONV layers of the main path
        stage -- integer, used to name the layers, depending on their position in the network
        block -- string/character, used to name the layers, depending on their position in the network
        s -- Integer, specifying the stride to be used

        Returns:
        X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
        """

        # defining name basis
        conv_name_base = 'res' + str(stage) + block + '_branch'
        bn_name_base = 'bn' + str(stage) + block + '_branch'

        # Retrieve Filters
        F1, F2, F3 = filters

        # Save the input value
        X_shortcut = X

        ##### MAIN PATH #####
        # First component of main path
        X = Conv2D(F1, (1, 1), strides = (s, s), name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
        X = Activation('relu')(X)

        ### START CODE HERE ###

        # Second component of main path (≈3 lines)
        X = Conv2D(F2, (f, f), strides = (1, 1), padding = 'SAME', name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
        X = Activation('relu')(X)

        # Third component of main path (≈2 lines)
        X = Conv2D(F3, (1, 1), strides = (1, 1), padding = 'VALID', name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)

        ##### SHORTCUT PATH #### (≈2 lines)
        X_shortcut = Conv2D(F3, (1, 1), strides = (s, s), padding = 'VALID', name = conv_name_base + '1', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
        X_shortcut = BatchNormalization(axis = 3, name = bn_name_base + '1')(X_shortcut)

        # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
        X = Add()([X, X_shortcut])
        X = Activation('relu')(X)

        ### END CODE HERE ###

        return X
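
    • A similar sketch for the convolutional block: with s = 2 both paths halve the spatial size and change the channel count, so the Add still matches.

      from keras.layers import Input
      from keras.models import Model

      X_in = Input((4, 4, 6))
      X_out = convolutional_block(X_in, f=2, filters=[2, 4, 8], stage=1, block='a', s=2)
      print(Model(inputs=X_in, outputs=X_out).output_shape)   # (None, 2, 2, 8) -- downsampled, 8 channels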
    
  • Finally, using the two basic ResNet blocks built above, we assemble and train the whole network

    • Fill in the ResNet50 function

      # GRADED FUNCTION: ResNet50

      def ResNet50(input_shape = (64, 64, 3), classes = 6):
          """
          Implementation of the popular ResNet50 with the following architecture:
          CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
          -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

          Arguments:
          input_shape -- shape of the images of the dataset
          classes -- integer, number of classes

          Returns:
          model -- a Model() instance in Keras
          """

          # Define the input as a tensor with shape input_shape
          X_input = Input(input_shape)

          # Zero-Padding
          X = ZeroPadding2D((3, 3))(X_input)

          # Stage 1
          X = Conv2D(64, (7, 7), strides = (2, 2), name = 'conv1', kernel_initializer = glorot_uniform(seed=0))(X)
          X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
          X = Activation('relu')(X)
          X = MaxPooling2D((3, 3), strides=(2, 2))(X)

          # Stage 2
          X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, block='a', s = 1)
          X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
          X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

          ### START CODE HERE ###

          # Stage 3 (≈4 lines)
          X = convolutional_block(X, f = 3, filters = [128, 128, 512], stage = 3, block='a', s = 2)
          X = identity_block(X, 3, [128, 128, 512], stage=3, block='b')
          X = identity_block(X, 3, [128, 128, 512], stage=3, block='c')
          X = identity_block(X, 3, [128, 128, 512], stage=3, block='d')

          # Stage 4 (≈6 lines)
          X = convolutional_block(X, f = 3, filters = [256, 256, 1024], stage = 4, block='a', s = 2)
          X = identity_block(X, 3, [256, 256, 1024], stage=4, block='b')
          X = identity_block(X, 3, [256, 256, 1024], stage=4, block='c')
          X = identity_block(X, 3, [256, 256, 1024], stage=4, block='d')
          X = identity_block(X, 3, [256, 256, 1024], stage=4, block='e')
          X = identity_block(X, 3, [256, 256, 1024], stage=4, block='f')

          # Stage 5 (≈3 lines)
          X = convolutional_block(X, f = 3, filters = [512, 512, 2048], stage = 5, block='a', s = 2)
          X = identity_block(X, 3, [512, 512, 2048], stage=5, block='b')
          X = identity_block(X, 3, [512, 512, 2048], stage=5, block='c')

          # AVGPOOL (≈1 line). Use "X = AveragePooling2D(...)(X)"
          X = AveragePooling2D((2, 2), name='avg_pool')(X)

          ### END CODE HERE ###

          # output layer
          X = Flatten()(X)
          X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer = glorot_uniform(seed=0))(X)

          # Create model
          model = Model(inputs = X_input, outputs = X, name='ResNet50')

          return model
      
    • Train on the SIGNS dataset and obtain the final results

      • 120/120 [==============================] - 3s 24ms/step
      • Loss = 0.5301783005396525
      • Test Accuracy = 0.8666666626930237
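
    • The training itself follows the usual Keras pattern. A rough sketch of how the numbers above would be produced (it assumes X_train, Y_train, X_test, Y_test are the preprocessed SIGNS splits with one-hot labels, as in the exercise):

      model = ResNet50(input_shape=(64, 64, 3), classes=6)
      model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
      model.fit(X_train, Y_train, epochs=2, batch_size=32)
      preds = model.evaluate(X_test, Y_test)
      print("Loss = " + str(preds[0]))
      print("Test Accuracy = " + str(preds[1]))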
