
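  • 1. Importing packages
  • 2. Review: the softmax function
  • 3. Neural network
    • 3.1 Problem statement
    • 3.2 Dataset
      • 3.2.1 Viewing the variables
      • 3.2.2 Checking the dimensions
      • 3.2.3 Visualizing the data
    • 3.3 Model representation
    • 3.4 Model construction
    • 3.5 Prediction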



import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.activations import linear, relu, sigmoid
%matplotlib widget
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
import logging
tf.autograph.set_verbosity(0)
from public_tests import *
from autils import *
from lab_utils_softmax import plt_softmax



The softmax function can be written:
$$a_j = \frac{e^{z_j}}{\sum_{k=0}^{N-1} e^{z_k}} \tag{1}$$

where $z = \mathbf{w} \cdot \mathbf{x} + b$ and $N$ is the number of features/categories in the output layer.


# UNQ_C1
# GRADED CELL: my_softmax
def my_softmax(z):
    """ Softmax converts a vector of values to a probability distribution.
    Args:
      z (ndarray (N,))  : input data, N features
    Returns:
      a (ndarray (N,))  : softmax of z
    """
    ### START CODE HERE ###
    ez = np.exp(z)
    a = ez/np.sum(ez)
    ### END CODE HERE ###
    return a


z = np.array([1., 2., 3., 4.])
a = my_softmax(z)
atf = tf.nn.softmax(z)
print(f"my_softmax(z):         {a}")
print(f"tensorflow softmax(z): {atf}")

my_softmax(z):         [0.03 0.09 0.24 0.64]
tensorflow softmax(z): [0.03 0.09 0.24 0.64]
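
The direct formula above can overflow when the entries of z are large, because np.exp(1000.) is already inf. A common remedy, sketched here as an assumption and not as part of the graded lab code, is to subtract max(z) before exponentiating; the result is mathematically unchanged because the shift cancels between numerator and denominator. The helper name softmax_stable is hypothetical.

```python
import numpy as np

def softmax_stable(z):
    """Softmax that shifts z by its maximum before exponentiating (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)        # largest exponent becomes 0, so exp() cannot overflow
    ez = np.exp(shifted)
    return ez / np.sum(ez)

z_small = np.array([1., 2., 3., 4.])
z_large = z_small + 999.           # same differences between entries, much larger values

a_small = softmax_stable(z_small)
a_large = softmax_stable(z_large)

print(np.round(a_small, 2))        # same values as my_softmax(z) above
print(np.allclose(a_small, a_large))
```

Because softmax depends only on the differences between the entries of z, both calls return the same distribution.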


3.1 Problem statement


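3.2 Dataset

  • The load_data() function shown below loads the data into the variables "X" and "y".
  • The dataset contains 5000 training examples of handwritten digits [1].
    • Each training example is a 20-pixel x 20-pixel grayscale image of the digit.
    • Each pixel is represented by a floating-point number indicating the grayscale intensity at that location.
    • The 20 by 20 grid of pixels is "unrolled" into a 400-dimensional vector.
    • Each training example becomes a single row in the data matrix "X".
    • This gives us a 5000 x 400 matrix "X" where every row is a training example of a handwritten digit image.

$$X = \left(\begin{array}{c} --- (x^{(1)}) --- \\ --- (x^{(2)}) --- \\ \vdots \\ --- (x^{(m)}) --- \end{array}\right)$$

  • The second part of the training set is a 5000 x 1 dimensional vector "y" that contains the labels for the training set.

  • "y = 0" if the image is of the digit 0, "y = 4" if the image is of the digit 4, and so on.

[1] This is a subset of the MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/)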
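
To make the "unrolling" step concrete, here is a minimal sketch that flattens a 20 x 20 array into a 400-element row and restores it. The random array is only a stand-in for a real image, and the names image, X_demo and restored are hypothetical. The plotting code later in this post applies .T after the reshape, which suggests the lab stores pixels column by column; the sketch ignores that detail.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((20, 20))            # stand-in for one 20x20 grayscale image

row = image.reshape(400)                # unroll the grid into a 400-dimensional vector
X_demo = np.stack([row, row])           # each example is one row of the data matrix

restored = X_demo[0].reshape((20, 20))  # map a row back to a 20x20 image

print(X_demo.shape)
print(np.array_equal(restored, image))
```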

# load dataset
X, y = load_data()

3.2.1 Viewing the variables


print ('The first element of X is: ', X[0])
print ('The first element of y is: ', y[0,0])
print ('The last element of y is: ', y[-1,0])

3.2.2 Checking the dimensions

print ('The shape of X is: ' + str(X.shape))
print ('The shape of y is: ' + str(y.shape))

The shape of X is: (5000, 400)
The shape of y is: (5000, 1)

3.2.3 Visualizing the data

  • In the cell below, the code randomly selects 64 rows from "X", maps each row back to a 20-pixel by 20-pixel grayscale image, and displays the images together.
  • The label for each image is displayed above the image.

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8,8, figsize=(5,5))
fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.91]) #[left, bottom, right, top]

#fig.tight_layout(pad=0.5)
for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)

    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Display the label above the image
    ax.set_title(y[random_index,0])
    ax.set_axis_off()

fig.suptitle("Label, image", fontsize=14)

3.3 Model representation


  • The network has two dense layers with ReLU activations followed by an output layer with a linear activation.

    • Since the images are of size $20\times20$, this gives us $400$ inputs.
  • The parameters have dimensions that are sized for a neural network with $25$ units in layer 1, $15$ units in layer 2 and $10$ output units in layer 3, one for each digit.

    • Recall that the dimensions of these parameters are determined as follows:

      • If the network has $s_{in}$ units in a layer and $s_{out}$ units in the next layer, then

        • $W$ will be of dimension $s_{in} \times s_{out}$.
        • $b$ will be a vector with $s_{out}$ elements.
    • Therefore, the shapes of W and b are

      • layer1: The shape of W1 is (400, 25) and the shape of b1 is (25,)
      • layer2: The shape of W2 is (25, 15) and the shape of b2 is: (15,)
      • layer3: The shape of W3 is (15, 10) and the shape of b3 is: (10,)
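
The shape rule above can be checked with a small NumPy sketch that builds randomly initialized parameters for the 400-25-15-10 network and pushes one dummy input through it. This is only an illustration under stated assumptions, not the TensorFlow model built later; the names layer_sizes, params and total are hypothetical.

```python
import numpy as np

layer_sizes = [400, 25, 15, 10]          # inputs, layer 1, layer 2, layer 3
rng = np.random.default_rng(0)

# W has shape (s_in, s_out) and b has shape (s_out,) for each pair of adjacent layers
params = []
for s_in, s_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.normal(size=(s_in, s_out))
    b = np.zeros(s_out)
    params.append((W, b))

a = rng.random((1, 400))                 # one dummy example as a row vector
for i, (W, b) in enumerate(params):
    z = a @ W + b                        # (1, s_in) @ (s_in, s_out) -> (1, s_out)
    is_output = (i == len(params) - 1)
    a = z if is_output else np.maximum(z, 0)   # ReLU on hidden layers, linear output

total = sum(W.size + b.size for W, b in params)

for i, (W, b) in enumerate(params, start=1):
    print(f"layer{i}: W {W.shape}, b {b.shape}")
print("output shape:", a.shape)
print("total parameters:", total)        # 10025 + 390 + 160
```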

Note: The bias vector b could be represented as a 1-D (n,) or 2-D (n,1) array. Tensorflow utilizes a 1-D representation and this lab will maintain that convention.

Tensorflow models are built layer by layer. A layer's input dimensions ($s_{in}$ above) are calculated for you. You specify a layer's output dimensions and this determines the next layer's input dimension. The input dimension of the first layer is derived from the size of the input data specified in the model.fit statement below.

Note: It is also possible to add an input layer that specifies the input dimension of the first layer. For example:
tf.keras.Input(shape=(400,)), #specify input shape
We will include that here to illuminate some model sizing.

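3.4 Model construction



  • The final Dense layer should use a 'linear' activation. This is effectively no activation.
  • The model.compile statement will indicate this by including from_logits=True.
  • This does not impact the form of the target. In the case of SparseCategoricalCrossentropy, the target is the expected digit, 0-9.
  • Note that the outputs are not probabilities. If output probabilities are desired, apply a softmax function.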
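
To illustrate what "including the softmax in the loss" means, the sketch below computes the cross-entropy for integer targets in two ways: directly from the logits with a shifted log-sum-exp, and by first forming softmax probabilities and then taking the log. This is a NumPy illustration under my own assumptions, not TensorFlow's actual implementation; the function names and sample values are hypothetical.

```python
import numpy as np

def sparse_ce_from_logits(logits, y):
    """Mean cross-entropy for integer labels y, computed directly from logits."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log-softmax: z_j - log(sum_k exp(z_k)), evaluated on the shifted values
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def sparse_ce_from_probs(logits, y):
    """Same loss, but computed by forming softmax probabilities first."""
    ez = np.exp(np.asarray(logits, dtype=float))
    probs = ez / ez.sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(y)), y]).mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])   # made-up linear outputs for two examples
labels = np.array([0, 1])              # integer targets, not one-hot vectors

loss_a = sparse_ce_from_logits(logits, labels)
loss_b = sparse_ce_from_probs(logits, labels)
print(np.isclose(loss_a, loss_b))
```

For moderate logits the two agree. The logits-based form stays finite for very large logits, where exponentiating first would overflow.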


# UNQ_C2
# GRADED CELL: Sequential model
tf.random.set_seed(1234) # for consistent results
model = Sequential(
    [
        ### START CODE HERE ###
        tf.keras.Input(shape=(400,)),
        Dense(25, activation='relu', name="L1"),
        Dense(15, activation='relu', name="L2"),
        Dense(10, activation='linear', name="L3"),
        ### END CODE HERE ###
    ], name = "my_model"
)


[layer1, layer2, layer3] = model.layers

#### Examine Weights shapes
W1,b1 = layer1.get_weights()
W2,b2 = layer2.get_weights()
W3,b3 = layer3.get_weights()
print(f"W1 shape = {W1.shape}, b1 shape = {b1.shape}")
print(f"W2 shape = {W2.shape}, b2 shape = {b2.shape}")
print(f"W3 shape = {W3.shape}, b3 shape = {b3.shape}")

W1 shape = (400, 25), b1 shape = (25,)
W2 shape = (25, 15), b2 shape = (15,)
W3 shape = (15, 10), b3 shape = (10,)


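  • Define a loss function, SparseCategoricalCrossentropy, and indicate that the softmax should be included in the loss calculation by adding from_logits=True.
  • Define an optimizer. A popular choice is Adaptive Moment (Adam), which was described in lecture.

model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # 0.001 is the Keras default
)

history = model.fit(
    X, y,
    epochs=40
)

def plot_loss_tf(history):
    fig,ax = plt.subplots(1,1, figsize = (4,3))
    widgvis(fig)
    ax.plot(history.history['loss'], label='loss')
    ax.set_ylim([0, 2])
    ax.set_xlabel('Epoch')
    ax.set_ylabel('loss (cost)')
    ax.legend()
    ax.grid(True)
    plt.show()

plot_loss_tf(history)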

3.5 Prediction

image_of_two = X[1015]
display_digit(image_of_two)

prediction = model.predict(image_of_two.reshape(1,400))  # prediction

print(f" predicting a Two: \n{prediction}")
print(f" Largest Prediction index: {np.argmax(prediction)}")


prediction_p = tf.nn.softmax(prediction)

print(f" predicting a Two. Probability vector: \n{prediction_p}")

 predicting a Two. Probability vector:
[[2.01e-06 1.35e-02 8.98e-01 6.44e-02 1.14e-07 7.06e-06 5.35e-05 2.37e-02
  1.00e-04 2.77e-05]]


yhat = np.argmax(prediction_p)

print(f"np.argmax(prediction_p): {yhat}")
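
Since softmax is a monotonically increasing transformation of each logit, the index of the largest probability is the same as the index of the largest logit. So the softmax step is only needed when the probabilities themselves are of interest. A minimal NumPy sketch with made-up logits (hypothetical values, not the model's actual output):

```python
import numpy as np

# Hypothetical linear outputs for one image, one value per digit 0-9
logits = np.array([[-3.1, 5.7, 9.9, 7.3, -6.0, -1.9, 0.2, 6.3, 0.8, -0.5]])

shifted = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

yhat_from_logits = int(np.argmax(logits))
yhat_from_probs = int(np.argmax(probs))

print(yhat_from_logits, yhat_from_probs)   # both select the same class index
print(np.isclose(probs.sum(), 1.0))
```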


import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8,8, figsize=(5,5))
fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.91]) #[left, bottom, right, top]

for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)

    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Predict using the Neural Network
    prediction = model.predict(X[random_index].reshape(1,400))
    prediction_p = tf.nn.softmax(prediction)
    yhat = np.argmax(prediction_p)

    # Display the label above the image
    ax.set_title(f"{y[random_index,0]},{yhat}",fontsize=10)
    ax.set_axis_off()
fig.suptitle("Label, yhat", fontsize=14)


def display_errors(model,X,y):
    f = model.predict(X)
    yhat = np.argmax(f, axis=1)
    doo = yhat != y[:,0]
    idxs = np.where(yhat != y[:,0])[0]
    if len(idxs) == 0:
        print("no errors found")
    else:
        cnt = min(8, len(idxs))
        fig, ax = plt.subplots(1,cnt, figsize=(5,1.2))
        fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.80]) #[left, bottom, right, top]
        widgvis(fig)

        for i in range(cnt):
            j = idxs[i]
            X_reshaped = X[j].reshape((20,20)).T

            # Display the image
            ax[i].imshow(X_reshaped, cmap='gray')

            # Predict using the Neural Network
            prediction = model.predict(X[j].reshape(1,400))
            prediction_p = tf.nn.softmax(prediction)
            yhat = np.argmax(prediction_p)

            # Display the label above the image
            ax[i].set_title(f"{y[j,0]},{yhat}",fontsize=10)
            ax[i].set_axis_off()
        fig.suptitle("Label, yhat", fontsize=12)
    return(len(idxs))

print( f"{display_errors(model,X,y)} errors out of {len(X)} images")
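
The error count in display_errors relies on comparing the (m,) prediction vector with y[:,0] and not with the (m,1) array y. A small sketch with made-up logits and labels (hypothetical values) shows why: comparing (m,) against (m,1) broadcasts to an (m,m) matrix and does not give an element-wise result.

```python
import numpy as np

# Made-up linear outputs for 4 examples and 3 classes
logits_demo = np.array([[0.1, 2.0, -1.0],
                        [3.0, 0.2, 0.5],
                        [0.0, 0.1, 4.0],
                        [1.5, 1.0, 0.2]])
y_demo = np.array([[1], [0], [1], [0]])          # labels stored as an (m, 1) column

yhat_demo = np.argmax(logits_demo, axis=1)       # shape (m,)

wrong = np.where(yhat_demo != y_demo[:, 0])[0]   # indices of misclassified examples
n_errors = len(wrong)
accuracy = 1 - n_errors / len(y_demo)

broadcast_shape = (yhat_demo != y_demo).shape    # (m,) vs (m, 1) broadcasts to (m, m)

print(wrong, n_errors, accuracy)
print(broadcast_shape)
```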

