
Below are the steps to build a model that can classify handwritten digits with an accuracy of more than 95%. While reading this article, I suggest you try the code in a Colab notebook at the same time. Follow the steps and observe the outputs.


You can find the complete code on GitHub as well.


1. Prepare the input data

Step-1 Import the required libraries


import numpy as np
import keras
from keras.datasets import mnist
import matplotlib.pyplot as plt
  • NumPy provides multidimensional arrays and many functions that operate on them. An image is nothing but a NumPy array of pixel values.


  • The Keras library provides many high-level building blocks that make it easy to build neural-network models.


  • Matplotlib helps to plot images and visualize results.
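The following minimal NumPy sketch makes the claim that "an image is just an array" concrete. The 28x28 array is a hypothetical hand-made image, not an MNIST sample. It only mirrors the format returned by mnist.load_data(), where each pixel is a uint8 value from 0 to 255.

```python
import numpy as np

# Hypothetical 28x28 grayscale "image" in the same format as MNIST: uint8 pixels in 0..255.
image = np.zeros((28, 28), dtype=np.uint8)
image[10:18, 13:15] = 255  # draw a short vertical white stroke (8 rows x 2 columns)

print(image.shape)                # (28, 28)
print(image.dtype)                # uint8
print(int((image == 255).sum()))  # 16 white pixels
```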


Step-2 Load the MNIST dataset


Sample of the MNIST dataset containing handwritten digits.

Dataset: MNIST consists of a training set of 60,000 28x28 grayscale images of the 10 digits (0–9), along with a test set of 10,000 images [1]. More information can be found on the MNIST homepage.


(x_train, y_train), (x_test, y_test) = mnist.load_data()

Downloading data from
11493376/11490434 [==============================] - 0s 0us/step

Step-3 Reshape the input data


# View the shapes of the loaded dataset
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

The size of each input image is 28x28. The first Dense layer of our network expects each example as a flat vector, so we reshape every image into a vector of 28 × 28 = 784 values.


# Reshape the dataset: (n, 28, 28) -> (n, 784)
x_test = x_test.reshape(-1, 784)
x_train = x_train.reshape(-1, 784)
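You can check the reshape on a small synthetic batch. In this sketch, images is a hypothetical stand-in for x_train that holds 5 images instead of 60,000. The -1 asks NumPy to infer the number of rows from the total size.

```python
import numpy as np

# Hypothetical stand-in for x_train: 5 images of 28x28 pixels (the real x_train is (60000, 28, 28)).
images = np.arange(5 * 28 * 28, dtype=np.int64).reshape(5, 28, 28)

# -1 lets NumPy infer this dimension: (5 * 28 * 28) / 784 = 5 rows.
flat = images.reshape(-1, 784)
print(flat.shape)  # (5, 784)

# Flattening is row-major: pixel (row r, column c) ends up at index r * 28 + c.
assert flat[2, 3 * 28 + 7] == images[2, 3, 7]

# No information is lost: reshaping back recovers the original images.
restored = flat.reshape(-1, 28, 28)
print(np.array_equal(restored, images))  # True
```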

2. Build your model

We will build a Sequential model using Keras layers. In a sequential model, the output of each layer is used as the input to the next layer.

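A sequential model feeds each layer's output into the next layer, which is just function composition. The plain-Python sketch below uses a hypothetical sequential() helper and toy "layers" that act on numbers. It is not how Keras is implemented; it only illustrates the data flow.

```python
from functools import reduce


def sequential(*layers):
    """Return a function that applies the layers in order, feeding each output to the next layer."""
    def forward(x):
        return reduce(lambda value, layer: layer(value), layers, x)
    return forward


# Hypothetical toy "layers" on plain numbers, only to show the order of application.
toy_model = sequential(lambda x: x + 1, lambda x: x * 10, lambda x: x - 3)
print(toy_model(2))  # ((2 + 1) * 10) - 3 = 27
```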

Step-4 Import the libraries required to build a model


from keras.models import Sequential
from keras.layers import Dense
  • To build a sequential model, we import the Sequential class from keras.models.


  • We import the Dense class to add dense layers to our network. In a dense layer, each neuron is connected to every neuron of the previous layer. In simple words, a dense layer is a fully connected layer.
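A fully connected layer stores one weight for every (input, neuron) pair. The NumPy sketch below uses random, hypothetical weights for a layer with 784 inputs and 64 units, the same sizes as our first layer. It shows that the weight matrix has shape (784, 64) and that changing a single input affects every neuron.

```python
import numpy as np

n_inputs, n_units = 784, 64

# One weight per (input, neuron) pair: every input is connected to every neuron.
kernel = np.random.default_rng(0).normal(size=(n_inputs, n_units))
print(kernel.shape)  # (784, 64)
print(kernel.size)   # 50176 connections

# Changing a single input pixel changes the pre-activation of every neuron.
x = np.zeros(n_inputs)
x_changed = x.copy()
x_changed[0] = 1.0
delta = x_changed @ kernel - x @ kernel  # equals the first row of the kernel
print(np.count_nonzero(delta))  # 64: all neurons are affected
```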

Step-5 Add layers to your network


# Create an empty sequential model
model = Sequential()

We created an object of the Sequential class called model. Now we will add the required layers to it using model.add().


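# Hidden layers use ReLU; the first layer also declares the input size (784 pixels)
model.add(Dense(units=64, activation="relu", input_shape=(784,)))
model.add(Dense(units=64, activation="relu"))
model.add(Dense(units=128, activation="relu"))
model.add(Dense(units=64, activation="relu"))
# Output layer: one unit per digit class, softmax turns scores into probabilities
model.add(Dense(units=10, activation="softmax"))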

In this model, we add 5 dense layers. For each layer, we have to give values for a few parameters. The two most important are units and activation. Many other parameters, such as kernel_initializer and bias_initializer, are left at their default values.


  • Units: the number of neurons in a layer. Layer sizes are commonly chosen as powers of two (2^n), such as 64 or 128. In multiclass classification, the number of units in the last layer equals the number of classes (10 digits here).


  • Activation: the activation function applied by the layer. In our model, we use ReLU (Rectified Linear Unit) in the hidden layers and softmax in the output layer, so the 10 outputs can be read as class probabilities.


  • input_shape: This argument is passed only to the first layer, because Keras cannot infer the shape of the input data on its own, so we give it explicitly. Here (784,) means that each example is a vector of 784 values. Every later layer infers its input shape from the output shape of the previous layer.

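Both activation functions named above fit in a few lines of NumPy. This is a sketch for intuition only, since Keras ships its own implementations. Subtracting the maximum inside softmax is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np


def relu(z):
    """Rectified Linear Unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)


def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1.
    Subtracting the max first avoids overflow without changing the result."""
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()


print(relu(np.array([-2.0, 0.0, 3.0])))  # negative values become 0
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(np.isclose(probs.sum(), 1.0))      # True
print(int(np.argmax(probs)))             # 2: the largest score gets the largest probability
```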

The output of each dense layer is computed as activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer.

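You can trace the formula activation(dot(input, kernel) + bias) by hand on a tiny layer. In this sketch, dense_forward is a hypothetical helper, and the weights for 3 inputs and 2 units are made up so the arithmetic is easy to follow.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def dense_forward(inputs, kernel, bias, activation):
    """Compute activation(dot(inputs, kernel) + bias) for a batch of inputs."""
    return activation(np.dot(inputs, kernel) + bias)


# Hypothetical tiny layer: 3 inputs -> 2 units.
kernel = np.array([[1.0, -1.0],
                   [0.5,  2.0],
                   [0.0,  1.0]])
bias = np.array([0.1, -7.0])
inputs = np.array([[1.0, 2.0, 3.0]])  # a batch containing one example

# dot(inputs, kernel) = [2.0, 6.0]; adding the bias gives [2.1, -1.0]; ReLU clips -1.0 to 0.
output = dense_forward(inputs, kernel, bias, relu)
print(output)  # values 2.1 and 0.0
```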

Optionally, view the model summary:


model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 64)                50240
_________________________________________________________________
dense_1 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_2 (Dense)              (None, 128)               8320
_________________________________________________________________
dense_3 (Dense)              (None, 64)                8256
_________________________________________________________________
dense_4 (Dense)              (None, 10)                650
=================================================================
Total params: 71,626
Trainable params: 71,626
Non-trainable params: 0
_________________________________________________________________
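The Param # column follows from the dense-layer formula. Each layer has one kernel weight per (input, unit) pair plus one bias per unit. This short sketch recomputes the summary's numbers from the layer sizes used in Step-5.

```python
def dense_params(n_inputs, units):
    """Kernel weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units


layer_sizes = [784, 64, 64, 128, 64, 10]  # input size, then the units of each Dense layer
params = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(params)       # [50240, 4160, 8320, 8256, 650]
print(sum(params))  # 71626
```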

Step-6 Compile the model


Now we compile the model with the following parameters:


  • Optimizer: the algorithm that updates the network's weights to minimize the loss function. Here we use the Adam optimizer.


  • loss: This model performs multiclass classification with one-hot encoded labels, so we use the categorical_crossentropy loss.


  • metrics: the quantities used to evaluate the model during training and testing. Here, we use accuracy as the metric.


model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
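For intuition about what compile() is asked to track, here is a NumPy sketch of categorical cross-entropy and accuracy on a made-up batch of two examples with three classes. Keras handles details such as clipping and batching internally, so treat this as an illustration of the formulas, not a drop-in replacement.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over examples of -sum(y_true * log(y_pred)); y_true is one-hot."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true, y_pred):
    """Fraction of examples whose most probable class matches the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))


# Hypothetical batch: 2 examples, 3 classes.
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.3, 0.6, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)  # (-log 0.8 - log 0.3) / 2, about 0.7136
acc = accuracy(y_true, y_pred)                   # 0.5: only the first example is right
print(loss, acc)
```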

Step-7 Convert the output labels to one-hot vectors


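To learn what one-hot encoding is and why it is necessary, read this. In short, categorical_crossentropy expects each label as a vector of length 10 with a 1 at the index of the true digit, so the label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].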


from keras.utils import to_categorical
y_train = to_categorical(y_train)
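One-hot encoding amounts to picking rows of an identity matrix. The one_hot helper below is a hypothetical NumPy equivalent of to_categorical, written for illustration. Like to_categorical, it infers the number of classes as the largest label plus one when num_classes is not given.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Map integer labels to one-hot rows (hypothetical stand-in for to_categorical)."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    # Row i of the identity matrix has a single 1 at position i.
    return np.eye(num_classes, dtype=np.float32)[labels]


encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded.shape)  # (3, 10)
print(encoded[0])     # a 1 only at index 3
```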

3. Train your model

Step-8 Fit the model on your training dataset


Now we fit the model on the training data and store the returned History object as hist. We have to give the values of the following parameters:


  • x: the input data (the flattened training images).


  • y: the training labels, which were one-hot encoded in Step-7.


  • batch_size: the number of examples processed before each weight update. We chose 32, which means 32 examples are processed in every iteration (step).


  • epochs: the number of complete passes over the training data.


  • validation_split: the fraction of the training data held out for validation (0.2, i.e. 20%, here).


There are many other parameters, but they keep their default values unless we set them.

hist = model.fit(x=x_train, y=y_train, batch_size=32, epochs=10, validation_split=0.2, shuffle=True)

Epoch 1/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.8803 - accuracy: 0.8268 - val_loss: 0.3275 - val_accuracy: 0.9077
Epoch 2/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.2615 - accuracy: 0.9238 - val_loss: 0.2284 - val_accuracy: 0.9383
Epoch 3/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.2054 - accuracy: 0.9406 - val_loss: 0.1965 - val_accuracy: 0.9447
Epoch 4/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.1703 - accuracy: 0.9504 - val_loss: 0.1894 - val_accuracy: 0.9503
Epoch 5/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.1489 - accuracy: 0.9569 - val_loss: 0.2132 - val_accuracy: 0.9452
Epoch 6/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.1319 - accuracy: 0.9620 - val_loss: 0.1746 - val_accuracy: 0.9558
Epoch 7/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.1182 - accuracy: 0.9655 - val_loss: 0.1546 - val_accuracy: 0.9583
Epoch 8/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.1066 - accuracy: 0.9695 - val_loss: 0.1533 - val_accuracy: 0.9605
Epoch 9/10
1500/1500 [==============================] - 4s 2ms/step - loss: 0.0952 - accuracy: 0.9722 - val_loss: 0.1680 - val_accuracy: 0.9617
Epoch 10/10
1500/1500 [==============================] - 4s 3ms/step - loss: 0.0868 - accuracy: 0.9747 - val_loss: 0.1775 - val_accuracy: 0.9574
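The "1500/1500" in each log line follows from the arguments to fit(). The Keras documentation states that validation_split holds out the last fraction of the provided samples, before shuffling. The sketch below reproduces the arithmetic with the numbers from this article. Rounding up with math.ceil reflects that a final partial batch would still count as a step.

```python
import math

n_train_images = 60_000  # MNIST training set size
validation_split = 0.2
batch_size = 32

n_val = int(n_train_images * validation_split)   # 12,000 images held out for validation
n_fit = n_train_images - n_val                   # 48,000 images used for training
steps_per_epoch = math.ceil(n_fit / batch_size)  # batches per epoch

print(n_val, n_fit, steps_per_epoch)  # 12000 48000 1500
```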

4. Test your model

Step-9 Predict the outcome for a random data point


Now let's test one example from our test set, say example number 999. To make predictions, we use the model.predict() function.
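model.predict() works on batches, so a single image is usually passed as a slice such as x_test[999:1000] (shape (1, 784)) to keep the batch dimension. It returns one row of 10 softmax probabilities per image, and the predicted digit is the index of the largest value. The sketch below uses a made-up probability row in place of a real prediction, so it runs without a trained model.

```python
import numpy as np

# Hypothetical softmax output standing in for model.predict(x_test[999:1000]),
# which returns an array of shape (1, 10): one probability per digit class.
probabilities = np.array([[0.01, 0.00, 0.02, 0.01, 0.03, 0.01, 0.00, 0.04, 0.03, 0.85]])

# The predicted class is the index of the highest probability in the row.
predicted_digit = int(np.argmax(probabilities, axis=1)[0])
print(predicted_digit)  # 9
```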



In my run, the model predicted 9. To check whether your model's prediction is correct, plot the image using plt.imshow().


# x_test was flattened to 784 values, so reshape it back to 28x28 for display
plt.imshow(x_test[999].reshape(28, 28), cmap="gray")
plt.show()

Our model predicted the input digit correctly.





