by Jerin Paul

How I developed a C.N.N. that recognizes emotions and broke into the Kaggle top 10

A baby starts to recognize its parents’ faces when it is just a couple of weeks old. As it grows, this innate ability improves. By the time it is a few months old, it starts to display social cues and is able to understand basic emotions like a smile.

Thanks to millions of years of evolution, we are able to understand each other without using a single word. A single look is all it takes to tell whether a person is crestfallen or elated. Well, I tried teaching computers to do just that. This article is a detailed account of how the whole experiment turned out. Follow along as we recreate the network.

Cut to the chase Paul, please, give me the code. Don’t want fancy reading? No problem. You can find the code for this project here.

A Brief Introduction

“The best and most beautiful things in the world cannot be seen or even touched. They must be felt with the heart” ― Helen Keller

Helen Keller excellently described the essence of human emotions in the quote above. What was once reserved for animals is no longer limited to them. Machine learning is catching on at a mind-numbing pace. The onset of convolutional neural networks was a breakthrough and changed the way computers “look” at the world.

Facial expressions are nothing more than the arrangement of facial muscles to convey a certain emotional state to the observer. Emotions can be divided into seven broad categories: Anger, Disgust, Fear, Happiness, Sadness, Surprise, and Neutral. In this M.L. project, we will train a model to differentiate between these.

We will train a convolutional neural network on the FER2013 dataset and tune its hyper-parameters to improve the model. We will train it on Google Colab, which is a research project created to disseminate ML education. Colab allocates you resources such as a G.P.U. or T.P.U., which can be used to train your model faster. The best part is that it is completely free.

Peek at the data

We will start by uploading the FER2013.csv file to our Google Drive so that we can access it from Google Colab. There are 35,887 images in this dataset, classified into seven emotions. The data file contains 3 columns: Class, Image data, and Usage.

Class: a digit from 0 to 6 that represents the emotion depicted in the corresponding picture. Each emotion is mapped to an integer as shown below.

0 - 'Angry'
1 - 'Disgust'
2 - 'Fear'
3 - 'Happy'
4 - 'Sad'
5 - 'Surprise'
6 - 'Neutral'

Image data: a string of 2,304 numbers (48 x 48), which are the pixel intensity values of our image. We will cover this in detail in a while.

Usage: denotes whether the corresponding data should be used to train the network or test it.
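
To make the three columns concrete, here is a minimal sketch of how a single row could be split apart. The sample row is synthetic (made-up pixel values, not real data), and parse_row and EMOTIONS are illustrative names that do not appear in the project's code.

```python
import numpy as np

# Integer-to-emotion mapping listed above
EMOTIONS = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

def parse_row(row):
    """Split one CSV row into (emotion name, 48x48 image, usage)."""
    label, pixels, usage = row.strip().split(',')
    values = [int(p) for p in pixels.split()]          # 2,304 intensities
    image = np.array(values, dtype=np.uint8).reshape(48, 48)
    return EMOTIONS[int(label)], image, usage

# Synthetic row: class 3, 2,304 fake pixel values, usage "Training"
fake_pixels = ' '.join(str(i % 256) for i in range(48 * 48))
sample_row = f"3,{fake_pixels},Training\n"

emotion, image, usage = parse_row(sample_row)
print(emotion, image.shape, usage)
```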

Decomposing an image

As we all know, images are composed of pixels, and these pixels are nothing more than numbers. Colored images have three color channels (red, green, and blue), and each channel is represented by a grid (a 2-dimensional array). Each cell in the grid stores a number between 0 and 255, which denotes the intensity of that cell.

When these three channels are aligned together we get the images that we see.
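
As a small illustration of the idea, the sketch below builds three tiny channel grids with NumPy and stacks them into one color image. The 4x4 size and the pixel values are arbitrary choices for the example.

```python
import numpy as np

height, width = 4, 4

# One 2-D grid per color channel, each cell holding an intensity in 0..255
red = np.full((height, width), 255, dtype=np.uint8)
green = np.zeros((height, width), dtype=np.uint8)
blue = np.zeros((height, width), dtype=np.uint8)

# Aligning the three grids along a new last axis gives a color image
image = np.stack([red, green, blue], axis=-1)

print(image.shape)   # (height, width, channels)
print(image[0, 0])   # the top-left pixel as [R, G, B]
```

A gray-scale image, like the ones in this dataset, is the same thing with a single channel.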

Importing Necessary Libraries

%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

from keras.models import Sequential  # Initialise our neural network model as a sequential network
from keras.layers import Conv2D  # Convolution operation
from keras.layers import BatchNormalization  # Older Keras releases also expose this under keras.layers.normalization
from keras.regularizers import l2
from keras.layers import Activation  # Applies activation function
from keras.layers import Dropout  # Prevents overfitting by randomly converting few outputs to zero
from keras.layers import MaxPooling2D  # Maxpooling function
from keras.layers import Flatten  # Converting 2D arrays into a 1D linear vector
from keras.layers import Dense  # Regular fully connected neural network
from keras import optimizers
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, TensorBoard, ModelCheckpoint
from sklearn.metrics import accuracy_score

Define Data Loading Mechanism

Now, we will define the load_data() function which will efficiently parse the data file and extract necessary data and then convert it into a usable image format.

All the images in our dataset are 48x48 in dimension. Since these images are gray-scale, there is only one channel. We will extract the image data and rearrange it into a 48x48 array, then convert it into unsigned integers and divide it by 255 to normalize the data. 255 is the maximum possible value of a single cell, so dividing every element by 255 ensures that all our values range between 0 and 1.

We will check the Usage column and store the data in separate lists, one for training the network and the other for testing it.

def load_data(dataset_path):
    data = []
    test_data = []
    test_labels = []
    labels = []

    with open(dataset_path, 'r') as file:
        for line_no, line in enumerate(file.readlines()):
            # Skip the header (line 0) and read the 35,887 image rows
            if 0 < line_no <= 35887:
                curr_class, pixels, set_type = line.split(',')
                # 2,304 space-separated intensities -> 48x48 array scaled to [0, 1]
                image_data = np.asarray([int(x) for x in pixels.split()]).reshape(48, 48)
                image_data = image_data.astype(np.uint8) / 255.0

                # PrivateTest rows are held out for testing; everything else is used for training
                if set_type.strip() == 'PrivateTest':
                    test_data.append(image_data)
                    test_labels.append(int(curr_class))
                else:
                    data.append(image_data)
                    labels.append(int(curr_class))

    # Add the channel axis and one-hot encode the labels
    test_data = np.expand_dims(test_data, -1)
    test_labels = to_categorical(test_labels, num_classes=7)
    data = np.expand_dims(data, -1)
    labels = to_categorical(labels, num_classes=7)

    return np.array(data), np.array(labels), np.array(test_data), np.array(test_labels)

Once our data is segregated, we will expand the dimensions of both the testing and training data by one to accommodate the channel. Then, we will one-hot encode all the labels using the to_categorical() function and return all the lists as numpy arrays.
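
The two post-processing steps can be tried on toy data. This is only a sketch: it uses two dummy images, and it swaps in np.eye for the one-hot encoding, which for integer labels yields the same kind of 0/1 matrix that to_categorical() produces, so the example runs without Keras.

```python
import numpy as np

# Two dummy 48x48 gray-scale "images" and their integer class labels
images = [np.zeros((48, 48)), np.ones((48, 48))]
labels = [3, 0]

# Add a trailing axis for the single gray-scale channel
batch = np.expand_dims(images, -1)

# One-hot encode: row i of the identity matrix is the encoding of class i
one_hot = np.eye(7)[labels]

print(batch.shape)    # (samples, height, width, channels)
print(one_hot[0])     # class 3 -> a 1 at index 3, zeros elsewhere
```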

We will load the data by calling the load_data() function.

dataset_path = "/content/gdrive/My Drive/Colab Notebooks/Emotion Recognition/Data/fer2013.csv"

train_data, train_labels, test_data, test_labels = load_data(dataset_path)

print("Number of images in Training set:", len(train_data))
print("Number of images in Test set:", len(test_data))

Our data is loaded and now let us get to the best part, defining the network.

Defining the model

We will use Keras to create a Sequential convolutional network, which means that our neural network will be a linear stack of layers. This network will have the following components:

Convolutional Layers: These layers are the building blocks of our network. They compute the dot product between their weights and the small regions of the input to which they are linked. This is how these layers learn certain features from the images.
Activation functions: These functions are applied to the outputs of the layers in the network. In this project, we will use two of them: ReLU and Softmax.
Pooling Layers: These layers downsample their input along the spatial dimensions. This reduces the amount of spatial data and the processing power required.
Dense layers: These layers are present at the end of a C.N.N. They take in all the feature data generated by the convolution layers and do the decision making.
Dropout Layers: These layers randomly turn off a few neurons in the network during training to prevent overfitting.
Batch Normalization: normalizes the output of the previous layer by subtracting the batch mean and dividing by the batch standard deviation. This speeds up the training process.
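
To see what these components compute, here is a rough NumPy sketch of each operation on toy inputs. These are simplified stand-ins written for illustration, not the Keras implementations: the convolution handles one channel and one filter, the batch normalization omits the learned scale and shift, and all function names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is a dot product."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive values, replace negatives with zero."""
    return np.maximum(x, 0)

def softmax(x):
    """Turn raw scores into probabilities that sum to 1."""
    exps = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return exps / exps.sum()

def max_pool_2x2(x):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def batch_norm(batch, eps=1e-5):
    """Subtract the batch mean and divide by the batch standard deviation."""
    return (batch - batch.mean(axis=0)) / np.sqrt(batch.var(axis=0) + eps)

def dropout(x, rate, rng):
    """Zero a random fraction of values and rescale the survivors."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)        # simple horizontal-gradient filter

features = relu(conv2d_valid(image, kernel))      # 6x6 input -> 4x4 feature map
pooled = max_pool_2x2(features)                   # 4x4 -> 2x2
probs = softmax(np.array([2.0, 1.0, 0.1]))
normalized = batch_norm(np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
dropped = dropout(np.ones((4, 4)), rate=0.5, rng=np.random.default_rng(0))

print(features.shape, pooled.shape, probs.sum())
```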

model = Sequential()

# Block 1: two 64-filter convolutions
model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(48, 48, 1), kernel_regularizer=l2(0.01)))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))

# Block 2: three 128-filter convolutions
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

# Block 3: three 256-filter convolutions
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

# Block 4: three 512-filter convolutions
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

# Classifier: flatten the feature maps and narrow down to the 7 emotion classes
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))
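
It can help to trace how the 48x48 input shrinks on its way through this stack. The sketch below does that arithmetic by hand, assuming Keras defaults where the code above does not say otherwise (the first convolution has no padding argument, so it is treated as 'valid'; pooling windows are 2x2 with stride 2). The helper names conv_out and pool_out are made up for this example.

```python
def conv_out(size, kernel=3, padding='same'):
    """Spatial size after a stride-1 convolution."""
    return size if padding == 'same' else size - kernel + 1

def pool_out(size, pool=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - pool) // stride + 1

size = 48
size = conv_out(size, padding='valid')   # first conv has no padding
size = conv_out(size)                    # second conv keeps the size
size = pool_out(size)                    # end of block 1
sizes = [size]

# Blocks 2 to 4: 'same' convolutions keep the size, each pooling roughly halves it
for _ in range(3):
    size = pool_out(conv_out(size))
    sizes.append(size)

flattened = size * size * 512            # 512 feature maps reach Flatten()
print(sizes, flattened)
```

Under these assumptions the feature maps go from 23x23 after the first block down to 2x2 after the fourth, so the first Dense layer receives a vector of 2,048 values.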

We will compile the network using the Adam optimizer with a variable learning rate. Since we are dealing with a classification problem that involves multiple categories, we will use categorical_crossentropy as our loss function.
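
For intuition about this loss, here is a small NumPy sketch of categorical cross-entropy for a single sample. It is a simplified re-implementation for illustration (the clipping constant eps is an assumption), not the Keras function itself.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One-hot label for class 3 ('Happy')
y_true = np.array([0, 0, 0, 1, 0, 0, 0], dtype=float)

confident = np.array([0.01, 0.01, 0.01, 0.94, 0.01, 0.01, 0.01])
unsure = np.full(7, 1.0 / 7.0)                 # uniform guess over 7 classes

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

A confident, correct prediction gives a loss close to zero, while a uniform guess over seven classes gives ln(7), so minimizing this loss pushes probability toward the correct emotion.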

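# learning_rate is defined with the other hyper-parameters below; older Keras releases name this argument lr
adam = optimizers.Adam(learning_rate=learning_rate)

model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()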

Callback functions

Callback functions are functions that are called after every epoch during the training process. We will be using the following callback functions:

1. ReduceLROnPlateau: Training a neural network can plateau at times, and we stop seeing any progress during this stage. This function monitors the validation loss for signs of a plateau and, if one is detected, changes the learning rate by the specified factor.

lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=3)

2. EarlyStopping: At times, progress stalls while training a neural network and we stop seeing any improvement in the monitored metric (the validation accuracy, in this case). Most of the time, this means that the network won’t converge any further and there is no point in continuing the training process. This function waits for a specified number of epochs and terminates the training if the monitored metric does not improve.

# Recent Keras releases log this metric as 'val_accuracy'; older ones call it 'val_acc'
early_stopper = EarlyStopping(monitor='val_accuracy', min_delta=0, patience=6, mode='auto')

3. ModelCheckpoint: Training neural networks generally takes a lot of time, and anything can happen during this period that may result in the loss of all the variables and weights. Creating checkpoints is a good habit, as it saves your model as training progresses. With save_best_only=True, as used here, the checkpoint is overwritten only after an epoch in which the validation loss improves. If your training stops, you can load the checkpoint and resume the process.

checkpointer = ModelCheckpoint('/content/gdrive/My Drive/Colab Notebooks/Emotion Recognition/Model/weights.hd5', monitor='val_loss', verbose=1, save_best_only=True)
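
The "patience" logic behind the first two callbacks can be sketched in plain Python. This is a simplified simulation under stated assumptions, not the Keras source: it ignores min_delta and cooldown, and the validation curves fed to it are invented numbers chosen to show a plateau.

```python
def simulate_callbacks(val_losses, val_accs, lr=0.001, factor=0.9,
                       lr_patience=3, stop_patience=6):
    """Return (epoch at which training ends, final learning rate)."""
    best_loss, best_acc = float('inf'), float('-inf')
    loss_wait = acc_wait = 0

    for epoch, (loss, acc) in enumerate(zip(val_losses, val_accs), start=1):
        # ReduceLROnPlateau-style rule on the validation loss
        if loss < best_loss:
            best_loss, loss_wait = loss, 0
        else:
            loss_wait += 1
            if loss_wait >= lr_patience:
                lr *= factor          # shrink the learning rate
                loss_wait = 0

        # EarlyStopping-style rule on the validation accuracy
        if acc > best_acc:
            best_acc, acc_wait = acc, 0
        else:
            acc_wait += 1
            if acc_wait >= stop_patience:
                return epoch, lr      # stop training early

    return len(val_losses), lr

# Invented curves: improvement for two epochs, then a long plateau
val_losses = [1.0, 0.9] + [0.9] * 8
val_accs = [0.5, 0.6] + [0.6] * 8

stopped_at, final_lr = simulate_callbacks(val_losses, val_accs)
print(stopped_at, final_lr)
```

With these made-up curves, the learning rate is reduced twice during the plateau and training ends at epoch 8 instead of running through all 10 epochs.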

Time to train

All our hard work is about to be put to the test. But before we fit the model, let us define some hyper-parameters.

epochs = 100
batch_size = 64
learning_rate = 0.001

Our data will pass through the model up to 100 times (fewer if early stopping kicks in), in batches of 64 images. We will use 20% of our training data to validate the model after every epoch.
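
To get a feel for what these numbers mean per epoch, here is a rough back-of-the-envelope sketch. The sample count of 32,298 is an assumption based on the standard FER2013 release (its Training and PublicTest rows, which load_data() puts in the training list), and epoch_plan is a hypothetical helper.

```python
import math

def epoch_plan(n_samples, validation_split=0.2, batch_size=64):
    """Return (training samples, validation samples, batches per epoch)."""
    n_train = int(n_samples * (1.0 - validation_split))
    n_val = n_samples - n_train
    steps = math.ceil(n_train / batch_size)   # the last batch may be smaller
    return n_train, n_val, steps

# Assumed size of the non-PrivateTest portion of FER2013
n_train, n_val, steps = epoch_plan(32298)
print(n_train, n_val, steps)
```

Note that Keras takes the validation samples from the end of the arrays before any shuffling, so the split depends on the order in which the rows were loaded.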

model.fit(
    train_data,
    train_labels,
    epochs=epochs,
    batch_size=batch_size,
    validation_split=0.2,
    shuffle=True,
    callbacks=[lr_reducer, checkpointer, early_stopper]
)

Now that the network is being trained, I suggest that you go and finish that book you started or go for a run. It took me about an hour on Google Colab.

Test the model

Remember the private set we stored separately? That was for this very moment. This is the moment of truth and this is where we will reap the fruit of our labor.

predicted_test_labels = np.argmax(model.predict(test_data), axis=1)
test_labels = np.argmax(test_labels, axis=1)
print("Accuracy score = ", accuracy_score(test_labels, predicted_test_labels))
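
The evaluation step boils down to picking the most probable class per image and counting matches. The toy example below shows this with four invented probability rows in place of real model output, and a plain NumPy mean in place of accuracy_score.

```python
import numpy as np

# Invented softmax outputs for four images (each row sums to 1)
probs = np.array([
    [0.05, 0.05, 0.10, 0.60, 0.10, 0.05, 0.05],
    [0.70, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
    [0.10, 0.05, 0.15, 0.05, 0.20, 0.05, 0.40],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.70, 0.05],
])

# One-hot ground truth for classes 3, 0, 4 and 5
true_one_hot = np.eye(7)[[3, 0, 4, 5]]

predicted = np.argmax(probs, axis=1)       # most probable class per row
actual = np.argmax(true_one_hot, axis=1)   # undo the one-hot encoding
accuracy = np.mean(predicted == actual)    # fraction of correct predictions

print(predicted, actual, accuracy)
```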

Well, the results came back and we scored 63.167%. At first glance, it isn’t much, but it put us in ninth position in the Facial Emotion Recognition Kaggle competition.

Now, pat yourself on the back and start brainstorming about the ways in which you can improve this model. We can use better hyper-parameters or create a different network architecture altogether to achieve higher accuracies.

Save the model

Save the model's architecture as JSON with model.to_json() and its weights with model.save_weights(). The model_from_json function from keras.models can rebuild the model from that JSON file later.

from keras.models import model_from_json

model_json = model.to_json()
with open("/content/gdrive/My Drive/Colab Notebooks/Emotion Recognition/FERmodel.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("/content/gdrive/My Drive/Colab Notebooks/Emotion Recognition/FERmodel.h5")
print("Saved model to disk")

Wrapping it all up

We started off by defining a loading mechanism and loading the images. Then we created a training set and a testing set. Then we defined a fine model and defined a few callback functions. We went over the basic components of a convolutional neural network and then we trained our network.

I extended this project by creating a Python application that is able to detect faces and recognize their emotions in real time. That will be covered in a later post.

We just accomplished something that was part of science fiction a few decades ago. Yet there is a lot left to learn. The internet provides us with a plethora of information to constantly create and learn. May the learning never cease.

Source: https://www.freecodecamp.org/news/facial-emotion-recognition-develop-a-c-n-n-and-break-into-kaggle-top-10-f618c024faa7/