Deep Learning: Finding Gestures in Images

Emotion Gesture Detection

Hello everyone! Welcome back to part 2 of the human emotion and gesture detector using deep learning. In case you haven't already, check out part 1 here. In this article, we will cover the training of our gestures model and also look at a way to achieve higher accuracy on the emotions model. Finally, we will create a final pipeline using computer vision, through which we can access our webcam and get a vocal response from the models we have trained. Without further ado, let's start coding and understanding the concepts.


For training the gestures model, we will use transfer learning. We will use the VGG-16 architecture, excluding its top layer, and then add our own custom layers to improve accuracy and reduce loss. We will aim for an overall accuracy of about 95% on our gestures model: the dataset is fairly balanced, and with image data augmentation and the VGG-16 transfer learning model this can be achieved easily, and in fewer epochs than the emotions model required. In a future article, we will cover how exactly the VGG-16 architecture works, but for now let us analyze the data at hand and perform an exploratory data analysis on the gestures dataset, similar to what we did on the emotions dataset after extracting the images.


EXPLORATORY DATA ANALYSIS (EDA):

In the next code block, we will look at the contents of the train folder and figure out the total number of classes, i.e. the gesture categories, present in the train folder.

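A minimal sketch of how such a check might look, assuming the training data lives in a folder named train1 (as used later in the article) with one sub-folder per gesture class:

```python
import os

train_dir = "train1"  # hypothetical path; adjust to wherever the gesture images are stored

# Count the classes and the number of images in each sub-folder.
classes = sorted(d for d in os.listdir(train_dir) if os.path.isdir(os.path.join(train_dir, d)))
print("Number of classes:", len(classes))
for gesture in classes:
    num_images = len(os.listdir(os.path.join(train_dir, gesture)))
    print(f"{gesture}: {num_images} images")
```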

Train:

We can see the four sub-folders we have in the train1 folder. Let us look visually at the number of images in these directories.


Bar Graph:

We can see from the bar graph that each of the directories contains 2,400 images, so this is a completely balanced dataset. Now, let us visualize the images in the train directory. We will look at the first image in each sub-directory and then check the dimensions and number of channels of the images in these folders.

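A rough sketch of how the bar graph could be produced with matplotlib (the plotting library is an assumption; the article does not state which one it uses):

```python
import os
import matplotlib.pyplot as plt

train_dir = "train1"  # hypothetical path, as above
classes = sorted(d for d in os.listdir(train_dir) if os.path.isdir(os.path.join(train_dir, d)))
counts = [len(os.listdir(os.path.join(train_dir, c))) for c in classes]

# Bar graph of the number of images per gesture class (expected: 2400 each).
plt.bar(classes, counts)
plt.xlabel("Gesture class")
plt.ylabel("Number of images")
plt.title("Training images per class")
plt.show()
```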

The dimensions of the images are as follows:


The height of the image = 200 pixels
The width of the image = 200 pixels
The number of channels = 3

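One way to verify these dimensions, sketched here with OpenCV (the particular file chosen is illustrative):

```python
import os
import cv2

train_dir = "train1"  # hypothetical path
first_class = sorted(os.listdir(train_dir))[0]
first_image = sorted(os.listdir(os.path.join(train_dir, first_class)))[0]

img = cv2.imread(os.path.join(train_dir, first_class, first_image))
print(img.shape)  # expected: (200, 200, 3) -> height, width, channels
```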

Similarly, we can perform an analysis on the validation1 directory and check what our validation dataset and the validation images look like.


Validation:

Bar Graph:

We can see from the bar graph that each of the directories contains 600 images, so this is again a completely balanced dataset. Now, let us visualize the images in the validation directory. We will look at the first image in each sub-directory. The dimensions and number of channels of the images in these folders are the same as in the train directory.


With this, our exploratory data analysis (EDA) of the gestures dataset is complete. We can proceed to build the gestures training model for gesture prediction.


Gestures Train Model:

Let us look at the code block below to understand the libraries we are importing, and to set the number of classes along with the image dimensions and the respective directories.


Import all the important required deep learning libraries to train the gestures model. Keras is an Application Programming Interface (API) that can run on top of TensorFlow. TensorFlow will be the main deep learning module we will use to build our deep learning model. From TensorFlow, we will be referring to a pre-trained model called VGG-16. We will be using VGG-16 with custom convolutional neural networks (CNNs), i.e. we will use the VGG-16 transfer learning model alongside our own custom model to train an overall accurate model. The VGG-16 model in Keras is pre-trained with the ImageNet weights.


The ImageDataGenerator is used for data augmentation, so that the model can see more copies of the original images. Data augmentation creates replications of the original images and uses those transformations in each epoch. The layers used for training are as follows:
  1. Input = The input layer in which we pass the input shape.
  2. Conv2D = The convolutional layer, combined with Input, to provide an output of tensors.
  3. MaxPool2D = Downsampling the data from the convolutional layer.
  4. BatchNormalization = A technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks.
  5. Dropout = A technique where randomly selected neurons are ignored during training. They are "dropped out" randomly, which prevents over-fitting.
  6. Dense = Fully connected layers.
  7. Flatten = Flattens the entire structure to a 1-D array.
Models can be built in a model-like (functional) structure, as shown in this particular model, or in a sequential manner. Here, we will be using a functional API model-like structure, unlike our emotions model, which was a sequential model. We can use l2 regularization for fine-tuning. The optimizer used will be Adam, as it performs better than the other optimizers on this model. We are also importing the os module to make the code compatible with the Windows environment.

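A minimal sketch of these imports, assuming TensorFlow 2.x with the bundled Keras (exact import paths can vary between versions):

```python
import os

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import (Input, Conv2D, MaxPool2D, BatchNormalization,
                                     Dropout, Dense, Flatten)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
```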

We have 4 classes of gestures, namely Punch, Victory, Super, and Loser. Each of the images has a height and width of 200 pixels and is an RGB image, i.e. a 3-channel image. We will be using a batch_size of 128 for the image data augmentation.


We will also specify the train and validation directories for the stored images. train_dir is the directory that will contain the set of training images, and validation_dir is the directory that will contain the set of validation images.

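The corresponding settings might look like the sketch below; the variable names and directory paths are illustrative, while the values follow the text:

```python
num_classes = 4                 # Punch, Victory, Super, Loser
img_rows, img_cols = 200, 200   # image height and width in pixels
num_channels = 3                # RGB images
batch_size = 128

train_dir = "train1"            # hypothetical directory of training images
validation_dir = "validation1"  # hypothetical directory of validation images
```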

DATA AUGMENTATION:

We will now look at the image data augmentation for the gestures dataset, which is similar to what we did for the emotions data.


The ImageDataGenerator is used for data augmentation of images. We will be replicating and making copies of the transformations of the original images. The Keras data generator will use the copies and not the originals, which is useful for training at each epoch. We will rescale the images and set all the parameters to suit our model:
  1. rescale = Rescaling by 1./255 to normalize each of the pixel values.
  2. rotation_range = Specifies the random range of rotation.
  3. shear_range = Specifies the intensity of each angle in the counter-clockwise range.
  4. zoom_range = Specifies the zoom range.
  5. width_shift_range = Specifies the range of horizontal shifts.
  6. height_shift_range = Specifies the range of vertical shifts.
  7. horizontal_flip = Flips the images horizontally.
  8. fill_mode = Fills in newly created pixels according to the closest boundaries.
train_datagen.flow_from_directory takes the path to a directory and generates batches of augmented data. The relevant arguments are as follows:
  1. train_dir = Specifies the directory where we have stored the image data.
  2. color_mode = Specifies how our images are treated, i.e. grayscale or RGB format. The default is RGB.
  3. target_size = The dimensions of the image.
  4. batch_size = The number of images per batch for the flow operation.
  5. class_mode = Determines the type of label arrays that are returned. "categorical" gives 2D one-hot encoded labels.
  6. shuffle = Whether to shuffle the data (default: True). If set to False, the data is sorted in alphanumeric order.

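A sketch of the augmentation pipeline described above, continuing from the earlier imports and constants; the specific rotation, shear, zoom, and shift values are illustrative, not necessarily the article's exact settings:

```python
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values
    rotation_range=30,        # random rotation range (illustrative)
    shear_range=0.3,          # shear intensity (illustrative)
    zoom_range=0.3,           # zoom range (illustrative)
    width_shift_range=0.2,    # horizontal shift range (illustrative)
    height_shift_range=0.2,   # vertical shift range (illustrative)
    horizontal_flip=True,     # flip images horizontally
    fill_mode="nearest",      # fill new pixels from the closest boundary
)
validation_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    color_mode="rgb",
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=True,
)
validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    color_mode="rgb",
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=True,
)
```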

In the next code block, we import the VGG-16 model into the variable VGG16_MODEL and make sure we load the model without the top layer. Using the VGG-16 architecture without the top layer, we can now add our custom layers. To avoid training the VGG-16 layers, we set layer.trainable = False for each of them. We will also print out these layers and make sure their trainable attribute is set to False.

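Loading the frozen VGG-16 base might look like this sketch (the variable name VGG16_MODEL follows the text; the input shape follows the earlier constants):

```python
VGG16_MODEL = VGG16(
    include_top=False,    # exclude the fully connected top layers
    weights="imagenet",   # pre-trained ImageNet weights
    input_shape=(img_rows, img_cols, num_channels),
)

# Freeze every VGG-16 layer so only the custom head is trained.
for layer in VGG16_MODEL.layers:
    layer.trainable = False
    print(layer, layer.trainable)
```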

FINGERS GESTURE MODEL:

Below is the complete code for the custom layers of the fingers gesture model we are building:


The fingers gesture model we are building will be trained using transfer learning. We will be using the VGG-16 model with no top layer. We will add custom layers on top of the VGG-16 model and then use this transfer learning model to predict the finger gestures. The custom head takes as its input the output of the VGG-16 model. We add a convolutional layer with 32 filters, a kernel_size of (3,3), and the default strides of (1,1), using relu activation with he_normal as the initializer. We use a pooling layer to downsample the output of the convolutional layer. Two fully connected (Dense) layers with relu activation are used after the sample is passed through a Flatten layer. The output layer has a softmax activation with num_classes = 4, which predicts the probabilities for the classes, namely Punch, Super, Victory, and Loser. The final Model takes the input of the VGG-16 model as its input and the final output layer as its output.

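A hedged sketch of such a head using the functional API; the 32-filter Conv2D, (3,3) kernel, relu/he_normal settings, and 4-class softmax follow the description, while the Dense layer sizes and dropout rate are assumptions:

```python
x = Conv2D(32, (3, 3), activation="relu", kernel_initializer="he_normal",
           padding="same")(VGG16_MODEL.output)
x = MaxPool2D(pool_size=(2, 2))(x)    # downsample the convolutional output
x = Flatten()(x)                      # flatten to a 1-D vector
x = Dense(128, activation="relu")(x)  # first fully connected layer (size assumed)
x = Dropout(0.5)(x)                   # dropout rate assumed
x = Dense(64, activation="relu")(x)   # second fully connected layer (size assumed)
output = Dense(num_classes, activation="softmax")(x)  # Punch, Super, Victory, Loser

model = Model(inputs=VGG16_MODEL.input, outputs=output)
model.summary()
```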

The callbacks are similar to those of the previous emotions model, so let us move directly on to the compilation and training of the gestures model.


Compile and fit the model:

We compile and fit our model in the final step. Here, we train the model and save the best weights to gesturenew.h5, so that we don't have to re-train the model repeatedly and can load the saved model whenever required. We train on the training data and validate on the validation data. The loss we use is categorical_crossentropy, which computes the cross-entropy loss between the labels and the predictions. The optimizer is Adam with a learning rate of 0.001, and we compile the model with accuracy as the metric. We fit the model on the augmented training and validation images. After the fitting step, these are the results we were able to achieve for train and validation loss and accuracy.

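A sketch of this step; the loss, optimizer, learning rate, metric, and checkpoint file name follow the text, while the callback settings and epoch count are assumptions:

```python
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint("gesturenew.h5", monitor="val_loss",
                             save_best_only=True, verbose=1)

model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=0.001),
              metrics=["accuracy"])

history = model.fit(train_generator,
                    epochs=20,                        # illustrative epoch count
                    validation_data=validation_generator,
                    callbacks=[checkpoint])
```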

Graph:

Observation:

The model performs extremely well. We can see that the train and validation losses decrease steadily while the train and validation accuracies increase steadily. There is no over-fitting in the deep learning model, and we are able to achieve a validation accuracy of over 95%.


BONUS:

EMOTIONS MODEL-2:

This is an additional model that we will look at. With this method, we can achieve higher accuracy with the exact same model. After some research and experimentation, I found that we could achieve higher accuracy by loading the pixels into NumPy arrays and then training on them. There is a wonderful article where the author uses a similar approach; I would highly recommend checking out that article as well. Here, we will use this approach with the custom sequential model and see what accuracy we are able to achieve. Import libraries similar to the previous emotions model; refer to the GitHub repository at the end of the post for additional information. Below is the code block for the complete preparation of the data for the model.


num_classes = Defines the number of classes we have to predict, namely Angry, Fear, Happy, Sad, Surprise, Neutral, and Disgust. From the exploratory data analysis, we know the dimensions of the images: image height = 48 pixels, image width = 48 pixels, number of channels = 1, because they are grayscale images. We will use a batch size of 64 for the model.


In this method, we convert the pixel strings into lists. We split the data on spaces, take the values as arrays, and reshape them into a 48 x 48 shape. We can then expand the dimensions and convert the labels into a categorical matrix.

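A sketch of this preparation step, assuming a FER-style CSV file with an emotion label column and a pixels column of space-separated grayscale values (the file name is hypothetical):

```python
import numpy as np
import pandas as pd
from tensorflow.keras.utils import to_categorical

data = pd.read_csv("fer2013.csv")  # hypothetical file name

# Split each pixel string on spaces, convert to arrays, and reshape to 48 x 48.
faces = np.array([np.array(p.split(), dtype="float32").reshape(48, 48)
                  for p in data["pixels"]])
faces = np.expand_dims(faces, axis=-1)  # add the single grayscale channel: (N, 48, 48, 1)

# Convert the integer labels into a categorical (one-hot) matrix.
labels = to_categorical(data["emotion"], num_classes=7)
```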

Finally, we split the data into train, test, and validation sets. This is slightly different from our previous model's approach, where we only used train and validation sets and divided the data in an 80:20 ratio. Here, we divide the data in an 80:10:10 format. We will be using the same sequential model as in the previous part. Let us look at the model once again and see how it performs after training.

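The 80:10:10 split could be done as in the sketch below; scikit-learn's train_test_split is an assumption, not something the article states it uses:

```python
from sklearn.model_selection import train_test_split

x_train, x_temp, y_train, y_temp = train_test_split(faces, labels,
                                                    test_size=0.2, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(x_temp, y_temp,
                                                test_size=0.5, random_state=42)

print(x_train.shape, x_val.shape, x_test.shape)  # roughly 80%, 10%, 10% of the samples
```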

The final accuracy, validation accuracy, loss, and validation loss we were able to achieve on all 7 emotions were as follows:


Graph:

Observation:

The model performs quite well. We can see that the train and validation losses decrease steadily while the train and validation accuracies increase steadily. There is no over-fitting in the deep learning model, and we are able to achieve a validation accuracy of over 65% and an accuracy of almost 70%, while reducing the overall losses as well.


Recordings:

In this section, we will create the recordings required for the vocal responses from the models. We can create custom recordings for each of the models and for each emotion or gesture. In the code block below, I will show an example of the recordings for one emotion and one gesture respectively.


Understanding the imported libraries:


  1. gTTS = Google Text-to-Speech is a Python library that we can use to convert text into a spoken audio response.


  2. playsound = This module is useful for playing sound directly from a specified path with a .mp3 format.


  3. shutil = This module offers several high-level operations on files and collections of files. In particular, functions are provided which support file copying, moving, and removal.


In this Python file, we will create all the required voice recordings for both the emotions and the gestures, and we will store them in the reactions directory. I have shown an example of how to create a custom voice recording for an emotion or gesture, as sketched below. The entire code for the recordings will be posted in the GitHub repository at the end of this post.

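A sketch of creating one recording for an emotion and one for a gesture; the spoken phrases and file names are illustrative, and the reactions directory is assumed to exist:

```python
from gtts import gTTS
from playsound import playsound

happy_message = gTTS(text="You look happy today!", lang="en")         # illustrative phrase
happy_message.save("reactions/happy.mp3")

victory_message = gTTS(text="Victory gesture detected.", lang="en")   # illustrative phrase
victory_message.save("reactions/victory.mp3")

# Quick check that a recording plays back correctly.
playsound("reactions/happy.mp3")
```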

Final Pipeline:

Our final pipeline consists of loading both our saved models and then using them to predict emotions and gestures. I will include 2 Python files in the GitHub repository. final_run.py takes a choice from the user and runs either the emotions or the gestures model. final_run1.py runs both the emotions and gestures models simultaneously. Feel free to use whichever is more convenient for you. I will be using the saved models from the first trained emotions model and the trained gestures model. We will be using an additional XML file called haarcascade_frontalface_default.xml for the detection of faces. Let us try to understand the code for the final pipeline from the code blocks below.


In this particular code block, we import all the required libraries which we will use to obtain a vocal response for the label predicted by the model. cv2 is the computer vision (OpenCV) module we will use to access our webcam in real time. We import the time module to make sure we get a prediction only after 10 seconds of analysis. We load the saved pre-trained weights of both the emotions and gestures models, then specify the classifier that will be used for the detection of faces, and finally assign all the emotion and gesture labels that our models can predict.

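A sketch of this setup; gesturenew.h5 and haarcascade_frontalface_default.xml follow the text, while the emotions weights file name and the label orderings are assumptions that must match how each model was trained:

```python
import cv2
import time  # used in the full pipeline to delay predictions by 10 seconds
import numpy as np
from playsound import playsound
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import img_to_array

emotion_model = load_model("emotions.h5")     # hypothetical file name for the emotions weights
gesture_model = load_model("gesturenew.h5")   # gestures weights saved earlier

face_classifier = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

emotion_labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
gesture_labels = ["Loser", "Punch", "Super", "Victory"]
```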

In the next code block, we will look at a code snippet for the emotions model. For the entire code, refer to the GitHub repository at the end of the article.


In this choice, we run the emotions model. While the webcam is active, we read the frames and draw a rectangle (similar to a bounding box) whenever the Haar cascade classifier detects a face. We convert the facial region to grayscale with dimensions 48 x 48, matching the training images, for better predictions. A prediction is only made when np.sum confirms that at least one face region was captured. The Keras helper img_to_array converts the image into an array, and we expand the dimensions so the image forms a batch for the model. The predictions are mapped to the labels, and the corresponding recordings are played.

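A rough sketch of that loop, continuing from the setup above; this follows a common OpenCV + Keras pattern rather than the author's exact final_run.py, and omits the 10-second timing logic for brevity:

```python
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_classifier.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))  # grayscale 48 x 48, like the training data

        if np.sum([roi]) != 0:                     # predict only when a face region was captured
            roi = img_to_array(roi.astype("float") / 255.0)
            roi = np.expand_dims(roi, axis=0)      # add the batch dimension
            label = emotion_labels[np.argmax(emotion_model.predict(roi))]
            cv2.putText(frame, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
            playsound(f"reactions/{label.lower()}.mp3")  # hypothetical recording file name

    cv2.imshow("Emotion detector", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```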

Let us look at the code snippet for running the gestures model.


In this choice, we run the gestures model. While the webcam is active, we read the frames and, unlike the emotions model, draw a fixed rectangle in the middle of the screen. The user has to place their fingers inside this box for the following to work. A prediction is only made when np.sum confirms that something was captured inside the box. The Keras helper img_to_array converts the image into an array, and we expand the dimensions so the image forms a batch for the model. The predictions are mapped to the labels, and the corresponding recordings are played. With this, our final pipeline is complete, and we have analyzed all the code required for building the human emotion and gesture detector models. We can now release the video capture and destroy all windows, which means we stop the frame loop being run by the computer vision module.

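A matching sketch for the gestures branch, again a common pattern rather than the author's exact code; the box coordinates are illustrative:

```python
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Fixed box in the middle of the frame where the user places their fingers.
    x1, y1, x2, y2 = 200, 100, 400, 300
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    roi = cv2.resize(frame[y1:y2, x1:x2], (200, 200))  # match the 200 x 200 RGB training size

    if np.sum([roi]) != 0:
        roi = img_to_array(roi.astype("float") / 255.0)
        roi = np.expand_dims(roi, axis=0)               # add the batch dimension
        label = gesture_labels[np.argmax(gesture_model.predict(roi))]
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
        playsound(f"reactions/{label.lower()}.mp3")     # hypothetical recording file name

    cv2.imshow("Gesture detector", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# Release the webcam and close all OpenCV windows once we quit the loop.
cap.release()
cv2.destroyAllWindows()
```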

Conclusion:

We have finally completed going through the entire human emotion and gesture detector. The GitHub repository for the entire code can be found here. I would highly recommend experimenting with the various parameters as well as the layers in all three models we have built and trying to achieve better results. The various recordings can also be modified as desired. It is also possible to try out other transfer learning models or build your own custom architectures and achieve overall better performance. Have fun experimenting and trying out different and unique things with the models!


Final Thoughts:

I had great fun writing this 2-part series and it was an absolute blast. I hope all of you enjoyed reading it as much as I enjoyed writing it. I look forward to posting more articles in the future, as I find it extremely enjoyable. Any ideas for future articles or topics you would like me to cover would be highly appreciated. Thank you everyone for sticking around until the end, and I wish you all a wonderful day!


Translated from: https://towardsdatascience.com/human-emotion-and-gesture-detector-using-deep-learning-part-2-471724f7a023
