Sign language recognition

by Vagdevi Kommineni

How to use transfer learning for sign language recognition

As a continuation of my previous post on ASL Recognition using AlexNet — training from scratch, let us now consider how to solve this problem using the transfer learning technique.

Transfer learning has become so handy for computer vision geeks.

It’s basically a mechanism in which the knowledge acquired by training a model on one task is adapted or fine-tuned in order to accomplish a second, related task.

One of the most powerful ideas in deep learning is that we can sometimes take the knowledge a neural network has learnt from one task (task A) and apply it to another task (task B). This is called transfer learning.

For example, a neural network trained on object recognition can be used to read x-ray scans. This is achieved by keeping the initial or middle layers with the weights they learned on the data for task A and freezing those weights, removing the last layer or the last few layers, and then adding new layers and training their parameters on the data for task B.

Transfer learning makes sense when the training data for task A is quite large and that for task B is relatively small. A neural network that was trained on such a vast amount of data and performs well on its test data has evidently learned to extract useful features from input images. That ability is what makes it a strong starting point for a related task.

Now that we have such powerful features from these layers (whose weights from task A are frozen), we just need to make use of these extracted features to achieve task B. So, these features from frozen layers are fed to the new layers and the parameters for these layers are trained on the data of task B.

So basically, we store the knowledge from the previous task in the form of the weights of the frozen layers (called pre-training). Then we make the neural network task B-specific by training (called fine-tuning) the latter layers on the new data. For more information about transfer learning, please visit here.
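
The freeze-then-train idea can be illustrated without any deep learning library. The following is a toy sketch, not the network used in this post: a random matrix W_frozen stands in for layers pre-trained on task A, and only a small logistic-regression head is trained on made-up task B data. All names and the data are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for layers learned on task A; these weights are never updated below.
W_frozen = rng.normal(size=(4, 8))

def extract_features(x):
    # Frozen feature extractor: a single ReLU layer.
    return np.maximum(x @ W_frozen, 0.0)

# Small, made-up task B dataset: 32 samples with binary labels.
X = rng.normal(size=(32, 4))
y = (X[:, 0] > 0).astype(float)

# New head, trained from scratch on task B.
w_head = np.zeros(8)
b_head = 0.0

def loss_and_grads(w, b):
    feats = extract_features(X)
    probs = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    eps = 1e-12
    loss = -np.mean(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps))
    grad_logits = (probs - y) / len(y)
    # Gradients are computed for the head only; W_frozen gets none.
    return loss, feats.T @ grad_logits, grad_logits.sum()

W_before = W_frozen.copy()
initial_loss, _, _ = loss_and_grads(w_head, b_head)
for _ in range(200):
    _, grad_w, grad_b = loss_and_grads(w_head, b_head)
    w_head -= 0.05 * grad_w
    b_head -= 0.05 * grad_b
final_loss, _, _ = loss_and_grads(w_head, b_head)

print("frozen weights unchanged:", np.array_equal(W_frozen, W_before))
print("loss went down:", final_loss < initial_loss)
```

The frozen matrix plays the role of the pre-trained layers, and the loop over w_head and b_head plays the role of fine-tuning the new layers.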

This technique is really useful because:

  • we can build a model that performs well on task B even though less data is available for task B,
  • there are fewer parameters to train (only the last layer or layers), and thus less training time,
  • there is less demand for heavy computational resources such as GPUs and TPUs (although this still depends on the data available for task B).

Since this post is the continuation of the previous post about ASL Recognition using AlexNet — training from scratch, please refer to that post for preprocessing details and the code (preprocess.py).

The data used for both the posts is this Kaggle data for ASL. The dataset consists of images of hand gestures for each letter in the English alphabet. The images of a single class are of different variants, as in zoomed versions, dim and bright light conditions, etc. For each class, there are as many as 3000 images. Here are links for the full code of preprocessing & training and testing.

For transfer learning, I have used the VGG16 model pre-trained on the ImageNet dataset. The weights are readily available in Keras. We shall first import all the necessary modules as follows:

import keras
from keras.optimizers import SGD
from keras.models import Sequential
from keras.applications import VGG16   # VGG16 pretrained weights
from keras.preprocessing import image
from keras.layers import BatchNormalization
from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
print("Imported Network Essentials")

Let us now make the model a sequential one and first add the pre-trained VGG16 network to it. Note that we need to remove the last layers (called top layers) and freeze the weights of all the previous layers. Removing the top layers is done by include_top=False, and weights='imagenet' loads the weights of the VGG16 network trained on the ImageNet dataset. Be aware that include_top=False does not freeze anything by itself: in Keras, a layer is frozen by setting its trainable attribute to False before compiling the model, and the listing here leaves that step out, so as written the VGG16 weights are updated during training as well.

# to fix the input image size
image_size = 224
# Load the VGG model
vgg_base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(image_size, image_size, 3))
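
Freezing the pre-trained layers is a separate step from dropping the top layers. The following is a minimal sketch of one way to do it in Keras, assuming the same head as in this post; the function name build_frozen_vgg16_classifier and its parameters are hypothetical, and the imports sit inside the function only so that the snippet can be loaded without Keras installed.

```python
def build_frozen_vgg16_classifier(num_classes=5, image_size=224):
    # Imports are local so that merely defining the function does not need Keras.
    from keras.applications import VGG16
    from keras.layers import Dense, Dropout, Flatten
    from keras.models import Sequential

    # Convolutional base with ImageNet weights and without the top layers.
    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(image_size, image_size, 3))
    # Freeze the base: its weights are excluded from gradient updates.
    # This has to happen before model.compile is called.
    base.trainable = False

    model = Sequential()
    model.add(base)
    model.add(Flatten())
    model.add(Dense(8192, activation="relu"))
    model.add(Dropout(0.8))
    model.add(Dense(4096, activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation="softmax"))
    return model
```

Calling build_frozen_vgg16_classifier() downloads the ImageNet weights on first use. With the base frozen, model.summary() should report the VGG16 parameters as non-trainable, and only the new Dense layers are updated during model.fit.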

Now, the part of VGG16 we want is stored in vgg_base. We shall also add the other layers like dense layers and dropout layers on top of vgg_base. Thus the full architecture of the neural network we use shall be:

# initiate a model
model = Sequential()
# Add the VGG base model
model.add(vgg_base)
# Add new layers
model.add(Flatten())
model.add(Dense(8192, activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
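
It can be useful to check how many parameters each part of this architecture holds. The following is a small arithmetic sketch in plain Python, assuming the standard VGG16 layout (3x3 convolutions, five blocks each ending in a 2x2 max-pool) and the layer sizes listed above; the helper names are made up for this example.

```python
def conv_params(kernel, in_ch, out_ch):
    # kernel*kernel*in_ch weights per filter, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out

# VGG16 convolutional blocks as (number of conv layers, output channels).
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16_base_params(in_ch=3):
    total = 0
    for n_convs, out_ch in VGG16_BLOCKS:
        for _ in range(n_convs):
            total += conv_params(3, in_ch, out_ch)
            in_ch = out_ch
    return total

image_size = 224
# Each of the five max-pools halves the height and width.
side = image_size // 2 ** len(VGG16_BLOCKS)
flat = side * side * 512  # size of the Flatten() output

head_sizes = [flat, 8192, 4096, 5]
head = sum(dense_params(a, b) for a, b in zip(head_sizes, head_sizes[1:]))
base = vgg16_base_params()

print("VGG16 base parameters:", base)
print("new head parameters:  ", head)
```

Under these layer sizes the new dense layers hold about 239 million parameters against roughly 14.7 million in the convolutional base, so most of the training cost sits in the head. Using a smaller first Dense layer is one way to reduce it.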

We shall next define our optimizer as SGD and set the learning rate via lr. Since this is a multi-class classification problem with one-hot labels, we use categorical_crossentropy as the loss function in model.compile. Checkpoints are a convenient way to keep the weights obtained up to a point of interruption, so that we may use them later. The first parameter of ModelCheckpoint sets where to store them: here each epoch is saved as weights.{epoch:02d}-{val_loss:.2f}.hdf5 in the Weights folder. We then train by calling model.fit.

# Compile
sgd = SGD(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
checkpoint = keras.callbacks.ModelCheckpoint(
    "Weights/weights.{epoch:02d}-{val_loss:.2f}.hdf5",
    monitor='val_loss', verbose=0, save_best_only=False,
    save_weights_only=False, mode='auto', period=1)
# Train (images are scaled by 1/255; the one-hot labels are left unscaled)
model.fit(X_train/255.0, Y_train, batch_size=32, epochs=15, verbose=1,
          validation_data=(X_test/255.0, Y_test), shuffle=True,
          callbacks=[checkpoint])
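
The checkpoint file name is an ordinary Python format string that gets filled in with the epoch number and the logged metrics. The following sketch shows how the placeholders expand; the helper checkpoint_name and the sample values are made up for illustration.

```python
template = "Weights/weights.{epoch:02d}-{val_loss:.2f}.hdf5"

def checkpoint_name(epoch, val_loss):
    # {epoch:02d} zero-pads the epoch to two digits,
    # {val_loss:.2f} rounds the validation loss to two decimals.
    return template.format(epoch=epoch, val_loss=val_loss)

print(checkpoint_name(3, 0.4567))   # Weights/weights.03-0.46.hdf5
print(checkpoint_name(12, 0.0312))  # Weights/weights.12-0.03.hdf5
```

Because the validation loss is part of the name, the files can be compared at a glance to pick the epoch to restore.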

We can save the model and weights as follows:

# serialize model to JSON
model_json = model.to_json()
with open("Model/model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("Model/model_weights.h5")
print("Saved model to disk")

Let’s have a look at the whole code for training here:

# train.py
import keras
from keras.optimizers import SGD
from keras.models import Sequential
from keras.applications import VGG16   # VGG16 pretrained weights
from keras.preprocessing import image
from keras.layers import BatchNormalization
from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
print("Imported Network Essentials")
# to fix the input image size
image_size = 224
# Load the VGG model
vgg_base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(image_size, image_size, 3))
# initiate a model
model = Sequential()
# Add the VGG base model
model.add(vgg_base)
# Add new layers
model.add(Flatten())
model.add(Dense(8192, activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
# Compile
sgd = SGD(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
checkpoint = keras.callbacks.ModelCheckpoint(
    "Weights/weights.{epoch:02d}-{val_loss:.2f}.hdf5",
    monitor='val_loss', verbose=0, save_best_only=False,
    save_weights_only=False, mode='auto', period=1)
# Train (images are scaled by 1/255; the one-hot labels are left unscaled)
model.fit(X_train/255.0, Y_train, batch_size=32, epochs=15, verbose=1,
          validation_data=(X_test/255.0, Y_test), shuffle=True,
          callbacks=[checkpoint])
# serialize model to JSON
model_json = model.to_json()
with open("Model/model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("Model/model_weights.h5")
print("Saved model to disk")

Now it’s time for testing! Here’s how to load the model architecture from the stored JSON file and the trained weights from the HDF5 file, and then compute the evaluation metric accuracy_score from sklearn.metrics.

# test.py
import numpy as np
from keras.models import model_from_json
from sklearn.metrics import accuracy_score
# dimensions of our images
image_size = 224
# rebuild the architecture from JSON, then load the trained weights
with open('Model/model.json', 'r') as f:
    model = model_from_json(f.read())
model.summary()
model.load_weights('Model/model_weights.h5')
# loading the numpy test images (feel free to look at preprocessing)
X_test = np.load("Numpy/test_set.npy")
Y_test = np.load("Numpy/test_classes.npy")
# getting predictions and taking the index of the maximum of each prediction,
# since predictions are of the form [0.01, 0.99, 0, 0] in Y_predict and
# of the form [0, 1, 0, 0] in Y_test
# (training scaled the inputs by 1/255; if test_set.npy holds raw pixel values,
# the same scaling would need to be applied to X_test here)
Y_predict = model.predict(X_test)
Y_predict = [np.argmax(r) for r in Y_predict]
Y_test = [np.argmax(r) for r in Y_test]
print("##################")
acc_score = accuracy_score(Y_test, Y_predict)
print("Accuracy: " + str(acc_score))
print("##################")
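
To make the argmax step concrete, here is a small sketch on made-up predictions. It computes the same quantity as accuracy_score, the fraction of matching class indices, using NumPy only; the arrays are invented for illustration.

```python
import numpy as np

# Made-up softmax outputs for three images and four classes.
probs = np.array([[0.01, 0.99, 0.00, 0.00],
                  [0.70, 0.10, 0.10, 0.10],
                  [0.20, 0.20, 0.50, 0.10]])
# Matching one-hot ground-truth labels.
one_hot = np.array([[0, 1, 0, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 1]])

pred = np.argmax(probs, axis=1)    # predicted class index per image
true = np.argmax(one_hot, axis=1)  # true class index per image
accuracy = float(np.mean(pred == true))

print(pred.tolist(), true.tolist(), accuracy)
```

Here two of the three predictions match the labels, so the accuracy is 2/3.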

I got an accuracy of 97%. You may follow certain steps to improve accuracy like:

  • hyperparameter tuning,
  • using a different pretrained model such as ResNet or VGG19 instead of VGG16.

The full code can be found here. I would love to hear your results in the comments section below.

Happy learning!

Translated from: https://www.freecodecamp.org/news/asl-recognition-using-transfer-learning-918ba054c004/
