Reading notes on "Deep Gait Recognition: A Survey", a survey of deep learning-based gait recognition. Original paper: https://arxiv.org/pdf/2102.09546.pdf

Abstract:

1. Introduction

2 REVIEW METHODOLOGY

3 TEST PROTOCOLS AND DATASETS

3.1 Protocols

3.2 Datasets

4 PROPOSED TAXONOMY

4.1 Body Representation

4.2 Temporal Representation

4.3 Feature Representation

4.4 Neural Architectures

4.4.1 Convolutional Neural Networks

4.4.2 Deep Belief Networks

4.4.3 Recurrent Neural Networks

4.4.4 Deep AutoEncoders

4.4.5 Generative Adversarial Networks

4.4.6 Capsule Networks

4.4.7 3D Convolutional Neural Networks

4.4.8 Graph Convolutional Networks

4.4.9 Hybrid Networks

5 STATE-OF-THE-ART

5.1 Analysis and Trends

5.2 Performance Comparison

6 CHALLENGES AND FUTURE RESEARCH DIRECTIONS

6.1 Disentanglement

6.2 Self-supervised Learning

6.3 Multi-task Learning

6.4 Data Synthesis and Domain Adaptation

6.5 Cross-Dataset Evaluation

6.6 Multi-View Recognition

6.7 Multi-biometric Recognition

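Abstract:

Deep learning has reshaped the research landscape in this area since 2015 through the ability to automatically learn discriminative representations.

Deep learning entered gait recognition around 2015. Its main benefit is that it learns highly discriminative representations automatically.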

1. Introduction

Gait information can be captured using a number of sensing modalities such as wearable sensors attached to the human body, for instance accelerometers, gyroscopes, and force and pressure sensors

Gait information can be captured by sensors on wearable devices, such as accelerometers, gyroscopes, and force and pressure sensors.

Non-wearable gait recognition systems predominantly use vision, and are therefore mostly known as vision-based gait recognition.

Non-wearable gait recognition is mostly vision-based.

The performance of vision-based gait recognition systems, hereafter referred to only as gait recognition,

Factors that interfere with vision-based gait recognition: 1. appearance changes (hats, backpacks); 2. different camera viewpoints; 3. occlusion; 4. background clutter; 5. varying lighting levels.

In recent years, there has been a clear trend in migrating from non-deep methods to deep learning-based solutions for gait recognition.

In recent years, gait recognition research has been shifting from non-deep methods to deep learning.

followed by the first shallow neural network for gait recognition in 2008

The first shallow neural network for gait recognition appeared in 2008.

The current state-of-the-art results on CASIA-B dataset have been reported by 3DCNNGait with a recognition accuracy of 90.4%.

The current best result on CASIA-B comes from 3DCNNGait, with 90.4% accuracy.

body representation, temporal representation, feature representation, and neural architecture,

The survey classifies existing work along four dimensions: body representation, temporal representation, feature representation, and neural architecture.

2 REVIEW METHODOLOGY

Nothing essential here. The section describes how the papers covered by the survey were selected and which criteria were used.

3 TEST PROTOCOLS AND DATASETS

3.1 Protocols

in the subject-dependent protocol, both the training and testing sets include samples from all the subjects. However, in the subject-independent protocol, the test subjects are disjoint from the training subjects.

Test protocols are either subject-dependent or subject-independent. Under the subject-dependent protocol, both the training set and the test set contain samples from all subjects. Under the subject-independent protocol, the test subjects do not overlap with the training subjects.
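
The subject-independent protocol can be sketched as a split over subject IDs rather than over samples. This is a minimal illustration only: the function name, the 60% training fraction, and the toy data are assumptions, not values from the survey.

```python
import random

def subject_independent_split(samples, train_fraction=0.6, seed=0):
    """Split (subject_id, sequence) pairs so that no subject appears in both sets."""
    subjects = sorted({sid for sid, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_train = round(len(subjects) * train_fraction)
    train_ids = set(subjects[:n_train])
    train = [s for s in samples if s[0] in train_ids]
    test = [s for s in samples if s[0] not in train_ids]
    return train, test

# Toy data: 5 hypothetical subjects with 4 sequences each.
samples = [(sid, f"seq_{sid}_{k}") for sid in range(5) for k in range(4)]
train, test = subject_independent_split(samples)
train_subjects = {sid for sid, _ in train}
test_subjects = {sid for sid, _ in test}
```

A subject-dependent split would instead divide the sequences of every subject between the two sets, so both sets share the same identities.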

Finally, a classifier is used to compare the probe features with the gallery ones in order to identify the most similar gait patterns and label them as being from the same identity.

The trained model extracts features from the probe sample. These are compared with the features in the gallery, and the identity is determined from the most similar gait pattern.

Gait recognition results in the literature have all been measured and presented using rank-1 recognition accuracy.

Results in the literature are reported as rank-1 recognition accuracy.
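
Probe-to-gallery matching and rank-1 accuracy can be sketched as a nearest-neighbour search. The Euclidean distance and the toy 2-D features below are assumptions for illustration; real systems compare learned embeddings and may use other metrics.

```python
import numpy as np

def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery feature has the same identity."""
    # Pairwise Euclidean distances, shape (num_probe, num_gallery).
    diffs = probe_feats[:, None, :] - gallery_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = np.argmin(dists, axis=1)
    return float(np.mean(gallery_ids[nearest] == probe_ids))

# Toy gallery with one feature vector per identity.
gallery_feats = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
gallery_ids = np.array([0, 1, 2])
# Four probes; the third one lies closer to identity 1 than to its true identity 2.
probe_feats = np.array([[0.1, -0.1], [0.9, 1.2], [1.4, 1.4], [3.5, 4.2]])
probe_ids = np.array([0, 1, 2, 2])
acc = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)  # 0.75
```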

3.2 Datasets

These datasets cover various parameters related to acquisition viewpoints, environment conditions, and appearance of the subjects.

The datasets differ mainly in acquisition viewpoints, environment conditions, and the appearance of the subjects.

4 PROPOSED TAXONOMY

4.1 Body Representation

This dimension relates to the way the body is represented for recognition, which can be based on silhouettes or skeletons.

Body representations are based on either silhouettes or skeletons.

4.2 Temporal Representation

Two types of representations, templates and volumes, have been commonly used in the literature.

Templates and volumes are the two main temporal representations.

Templates aggregate temporal walking information over a sequence of silhouettes in a single map, for example by averaging the silhouettes over at least one gait cycle.

A template aggregates the silhouettes of a walking sequence into a single map, for example the average of the silhouettes over at least one gait cycle.
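
The averaging described above can be sketched in a few lines. This assumes the silhouettes are already segmented, size-normalized, and aligned, and that they cover at least one gait cycle; the tiny 2x3 frames are made up for illustration.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average a stack of aligned binary silhouettes, shape (T, H, W), into one (H, W) map."""
    frames = np.asarray(silhouettes, dtype=np.float32)
    if frames.ndim != 3:
        raise ValueError("expected an array of shape (T, H, W)")
    return frames.mean(axis=0)

# Toy cycle: 4 frames of 2x3 binary silhouettes.
cycle = np.array([
    [[1, 0, 0], [1, 1, 0]],
    [[1, 0, 0], [1, 0, 0]],
    [[1, 1, 0], [1, 0, 0]],
    [[1, 0, 0], [1, 1, 0]],
])
gei = gait_energy_image(cycle)
# gei == [[1.0, 0.25, 0.0],
#         [1.0, 0.5,  0.0]]
```

Pixels that are always foreground stay at 1, while pixels covered only in some frames (swinging limbs) get intermediate values.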

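With respect to deep gait recognition architectures, gait silhouettes can be aggregated in the initial layer of a network.

In deep architectures, the silhouettes can be aggregated in the initial layer of the network.

Gait silhouettes can alternatively be aggregated in an intermediate layer of the network after several convolution and pooling layers, also known as convolutional template.

Alternatively, the silhouettes can be aggregated in an intermediate layer, after several convolution and pooling layers. This is called a convolutional template.

temporal templates include: (i) gait energy images (GEI); (ii) chrono gait images (CGI); (iii) frame-difference energy images (FDEI); (iv) gait entropy images (GEnI); and (v) period energy images (PEI).

There are five kinds of temporal templates: (1) gait energy images; (2) chrono gait images; (3) frame-difference energy images; (4) gait entropy images; (5) period energy images.

To preserve and learn from the order and relationship of frames in gait sequences, instead of aggregating them, sequence volume representations have been adopted.

The advantage of volumes is that they preserve the order of the gait sequence and the relationships between frames.

in order to learn the temporal information, two different approaches have been adopted.

Two families of deep models are used to learn the temporal information. Recurrent neural networks process the silhouette sequence directly. 3D convolutional networks process the sequence after it has been stacked into a 3D tensor.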

4.3 Feature Representation

This dimension encapsulates the region of support for representation learning, which can be either global or partial.

Feature representations are either global or partial.

Methods based on global representations tend to be more sensitive to occlusions and appearance changes

Global features are more sensitive to occlusion and appearance changes.

partial regions often maintain different contributions towards the final recognition performance, thus learning their importance can improve the overall performance of gait recognition methods

Partial features tend to be more robust. Different regions contribute differently to the final result, so learning their importance can improve recognition.
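
One way to picture partial representations is to cut a feature map into horizontal strips and weight each strip. The strip layout, the max pooling, and the hand-set weights below are assumptions for illustration; in an actual model the weights would be learned.

```python
import numpy as np

def part_features(feature_map, num_parts):
    """Split an (H, W, C) feature map into horizontal strips and max-pool each strip to a C-vector."""
    strips = np.array_split(feature_map, num_parts, axis=0)
    return np.stack([s.max(axis=(0, 1)) for s in strips])  # (num_parts, C)

def weighted_descriptor(parts, weights):
    """Scale each part vector by its normalized importance and concatenate."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return (parts * w[:, None]).reshape(-1)

rng = np.random.default_rng(0)
fmap = rng.random((8, 4, 16))          # hypothetical CNN output: height 8, width 4, 16 channels
parts = part_features(fmap, num_parts=4)
desc = weighted_descriptor(parts, weights=[1.0, 2.0, 2.0, 1.0])  # made-up importances
```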

4.4 Neural Architectures

4.4.1 Convolutional Neural Networks

Convolutional neural networks (CNNs) have been used the most for gait recognition.

CNNs are the most commonly used architecture in gait recognition.

4.4.2 Deep Belief Networks

A deep belief network (DBN) is a probabilistic generative model, composed by stacking restricted Boltzmann machines (RBMs) with the aim of extracting hierarchical representations from the training data.

A DBN is a probabilistic generative model that stacks RBMs to extract hierarchical features from the training data.

DBNs have been used for gait recognition in [90] and [25].

DBNs learn from existing features to obtain more discriminative ones.

4.4.3 Recurrent Neural Networks

A layer of RNN is typically composed of several cells, each corresponding to one input element of the sequence, e.g., one frame of a gait video.

An RNN layer consists of several cells, and each cell corresponds to one element of the input sequence, for example one frame of gait data.
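
The "one cell per frame" idea can be sketched with a vanilla (Elman) RNN, where the same weights are applied at every step. The surveyed methods mostly use LSTM or GRU cells; the plain tanh cell, the random weights, and the sizes below are simplifying assumptions.

```python
import numpy as np

def rnn_forward(frame_feats, w_in, w_rec, bias):
    """Run a vanilla RNN over per-frame features of shape (T, D); return hidden states (T, H)."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x_t in frame_feats:            # one cell step per frame, weights shared across steps
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden + bias)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
T, D, H = 5, 8, 4                      # frames, feature size, hidden size (arbitrary)
frames = rng.normal(size=(T, D))       # stand-in for per-frame gait features
states = rnn_forward(frames,
                     rng.normal(size=(H, D)) * 0.1,
                     rng.normal(size=(H, H)) * 0.1,
                     np.zeros(H))
sequence_embedding = states[-1]        # last hidden state summarizes the sequence
```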

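There have been three different approaches for using RNNs in the context of deep gait recognition systems.

Three ways of using RNNs for gait: 1. process skeleton sequences directly; 2. combine them with CNNs; 3. split the gait data into parts and run a separate RNN on each part.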

4.4.4 Deep AutoEncoders

Deep auto-encoder (DAE) is a type of network that aims to extract so called bottleneck features or latent space representations, using an encoder-decoder structure.

In essence, a DAE is about compression: the encoder compresses the data into bottleneck features.

DAE networks are generally trained with the aim of minimizing the reconstruction error that measures the difference between the original input and the reconstructed version.

Training a DAE means minimizing the reconstruction loss.
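
A minimal sketch of the encoder-decoder structure and the reconstruction error, assuming a single ReLU encoder layer, a linear decoder, and mean squared error. The weights are random and untrained, so the loss value only shows what training would try to reduce.

```python
import numpy as np

def autoencoder_forward(x, w_enc, w_dec):
    """Encode x into a low-dimensional bottleneck with a ReLU layer, then decode linearly."""
    code = np.maximum(0.0, x @ w_enc)   # bottleneck features
    recon = code @ w_dec
    return code, recon

def reconstruction_error(x, recon):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((x - recon) ** 2))

rng = np.random.default_rng(0)
x = rng.random((3, 64))                 # 3 flattened inputs (e.g. small gait templates)
w_enc = rng.normal(size=(64, 8)) * 0.1  # 64 -> 8 bottleneck, hypothetical sizes
w_dec = rng.normal(size=(8, 64)) * 0.1
code, recon = autoencoder_forward(x, w_enc, w_dec)
loss = reconstruction_error(x, recon)
```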

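4.4.5 Generative Adversarial Networks

These networks can also be used to preserve identity information while transferring gait variations such as pose and clothing along low-dimensional manifolds in a process referred to as domain adaptation.

While preserving identity information, GANs transfer gait variations along low-dimensional manifolds, generating gait data with a different pose, clothing, and so on. This process is referred to as domain adaptation.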

4.4.6 Capsule Networks

Capsule Networks (CapsNet) have been proposed to address two important shortcomings in CNNs, namely the limits of scalar activations and poor information routing through pooling operations.

Capsule networks address two shortcomings of CNNs: the limits of scalar activations, and the poor routing of information caused by pooling.

CapsNets are composed of capsules which are groups of neurons that explicitly encode the intrinsic viewpoint-invariant relationships available in different parts of the objects.

A capsule network is made of capsules, each of which is a group of neurons. These neurons explicitly encode the intrinsic viewpoint-invariant relationships between different parts of an object.

This is in contrast to the standard pooling layers in CNNs that lose positional attributes, such as rotation, location, and scale.

Pooling layers in standard CNNs lose positional attributes such as rotation, location, and scale. (Capsule networks address this.)

It then uses a CapsNet with dynamic routing to retain the relationship within each template with the aim of finding more robust features.

Dynamic routing in the capsule network retains the relationships within each template, with the aim of finding more robust features.
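
Dynamic routing can be sketched as routing-by-agreement in the general CapsNet formulation. This is not the specific gait model the survey refers to: the prediction vectors are random, and the three iterations and the shapes are assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vector length into [0, 1) while keeping the direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: predictions from input capsules, shape (num_in, num_out, dim)."""
    num_in, num_out, _ = u_hat.shape
    logits = np.zeros((num_in, num_out))
    for _ in range(num_iters):
        # Coupling coefficients: softmax over output capsules for each input capsule.
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        coupling = exp / exp.sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", coupling, u_hat)   # weighted sum per output capsule
        v = squash(s)
        # Raise the logit where a prediction agrees with the output capsule.
        logits = logits + np.einsum("ijd,jd->ij", u_hat, v)
    return v, coupling

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))     # 6 input capsules, 3 output capsules, 4-D vectors
v, coupling = dynamic_routing(u_hat)
```

Unlike a scalar activation, each output capsule `v[j]` is a vector whose length acts as a confidence and whose direction carries the encoded attributes.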

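4.4.7 3D Convolutional Neural Networks

3D CNNs take the stacked gait frames in the form of a 3D tensor as input, and then use multiple 3D convolution filters and pooling operations to extract the spatio-angular representations.

A 3D CNN takes the gait frames stacked into a 3D tensor as input, then applies multiple 3D convolution filters and pooling operations to extract features. (The quote says "spatio-angular"; for stacked frames these are joint spatial and temporal features.)

The limitation of 3DCNNs for gait recognition is the lack of flexibility in processing variable length sequences.

The limitation of 3D CNNs is their lack of flexibility with variable-length sequences.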
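
A common workaround for the fixed-length requirement is to resample every sequence to the same number of frames before stacking. The uniform index sampling below is one simple option and an assumption of this sketch, not a method taken from the survey.

```python
import numpy as np

def to_fixed_length(frames, target_len):
    """Resample a (T, H, W) sequence to exactly target_len frames by uniform index sampling."""
    frames = np.asarray(frames)
    idx = np.linspace(0, len(frames) - 1, num=target_len).round().astype(int)
    return frames[idx]

# Toy sequence: frame t is a 2x2 image filled with the value t.
seq = np.arange(10)[:, None, None] * np.ones((1, 2, 2))
clip = to_fixed_length(seq, 4)                       # keeps frames 0, 3, 6, 9
padded = to_fixed_length(np.zeros((7, 4, 3)), 16)    # short input: frames are repeated
volume = clip[None]                                  # add a channel axis: (1, T, H, W)
```

Longer sequences are subsampled and shorter ones have frames repeated, so every clip ends up with the same tensor shape.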

4.4.8 Graph Convolutional Networks

GCNs can jointly model both the structural information and temporal relationships available in a gait sequence in order to learn discriminative and robust features with respect to camera viewpoint and subject appearance.

GCNs jointly model the structural information and the temporal relationships in a gait sequence, learning discriminative features that are robust to camera viewpoint and subject appearance.
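
The structural part can be sketched as a single graph convolution over the skeleton joints of one frame, using the standard symmetric normalization of the adjacency matrix. The 5-joint chain and the random weights are hypothetical, and the temporal modelling mentioned above is left out.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    a = np.eye(num_nodes)              # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, a_hat, w):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(0.0, a_hat @ x @ w)

# Hypothetical 5-joint chain: head - neck - hip - knee - foot.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a_hat = normalized_adjacency(edges, num_nodes=5)
rng = np.random.default_rng(0)
joints = rng.normal(size=(5, 2))       # (x, y) coordinates per joint for one frame
out = gcn_layer(joints, a_hat, rng.normal(size=(2, 8)))
```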

4.4.9 Hybrid Networks

A large number of hybrid deep networks that make use of two or more types of networks have been proposed to boost the performance of gait recognition systems.

Hybrid deep networks combine two or more types of networks to boost gait recognition performance.

CNN+RNN. Integration of CNNs with RNNs (notably LSTM and GRU) for learning the temporal relationships following spatial encoding is perhaps the most popular approach for spatio-temporal learning, which has also been used for gait recognition in the literature.

CNN+RNN is perhaps the most popular combination: the CNN encodes the spatial features, and the RNN then learns the temporal relationships.

DAE+GAN. Recently, DAEs have been considered as the backbone of the generator and/or discriminator components in GANs for gait recognition.

DAEs have recently been used as the backbone of the generator and/or discriminator in GANs.

DAE+RNNs. The combination of DAEs and RNNs has recently been proposed for generating sequence-based disentangled features using an LSTM RNN.

DAEs and RNNs (LSTM) have been combined to generate sequence-based disentangled features.

RNNs+CapsNets. Recurrently learned features obtained by RNNs can be treated as capsules, thus learning coupling weights between these capsules through dynamic routing.

Features learned recurrently by RNNs can be treated as capsules, and the coupling weights between these capsules are then learned through dynamic routing.

5 STATE-OF-THE-ART

5.1 Analysis and Trends

Body Representation. Silhouettes are the most widely adopted body representation for deep gait recognition,

Silhouettes are the most widely used body representation.

we anticipate methods based on hybrid silhouettes-skeleton body representations to gain popularity in the near future.

Hybrid silhouette-skeleton representations are expected to gain popularity.

Temporal Representation. Gait templates have been the most considered representation for capturing temporal information in gait sequences,

Gait templates are the most common way to capture temporal information.

we anticipate that these templates gain further popularity and surpass temporal templates in the future.

Convolutional templates are expected to become more common and to overtake temporal templates.

Feature Representation. Our analysis shows that over 87% of the available methods are based on global feature representations, where the deep features are learned by considering the gait information as a whole.

Over 87% of the methods use global feature representations, where the deep features are learned from the gait information as a whole.

The performance of such techniques points to promising potential in partial representation learning for discriminating key gait features.

Recent results suggest that partial representation learning has strong potential for discriminating key gait features.

Neural Architectures. 2D CNNs are the most widely used DNN type

2D CNNs are the most widely used type of deep network.

3D CNNs and GANs are the next popular categories,

3D CNNs and GANs come next.

DAEs, RNNs, CapsNets, DBNs, and GCNs are less considered among DNNs,

DAEs, RNNs, CapsNets, DBNs, and GCNs are used less often.

CNN-RNN combinations are the most widely adapted approach

Among hybrid models, CNN+RNN is currently the most widely used.

We expect that hybrid methods that make use of two or more types of DNN attract more attention in the near future and demonstrate robust performance in the field.

Hybrid models are expected to attract more attention in the near future.

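Loss Functions. Among the single loss functions, cross-entropy has been the most widely adopted with 20% of solutions having used it.

Among single loss functions, cross-entropy is the most widely used (20% of the solutions).

Triplet loss is the next popular type

Triplet loss is the next most popular.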
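
For reference, both losses can be written in a few lines. The margin of 0.2 and the toy inputs are assumptions; the survey excerpt does not give specific hyperparameters.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample with an integer class label."""
    shifted = logits - np.max(logits)               # for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))

ce = cross_entropy(np.array([2.0, 0.5, -1.0]), label=0)
# Easy triplet: the negative is far away, so the loss is zero.
easy = triplet_loss(np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([3.0, 0.0]))
# Hard triplet: the negative is closer than the positive, so the loss is positive.
hard = triplet_loss(np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.5, 0.0]))
```

Cross-entropy treats recognition as classification over known identities, while triplet loss shapes the embedding space directly, which fits the probe-versus-gallery matching used at test time.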

Datasets. We observe that CASIA-B is the most widely used dataset. It provides a large number of samples with variations in carrying and wearing conditions.

CASIA-B is the most widely used gait recognition dataset. It contains many samples with different carrying and wearing conditions.

we therefore found OU-ISIR to be the second most popular dataset having been used by 40% of the solutions.

OU-ISIR is the second most popular dataset, used by 40% of the solutions.

we anticipate that this dataset will become the standard benchmark dataset for gait recognition in the near future,

CASIA-E is expected to become the standard benchmark dataset.

5.2 Performance Comparison

The results show that the method proposed in [35] currently provides the best recognition results on CASIA-B (average performance result of 90.4%) and OU-ISIR (performance result of 99.9%). Concerning the OU-MVLP dataset, results show the superiority of the method proposed in [33] (performance result of 89.18%) over other methods.

The recognition rates of the leading methods are already high.

This analysis reveals the effectiveness of hybrid approaches, in term of either neural architectures as well as loss functions, for achieving strong performance in the area.

The authors are optimistic about hybrid approaches, in both architectures and loss functions.

6 CHALLENGES AND FUTURE RESEARCH DIRECTIONS

6.1 Disentanglement

Complex gait data arise from the interaction between many factors such as occlusion, camera view-points, appearance of individuals, sequence order, body part motion, or lighting sources present in the data.

One major challenge is the many interacting factors in gait data: occlusion, camera viewpoint, appearance of individuals, sequence order (speed changes?), body part motion, and lighting.

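6.2 Self-supervised Learning

In order to utilize unlabeled gait data to learn more efficient and generalizable gait representations, self-supervised learning can be exploited.

Self-supervised learning can exploit unlabeled gait data to learn more efficient and more generalizable representations.

One important challenge in using self-supervised learning in the context of gait recognition is to design effective pretext tasks to ensure the network can learn meaningful representations.

An important challenge for self-supervised gait recognition is designing effective pretext tasks, so that the network learns meaningful representations.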
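
To make "pretext task" concrete, here is one hypothetical example: deciding whether the frames of a sequence are in their original order. The quoted text does not name a specific task, so the task, the function name, and the 50/50 sampling are all assumptions. The point is that the labels come from the data itself, with no identity annotations.

```python
import random

def make_order_pretext_samples(sequence, num_samples, seed=0):
    """Return (frames, label) pairs: label 1 if the frame order is kept, 0 if shuffled."""
    if len(sequence) < 2:
        raise ValueError("need at least two frames to shuffle")
    rng = random.Random(seed)
    original = list(range(len(sequence)))
    samples = []
    for _ in range(num_samples):
        order = list(original)
        label = 1
        if rng.random() >= 0.5:
            while order == original:   # make sure the shuffled order really differs
                rng.shuffle(order)
            label = 0
        samples.append(([sequence[i] for i in order], label))
    return samples

sequence = [f"frame_{t}" for t in range(6)]   # stand-ins for silhouette frames
pretext = make_order_pretext_samples(sequence, num_samples=8)
```

A network trained to predict these labels has to pick up on walking dynamics, and its features could then be fine-tuned for identification.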

6.3 Multi-task Learning

Multi-task learning is generally performed to simultaneously learn multiple tasks using a shared model, thus learning more generalized and often reinforced representations.

Multi-task learning trains several tasks at once on a shared model, which yields more general and often reinforced representations.

6.4 Data Synthesis and Domain Adaptation

In the context of deep gait recognition, data synthesis, for instance using GANs, can be considered for creating large datasets or data augmentation.

Tools such as GANs can be used to synthesize data, either to build large gait datasets or for data augmentation.

6.5 Cross-Dataset Evaluation

In order to examine the generalizability of gait recognition systems in real-world applications, cross-dataset evaluations should be adopted, for example using transfer learning techniques.

To check how well a method generalizes, it should be evaluated across datasets, for example with transfer learning techniques.

6.6 Multi-View Recognition

These methods generally learn intra-view relationships and ignore inter-view information between multiple viewpoints.

Current methods generally learn only the relationships within a view and ignore the information between viewpoints.

Another challenge in multi-view gait recognition is that most existing multi-view descriptors consider a well defined camera network topology with fixed camera positions.

Another challenge is that most existing multi-view descriptors assume a well-defined camera network with fixed camera positions.

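6.7 Multi-biometric Recognition

various biometric modalities and gait can complement one another to compensate each other’s weaknesses in the context of a multi-biometric system.

Fusing gait recognition with other biometric modalities, so that each compensates for the other's weaknesses, is another research direction.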
