Silhouette based View embeddings for Gait Recognition under Multiple Views

github: available
Category: gait recognition

Link

GitHub - ctrasd/gait-view: The codes for the paper “Silhouette-based View-embeddings for Gait Recognition Under Multiple Views”

Core problem

Cross-view gait recognition

Solution

3.1. View projection matrix selection

The backbone can be GaitSet, GaitPart, GaitGL, MT3D, or a similar method.

The input sequence $X_{in}\in \mathbb{R}^{T\times H\times W}$ passes through the backbone network $E$ to produce the feature $X_f\in \mathbb{R}^{C_f \times H_f\times W_f}$.
First branch: HPM, whose output is $f_{HPM}\in \mathbb{R}^{n\times D}$.
Second branch: a pooling operation, whose output is $f_v\in \mathbb{R}^{D_v}$.
a. The projection matrices $\lbrace W_1,W_2,\dots,W_n \rbrace$ ($W_i \in \mathbb{R}^{D\times D}$) are selected according to the predicted view, where $n$ is the number of strips cut in the HPP Module [4].
b. $f_v$ is the feature used for view classification.

$X_f=E(X_{in}) \quad \text{and} \quad f_v=F(P_{Global\_Avg}(X_f))$

For GaitSet specifically, an additional feature $X_g$ is available, so

$f_v=F(P_{Global\_Avg}([X_f;X_g]))$

$F(\cdot)$ denotes a fully connected layer, and $P_{Global\_Avg}$ denotes global average pooling (GAP).
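
As a rough illustration of $f_v=F(P_{Global\_Avg}(X_f))$, the NumPy sketch below averages a feature map over its spatial dimensions and applies one linear layer. The function name `view_feature`, the random weights, and all sizes are assumptions made for the example. Whether $F$ includes an activation is not stated in these notes, so none is applied here.

```python
import numpy as np

def view_feature(x_f, fc_weight, fc_bias):
    """Global-average-pool a (C_f, H_f, W_f) feature map, then apply one linear layer."""
    pooled = x_f.mean(axis=(1, 2))        # P_Global_Avg: shape (C_f,)
    return fc_weight @ pooled + fc_bias   # F(.): shape (D_v,)

# Illustrative sizes only (not taken from the paper).
C_f, H_f, W_f, D_v = 8, 4, 3, 5
rng = np.random.default_rng(0)
x_f = rng.normal(size=(C_f, H_f, W_f))    # stand-in for the backbone output X_f
fc_weight = rng.normal(size=(D_v, C_f))   # hypothetical FC weights
fc_bias = np.zeros(D_v)

f_v = view_feature(x_f, fc_weight, fc_bias)
print(f_v.shape)
```

For the GaitSet variant, the same function would be applied to the concatenation $[X_f;X_g]$ instead of $X_f$ alone.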

The predicted view probability $\hat{p} \in \mathbb{R}^M$ of the input gait silhouettes and the view of maximum probability $\hat{y}$ are calculated as:

$\hat{p} = W_{view}f_v + B_{view} \quad \text{and} \quad \hat{y}=\mathop{\arg\max}\limits_{i} \hat{p}_i$

where $M$ is the number of discrete views, $W_{view} \in \mathbb{R}^{M\times D_v}$ is the weight matrix, $B_{view}$ is the bias term, and $\hat{y}\in \lbrace0,1,2,\dots ,M\rbrace$
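
The view prediction step can be sketched in a few lines of NumPy. The toy weights and the helper names `predict_view` and `softmax` are assumptions for illustration. The softmax is included only to show that normalizing the scores does not change which view wins the arg max.

```python
import numpy as np

def predict_view(f_v, w_view, b_view):
    """Return the view scores p_hat (length M) and the index y_hat of the largest score."""
    p_hat = w_view @ f_v + b_view
    y_hat = int(np.argmax(p_hat))
    return p_hat, y_hat

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Toy example with M = 3 views and D_v = 2 (values chosen by hand).
w_view = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
b_view = np.zeros(3)
f_v = np.array([0.5, 2.0])

p_hat, y_hat = predict_view(f_v, w_view, b_view)
print(y_hat)  # 2, because the scores are 0.5, 2.0 and 2.5
```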

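So $\hat{p}$ is obtained from $f_v$ through a single fully connected layer. $\hat{p}$ is a vector of length $M$, where $M$ is the number of views, so $\hat{p}$ gives the score of the current feature $f_v$ for each view, and $\hat{y}$ is the view with the highest score.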
For the predicted view $\hat{y}$, a corresponding view projection matrix group $Z_{\hat{y}}=\lbrace W_i|i=1,2,\dots,n\rbrace$ will be trained, where $W_i\in \mathbb{R}^{D\times D}$ is a projection matrix. All the view projection matrix groups can be expressed as $\lbrace Z_i|i=1,2,\dots,M\rbrace$

Each $\hat{y}$ has a corresponding $Z_{\hat{y}}$, and each $Z_{\hat{y}}$ contains $n$ weight matrices $W_i\in\mathbb{R}^{D\times D}$.

All the weight matrices together form the set $S$, i.e. $S\in \mathbb{R}^{M\times n \times D\times D}$ ($M$ views, each with $n$ matrices of size $D\times D$).
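
One simple reading of "selected according to the predicted view" is a hard index lookup into $S$ along its first axis. The sketch below assumes exactly that, along with zero-based view indices. The helper name `select_projection_group` and the toy matrices are hypothetical, and the paper's actual selection mechanism may differ.

```python
import numpy as np

def select_projection_group(S, y_hat):
    """Pick the group Z_{y_hat} of n projection matrices from S of shape (M, n, D, D)."""
    return S[y_hat]

# Toy S: for view v, every strip uses (v + 1) * identity, so groups are easy to tell apart.
M, n, D = 4, 3, 2
S = np.stack([
    np.stack([np.eye(D) * (view + 1) for _ in range(n)])
    for view in range(M)
])

Z = select_projection_group(S, y_hat=2)
print(S.shape, Z.shape)
```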

Open question: what exactly does the "Generation" step produce, and how does it connect $\hat{p}$ and $\hat{y}$ to the projection matrices of the corresponding view?

3.2. HPP feature projection

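The input of this branch is $f_{HPM} \in \mathbb{R}^{n\times D}$, and the $i$-th horizontal strip is denoted $f_{HPM,i}$, $i=1,2,\dots,n$.
Assuming the predicted view $\hat{y}$ of the input silhouette sequence is $\theta$, the projected feature can be expressed as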

$f_{final,i} = W_if_{HPM,i} \quad \text{and} \quad f_{final}=[f_{final,1},f_{final,2},\dots,f_{final,n}]$

where $i=1,2,\dots ,n$ and $W_i\in Z_{\theta}$. $f_{final}$ is then used as the final feature for measuring similarity.
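
The per-strip projection $f_{final,i}=W_i f_{HPM,i}$ is a batched matrix-vector product, which `np.einsum` expresses directly. The function name `project_strips` and the toy values are assumptions for illustration.

```python
import numpy as np

def project_strips(f_hpm, Z_theta):
    """Apply one D x D matrix per strip.

    f_hpm:   (n, D) strip features.
    Z_theta: (n, D, D) projection matrices of the predicted view.
    Returns f_final of shape (n, D), where row i equals Z_theta[i] @ f_hpm[i].
    """
    return np.einsum("nij,nj->ni", Z_theta, f_hpm)

# Toy example with n = 2 strips and D = 2.
f_hpm = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
Z_theta = np.stack([
    np.eye(2),                  # strip 0: identity, feature unchanged
    np.array([[0.0, 1.0],
              [1.0, 0.0]]),     # strip 1: swaps the two coordinates
])

f_final = project_strips(f_hpm, Z_theta)
print(f_final)
```

Here strip 0 stays `[1, 2]` and strip 1 becomes `[4, 3]`.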

3.3. Joint losses

Loss functions

$\mathcal{L}_{CE}=-\sum^N_{j=1}\sum^M_{i=1}y_j\log(p_{ji}) \quad \text{w.r.t.} \quad p_{ji}=\frac{e^{\hat{p}_{ji}}}{\sum^M_{i=1}e^{\hat{p}_{ji}}}$

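$N$ is the number of all gait sequences, and $y_j$ is the ground-truth view label of the $j$-th sequence. $(Q, P, N)$ denotes a triplet, where $Q$ and $P$ come from the same subject while $Q$ and $N$ come from different subjects.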
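
A minimal sketch of the view cross-entropy follows. It assumes that $y_j$ acts as a one-hot label, so the inner sum over views keeps only the log-probability of the true view. The function name `view_cross_entropy` and the integer label encoding are assumptions. The result is a sum over sequences, as in the formula, not a mean.

```python
import numpy as np

def view_cross_entropy(p_hat, labels):
    """Softmax cross-entropy summed over sequences.

    p_hat:  (N, M) raw view scores, one row per gait sequence.
    labels: (N,) integer index of the true view for each sequence.
    """
    shifted = p_hat - p_hat.max(axis=1, keepdims=True)   # for numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].sum()

# Sanity check: with uniform scores over M = 4 views, each sequence costs log(4).
uniform_loss = view_cross_entropy(np.zeros((2, 4)), np.array([0, 3]))
print(uniform_loss, 2 * np.log(4))
```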

Denote $K$ triplets of fixed identity as $\lbrace T_i|T_i=(f^{Q_i}_{final},f^{P_i}_{final},f^{N_i}_{final}),i=1,2,\dots,K\rbrace$ , then combining the Equation (4), the triplet loss can be expressed as:

$\mathcal{L}_{trip}=\frac{1}{K}\sum^K_{i=1}\sum^n_{j=1}\max (m-d_{ij}^-+d_{ij}^+,0)$

where $d_{ij}^-=||f^{Q_i}_{final,j}-f^{N_i}_{final,j}||^2_2, \ d_{ij}^+=||f^{Q_i}_{final,j}-f^{P_i}_{final,j}||^2_2$ and $m$ is the margin
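
The strip-wise triplet loss can be written compactly with array operations. In this sketch the function name `triplet_loss`, the `(K, n, D)` array layout, and the margin value 0.2 are assumptions for the example, not values taken from the paper.

```python
import numpy as np

def triplet_loss(f_q, f_p, f_n, margin):
    """(1/K) * sum over triplets i and strips j of max(m - d_neg + d_pos, 0).

    f_q, f_p, f_n: (K, n, D) final features of query, positive and negative samples.
    """
    d_pos = ((f_q - f_p) ** 2).sum(axis=-1)   # squared L2 distance per strip, (K, n)
    d_neg = ((f_q - f_n) ** 2).sum(axis=-1)
    hinge = np.maximum(margin - d_neg + d_pos, 0.0)
    return hinge.sum() / f_q.shape[0]

# Toy triplet with K = 1, n = 2 strips, D = 1.
f_q = np.array([[[0.0], [0.0]]])
f_p = np.array([[[1.0], [0.0]]])
f_n = np.array([[[2.0], [0.0]]])

loss = triplet_loss(f_q, f_p, f_n, margin=0.2)
print(loss)
```

In strip 0 the negative is far enough away (`d_pos = 1`, `d_neg = 4`), so the hinge is zero. In strip 1 all three features coincide, so the strip contributes the full margin.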

$\mathcal{L}=\lambda_{CE}\mathcal{L}_{CE}+\lambda_{trip}\mathcal{L}_{trip}$

where $\lambda_{CE}$ and $\lambda_{trip}$ are hyperparameters

Experimental results

Ideas I can use

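Figure 2. Examples of view projection matrices for strip 0 and strip 20. The Diff column shows the absolute difference between two matrices of different views in the same strip.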

In order to explain the effectiveness of our framework, we compare the projection matrices of different views in ViGaitGL (trained on OU-MVLP). As illustrated in Figure 2, their difference has an obvious vertical texture, which indicates that the projection matrices of different views have view specificity for feature mapping.

