paper reading:Part-based Graph Convolutional Network for Action Recognition

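Table of Contents

  • paper reading:Part-based Graph Convolutional Network for Action Recognition
    • Graph and skeleton:
    • Conventional action recognition from S-videos:
    • Two types of information used by the model:
    • Main contributions:
    • Convolution on a single graph (no partitioning):
      • k-neighborhood
      • 1-neighborhood
    • Part-based Graph
      • Definition of graph partitioning:
      • Two parts (Figure 1(b)):
      • Four parts (Figure 1(c), recommended):
      • Six parts (Figure 1(d)):
      • Connecting the subgraphs:
    • Part-based Graph Convolutions
      • Neighborhoods:
      • Convolution:
        • Subgraph convolution:
        • Aggregating the subgraph convolution results:
    • Spatio-temporal Part-based Graph Convolutions
      • Convolution steps
      • Defining the neighborhoods
      • Label assignment
      • The complete convolution formulas
        • Spatial convolution on subgraphs
        • Aggregating the subgraph spatial convolutions
        • Temporal convolution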

Graph and skeleton:

The human skeleton is intuitively represented as a sparse graph with joints as nodes and natural connections between them as edges.

  • nodes: joints
  • edges: natural connections between joints

Conventional action recognition from S-videos (skeleton videos):

  • the whole skeleton is treated as a single graph
  • 3D joint coordinates are used as the vertex features

Two types of information used by the model in this paper:

  • Geometric features: such as relative joint coordinates
  • Motion features: such as temporal displacements
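The two feature types can be made concrete with a minimal NumPy sketch. It assumes joints are stored as an array of shape (T, V, 3). Relative coordinates are taken with respect to a single reference joint; the index `ref_joint` is a hypothetical choice, not taken from the paper. The displacement of the last frame is zero-padded. The paper's exact choices may differ.

```python
import numpy as np


def geometric_features(joints: np.ndarray, ref_joint: int) -> np.ndarray:
    """Relative joint coordinates: every joint minus a reference joint, frame by frame.

    joints: (T, V, 3) array of 3D joint positions over T frames.
    ref_joint: index of the reference joint (hypothetical choice, e.g. a spine joint).
    """
    return joints - joints[:, ref_joint:ref_joint + 1, :]


def motion_features(joints: np.ndarray) -> np.ndarray:
    """Temporal displacements x_{t+1} - x_t; the last frame is zero-padded to keep T frames."""
    disp = np.zeros_like(joints)
    disp[:-1] = joints[1:] - joints[:-1]
    return disp


# Toy sequence: T = 3 frames, V = 2 joints.
joints = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 2.0, 0.0]],
    [[0.0, 1.0, 1.0], [2.0, 2.0, 0.0]],
])
geo = geometric_features(joints, ref_joint=0)
mot = motion_features(joints)
print(geo[:, 1])  # joint 1 relative to joint 0 in each frame
print(mot[:, 1])  # displacement of joint 1 between consecutive frames
```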


Main contributions of this paper:

  • Formulation of a general part-based graph convolutional network (PB-GCN).

  • Use of geometric and motion features in place of 3D joint locations at each vertex.

    That is, geometric information (relative joint coordinates) and motion information (temporal displacements) are used as vertex features.

  • Exceeding the state of the art on the challenging benchmark datasets NTU RGB+D and HDM05.


Convolution on a single graph (no partitioning):

k-neighborhood

$$Y(v_i) = \sum_{v_j \in N_k(v_i)} W(L(v_j))\, X(v_j)$$

  • $W(\cdot)$: a filter weight vector of size $L$, indexed by the label assigned to neighbor $v_j$ in the $k$-neighborhood $N_k(v_i)$
  • $X(v_j)$: the input feature at $v_j$
  • $Y(v_i)$: the convolved output feature at the root vertex $v_i$
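The sum above can be written directly as a loop over each root vertex's neighbors. This minimal NumPy sketch assumes vector-valued features, so each label selects a $C_{out} \times C$ weight matrix; with scalar features this reduces to the weight vector of size $L$. The neighbor lists and the labeling rule in the example (label 0 for the root, 1 for its neighbors) are hypothetical.

```python
import numpy as np


def labeled_graph_conv(X, neighbors, labels, W):
    """Y(v_i) = sum over v_j in N_k(v_i) of W(L(v_j)) X(v_j).

    X: (V, C) input features, one row per vertex.
    neighbors: neighbors[i] lists the vertices in N_k(v_i) (including v_i itself).
    labels: labels[i][j] is the label of neighbor v_j relative to root v_i.
    W: (L, C_out, C) array, one weight matrix per label.
    """
    V = X.shape[0]
    Y = np.zeros((V, W.shape[1]))
    for i in range(V):
        for j in neighbors[i]:
            Y[i] += W[labels[i][j]] @ X[j]
    return Y


# Path graph 0 - 1 - 2 with 1-neighborhoods (k = 1).
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
labels = [{j: 0 if j == i else 1 for j in nb} for i, nb in enumerate(neighbors)]
X = np.array([[1.0], [2.0], [3.0]])
W = np.array([[[1.0]], [[10.0]]])  # label 0 (root) -> weight 1, label 1 (neighbor) -> weight 10
Y = labeled_graph_conv(X, neighbors, labels, W)
```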

1-neighborhood

Expressing the neighborhood $N_k(v_i)$ with the adjacency matrix $A$ and reducing the neighborhood size from $k$ to 1 gives:

$$Y(v_i) = \sum_j A^{norm}(i, j)\, W(L(v_j))\, X(v_j)$$

  • $D(i,i) = \sum_j A(i,j)$; $A^{norm} = D^{-1/2} A D^{-1/2}$
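Here is a minimal NumPy sketch of the normalization and of the single-label convolution, where the same weight matrix $W$ is used for every neighbor. It assumes $A$ already contains self-loops, so that $v_i$ belongs to its own 1-neighborhood. Whether the paper adds self-loops this way is an assumption here.

```python
import numpy as np


def normalize_adjacency(A):
    """A_norm = D^{-1/2} A D^{-1/2} with D(i, i) = sum_j A(i, j)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated vertices keep a zero row
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def single_label_conv(X, A, W):
    """Y(v_i) = sum_j A_norm(i, j) W X(v_j): one shared (C_out, C) matrix W for all neighbors."""
    return normalize_adjacency(A) @ X @ W.T


# Path graph 0 - 1 - 2, with self-loops (assumed) so each vertex is in its own 1-neighborhood.
A = np.eye(3) + np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A_norm = normalize_adjacency(A)
Y = single_label_conv(np.ones((3, 2)), A, np.eye(2))
```

With self-loops and a connected graph, the largest eigenvalue of $A^{norm}$ is 1. This keeps repeated layers from blowing up the feature scale.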

Part-based Graph

In general, a part-based graph can be constructed as a combination of subgraphs where each subgraph has certain properties that define it.

Definition of graph partitioning:

We consider scenarios in which the partitions can share vertices or have edges connecting them.

That is, the graph is divided into subgraphs, and different subgraphs may share vertices or be connected by edges.

$$G = \bigcup_{p \in \{1,\dots,n\}} P_p \mid P_p = (V_p, \varepsilon_p)$$

  • $P_p$ is the partition (or subgraph) $p$ of the graph $G$

Two parts (Figure 1(b)):

  • Axial skeleton
  • Appendicular skeleton

Four parts (Figure 1(c), recommended):

  • head
  • hands
  • torso
  • legs

We consider left and right parts of hands and legs together in order to be agnostic to laterality [31] (handedness / footedness) of the human when performing an action.

That is, this removes the effect of laterality: waving with the left hand and waving with the right hand are both "waving".

Six parts (Figure 1(d)):

We divide the upper and lower components of the appendicular skeleton into left and right (shown in Figure 1(d)), resulting in six parts.

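Connecting the subgraphs:

Subgraphs can be connected in two ways: by sharing vertices or by edges between them. Vertex sharing is used here.

To cover all natural connections between joints in the skeleton graph, we include an overlap of at least one joint between two adjacent parts.

That is, every two adjacent subgraphs share at least one node.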
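The partition rules above can be checked mechanically. The sketch below uses a hypothetical 12-joint toy skeleton split into the four parts. The joint indices and part memberships are illustrative only; they are not the NTU RGB+D or HDM05 layouts. The code verifies that the parts together cover every joint and every natural edge, and it lists the joints shared by each pair of parts.

```python
# Hypothetical toy skeleton: 0 pelvis, 1 spine, 2 neck, 3 head, 4/6 elbows, 5/7 hands,
# 8/10 knees, 9/11 feet. Edges are the "natural connections" between joints.
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (2, 6), (6, 7),
         (0, 8), (8, 9), (0, 10), (10, 11)]

# Four parts (left and right hands/legs grouped together); adjacent parts overlap
# in at least one joint (2 = neck, 0 = pelvis).
PARTS = {
    "head": {2, 3},
    "hands": {2, 4, 5, 6, 7},
    "torso": {0, 1, 2},
    "legs": {0, 8, 9, 10, 11},
}


def covers_graph(parts, edges, num_joints):
    """True if G is the union of the parts: every joint and every edge lies inside some part."""
    if set().union(*parts.values()) != set(range(num_joints)):
        return False
    return all(any(u in vs and v in vs for vs in parts.values()) for u, v in edges)


def shared_joints(parts):
    """Joints shared by each pair of parts (only pairs with a non-empty overlap)."""
    names = list(parts)
    return {
        (a, b): parts[a] & parts[b]
        for k, a in enumerate(names)
        for b in names[k + 1:]
        if parts[a] & parts[b]
    }


print(covers_graph(PARTS, EDGES, 12))
print(shared_joints(PARTS))
```

Removing the neck (joint 2) from "hands" makes `covers_graph` return False, because the edges (2, 4) and (2, 6) no longer lie inside any part. This is why adjacent parts must overlap.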


Part-based Graph Convolutions

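Unlike the single-graph convolution above (Eq. 2), a graph divided into subgraphs needs a new convolution formula.

Several concepts also need to be redefined.

Neighborhoods:

  • Spatial neighbors: the 1-neighborhood of a vertex within a single frame, i.e. at a fixed time (Figure 3(a)).
  • Temporal neighbors: the same node at different time steps (Figure 3(a)).
  • Spatio-temporal neighbors: the union of the spatial and temporal neighborhoods (Figure 3(b)).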

Convolution:

Graph convolution over a part identifies the properties of that subgraph, and an aggregation across subgraphs learns the relations between them.

For a part-based graph, convolutions for each part are performed separately and the results are combined using an aggregation function $F_{agg}$.

That is, convolution is first performed within each subgraph (over its 1-neighborhood), and the aggregation function $F_{agg}$ then models the relations between the subgraphs.

Formally:

Subgraph convolution:

$$Y_p(v_i) = \sum_{v_j \in N_{kp}(v_i)} W_p(L_p(v_j))\, X_p(v_j), \quad p \in \{1,\dots,n\}$$

  • $W_p$ can be shared across parts or kept separate, while only the neighbors of $v_i$ in that part ($N_{kp}(v_i)$) are considered
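Here is a minimal NumPy sketch of the subgraph convolution for $k = 1$ with a single label. Restricting $A$ to the vertices of part $p$ yields the part's neighborhoods $N_{1p}$; this masking step is an assumption of the sketch. The `weights` argument holds either one matrix per part or one shared matrix. Normalization is omitted to keep the numbers readable.

```python
import numpy as np


def part_adjacency(A, part):
    """Keep only edges whose two endpoints lie in the part, i.e. N_1p instead of N_1."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(part)] = True
    return A * np.outer(mask, mask)


def part_conv(X, A, parts, weights):
    """Y_p = A_p X W_p^T for every part p; returns (n_parts, V, C_out).

    weights: a list with one (C_out, C) matrix per part, or a single matrix shared by all parts.
    Rows of Y_p for vertices outside part p are zero.
    """
    if len(weights) == 1:
        weights = list(weights) * len(parts)
    elif len(weights) != len(parts):
        raise ValueError("need one weight matrix per part, or one shared matrix")
    return np.stack([part_adjacency(A, part) @ X @ W_p.T
                     for part, W_p in zip(parts, weights)])


# Path graph 0 - 1 - 2 - 3 with self-loops, split into two parts sharing vertex 2.
A = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
parts = [{0, 1, 2}, {2, 3}]
Y_shared = part_conv(X, A, parts, [np.eye(1)])
Y_separate = part_conv(X, A, parts, [np.eye(1), 2 * np.eye(1)])
```

Vertex 2 gets one output per part: inside {0, 1, 2} it only sees 1 and 2, and inside {2, 3} it only sees 2 and 3.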
Aggregating the subgraph convolution results:

Edge-connected form:

$$Y(v_i) = F_{agg}(Y_{p_1}(v_i), Y_{p_2}(v_j)) \mid (v_i, v_j) \in \varepsilon(p_1, p_2),\ (p_1, p_2) \in \{1,\dots,n\} \times \{1,\dots,n\}$$

Vertex-sharing form:

$$Y(v_i) = F_{agg}(Y_{p_1}(v_i), Y_{p_2}(v_i)) \mid (p_1, p_2) \in \{1,\dots,n\} \times \{1,\dots,n\}$$
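In the vertex-sharing form, a vertex $v_i$ that lies in both $P_{p_1}$ and $P_{p_2}$ has one output per part, and $F_{agg}$ merges them. This section does not fix $F_{agg}$. The NumPy sketch below assumes two common choices, element-wise sum and channel concatenation, purely for illustration.

```python
import numpy as np


def aggregate_parts(Y_parts, mode="sum"):
    """F_agg over parts for the vertex-sharing case.

    Y_parts: (n_parts, V, C), where Y_parts[p, i] is zero if v_i is not in part p.
    mode="sum":    Y(v_i) = sum_p Y_p(v_i), shape (V, C)
    mode="concat": Y(v_i) = [Y_1(v_i), ..., Y_n(v_i)], shape (V, n_parts * C)
    """
    if mode == "sum":
        return Y_parts.sum(axis=0)
    if mode == "concat":
        n_parts, V, C = Y_parts.shape
        return Y_parts.transpose(1, 0, 2).reshape(V, n_parts * C)
    raise ValueError(f"unknown aggregation mode: {mode}")


# Two parts over 4 vertices (1 channel); vertex 2 belongs to both parts.
Y_parts = np.array([[[3.0], [6.0], [5.0], [0.0]],
                    [[0.0], [0.0], [7.0], [7.0]]])
Y_sum = aggregate_parts(Y_parts, "sum")
Y_cat = aggregate_parts(Y_parts, "concat")
```

Only the shared vertex 2 receives contributions from both parts. This is where the relation between the two subgraphs enters the output.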


Spatio-temporal Part-based Graph Convolutions

Convolution steps

The S-videos are represented as spatio-temporal graphs.

That is, an S-video is essentially a spatio-temporal graph.

We spatially convolve each partition independently for each frame, aggregate them at each frame and perform temporal convolution on the temporal dimension of the aggregated graph.

That is, there are two main stages, which can be further split into three steps:

  • Spatial convolution:
    • Subgraph convolution: spatially convolve each partition independently for each frame
    • Aggregation of the subgraph convolution results: aggregate the results of the partition convolutions at each frame
  • Temporal convolution:
    • Temporal convolution on the aggregated result: perform temporal convolution on the temporal dimension of the aggregated graph
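The three steps can be strung together in a compact NumPy sketch of one spatio-temporal block. This is a sketch under stated assumptions, not the authors' implementation:

  • the per-part adjacencies are precomputed and normalized
  • $F_{agg}$ is a sum
  • $\tau$ is odd
  • the temporal convolution uses stride 1 with zero padding, so the output keeps all $T$ frames

```python
import numpy as np


def pb_gcn_block(X, A_parts, W_parts, W_T):
    """One spatio-temporal part-based graph convolution block.

    X:       (T, V, C)           input features for T frames and V joints
    A_parts: (n, V, V)           normalized adjacency matrix of each part
    W_parts: (n, C_out, C)       per-part channel transform (the 1x1 kernel W_p)
    W_T:     (tau, C_out, C_out) temporal kernel, one matrix per temporal offset (tau odd)
    Returns: (T, V, C_out)
    """
    T = X.shape[0]
    tau = W_T.shape[0]
    half = tau // 2
    # Step 1: spatial convolution of every part, independently in every frame:
    # Y_parts[p, t, i] = sum_j A_parts[p, i, j] * (W_parts[p] @ X[t, j])
    Y_parts = np.einsum("pij,tjc,poc->ptio", A_parts, X, W_parts)
    # Step 2: aggregate the parts in every frame (F_agg assumed to be a sum).
    Y_S = Y_parts.sum(axis=0)
    # Step 3: temporal convolution along the frame axis, zero-padded at both ends.
    padded = np.pad(Y_S, ((half, half), (0, 0), (0, 0)))
    Y_T = np.zeros_like(Y_S)
    for t in range(T):
        for offset in range(tau):  # padded[t + offset] holds frame t + offset - half
            Y_T[t] += padded[t + offset] @ W_T[offset].T
    return Y_T


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5, 3))        # T = 6 frames, V = 5 joints, C = 3
A_parts = rng.random((2, 5, 5))       # stand-ins for two normalized part adjacencies
W_parts = rng.normal(size=(2, 4, 3))  # C_out = 4
W_T = rng.normal(size=(3, 4, 4))      # tau = 3
out = pb_gcn_block(X, A_parts, W_parts, W_T)
print(out.shape)
```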


Defining the neighborhoods

For each vertex, we use a 1-neighborhood ($k = 1$) for the spatial dimension ($N_1$), as the skeleton graph is not very large, and a $\tau$-neighborhood ($k = \tau$) for the temporal dimension ($N_\tau$); $N_\tau$ is not part-specific.

The spatial and temporal neighborhoods are defined as:

$$N_{1p}(v_i) = \{ v_j \mid d(v_i, v_j) \le 1,\ v_i, v_j \in V_p \}$$

$$N_\tau(v_{it_a}) = \{ v_{it_b} \mid d(v_{it_a}, v_{it_b}) \le \lfloor \tau/2 \rfloor \}$$
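The two set definitions translate almost literally into Python. In this sketch:

  • $d(v_i, v_j) \le 1$ means "same joint or joined by an edge"
  • $d(v_{it_a}, v_{it_b})$ is the frame distance $|t_b - t_a|$
  • frames beyond the sequence ends are simply dropped; this boundary rule is an assumption, and padding is the other usual option

```python
def spatial_neighborhood(i, part, edges):
    """N_1p(v_i) = {v_j | d(v_i, v_j) <= 1, v_i, v_j in V_p}."""
    if i not in part:
        return set()
    return {j for j in part if j == i or (i, j) in edges or (j, i) in edges}


def temporal_neighborhood(t_a, tau, num_frames):
    """Frames t_b with |t_b - t_a| <= floor(tau / 2), i.e. N_tau(v_{i t_a}) for a fixed joint i.

    Frames outside [0, num_frames) are dropped (assumed boundary handling).
    """
    half = tau // 2
    return [t_b for t_b in range(t_a - half, t_a + half + 1) if 0 <= t_b < num_frames]


edges = {(0, 1), (1, 2), (2, 3)}  # path 0 - 1 - 2 - 3
part = {0, 1, 2}
print(spatial_neighborhood(2, part, edges))  # joint 3 is excluded: it is not in the part
print(temporal_neighborhood(5, tau=3, num_frames=10))
print(temporal_neighborhood(0, tau=5, num_frames=10))
```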


Label assignment

For ordering vertices in the receptive fields (or neighborhoods), we use a single label spatially ($L_S : V \to \{0\}$) to weigh vertices in $N_{1p}$ of each vertex equally and $\tau$ labels temporally ($L_T : V \to \{0, \dots, \tau - 1\}$) to weigh vertices across frames in $N_\tau$ differently.

That is, for a root node, all spatial neighbors share the same label (0), while each temporal neighbor gets a different label.

Formally:

$$L_S(v_{jt}) = \{ 0 \mid v_{jt} \in N_{1p}(v_{it}) \}$$

$$L_T(v_{it_b}) = \{ (t_b - t_a) + \lfloor \tau/2 \rfloor \mid v_{it_b} \in N_\tau(v_{it_a}) \}$$
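Here is a small Python sketch of the two labeling functions. It checks that the temporal labels of a $\tau$-neighborhood are exactly $0, \dots, \tau - 1$. This holds for odd $\tau$, which is our assumption, suggested by the $\lfloor \tau/2 \rfloor$ offset.

```python
def spatial_label(j, i):
    """L_S: every spatial neighbor v_jt of root v_it gets the same label 0."""
    return 0


def temporal_label(t_a, t_b, tau):
    """L_T(v_{i t_b}) = (t_b - t_a) + floor(tau / 2) for v_{i t_b} in N_tau(v_{i t_a})."""
    half = tau // 2
    if abs(t_b - t_a) > half:
        raise ValueError("frame t_b is outside the temporal neighborhood of t_a")
    return (t_b - t_a) + half


tau = 5
t_a = 10
labels = [temporal_label(t_a, t_b, tau) for t_b in range(t_a - tau // 2, t_a + tau // 2 + 1)]
print(labels)  # one distinct label per frame offset
```

Each frame offset maps to its own weight, so the temporal kernel can tell "before" from "after". In contrast, all spatial neighbors are weighted alike.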


The complete convolution formulas

Spatial convolution on subgraphs

$$Z_p(v_{jt}) = W_p(L_S(v_{jt}))\, X_p(v_{jt})$$

  • $W_p \in \mathbb{R}^{C' \times C \times 1 \times 1}$: part-specific channel transform kernel (pointwise operation)
  • $L_S$ is the same for each part, but $N_{1p}$ is part-specific
  • $Z_p$: output from applying $W_p$ on input features $X_p$ at each vertex

$$Y_p(v_{it}) = \sum_{v_{jt} \in N_{1p}(v_{it})} A_p(i, j)\, Z_p(v_{jt}), \quad p \in \{1,\dots,4\}$$

  • $A_p$: normalized adjacency matrix for part $p$
  • $W_T \in \mathbb{R}^{C' \times C' \times \tau \times 1}$: temporal convolution kernel
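Because $L_S$ has a single label, $W_p$ is the same for every vertex and frame. It can therefore be applied once as a pointwise ($1 \times 1$) channel transform before mixing vertices with $A_p$. The NumPy sketch below assumes features laid out as (C, T, V), a common convention in skeleton GCN code that is not stated here. It checks that doing the channel transform first or last gives the same $Y_p$, since both steps are linear.

```python
import numpy as np


def part_spatial_conv(X, A_p, W_p):
    """Z_p = W_p X_p (pointwise), then Y_p(v_it) = sum_j A_p(i, j) Z_p(v_jt) in every frame t.

    X:   (C, T, V) input features
    A_p: (V, V) normalized adjacency of part p (zero rows/columns outside the part)
    W_p: (C_out, C), the C' x C x 1 x 1 kernel with its two unit axes squeezed
    """
    Z_p = np.einsum("oc,ctv->otv", W_p, X)     # same channel transform at every (t, v)
    return np.einsum("ij,otj->oti", A_p, Z_p)  # mix the vertices of each frame with A_p


rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4, 5))  # C = 3, T = 4, V = 5
A_p = rng.random((5, 5))
W_p = rng.normal(size=(2, 3))   # C_out = 2
Y_p = part_spatial_conv(X, A_p, W_p)

# Same result if the vertices are mixed first and the channels transformed afterwards.
Y_alt = np.einsum("oc,cti->oti", W_p, np.einsum("ij,ctj->cti", A_p, X))
print(np.allclose(Y_p, Y_alt))
```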
Aggregating the subgraph spatial convolutions

$$Y_S(v_{it}) = F_{agg}(\{Y_1(v_{it}), \dots, Y_n(v_{it})\})$$

  • $Y_S$: output obtained after aggregating all partition graphs at one frame

Temporal convolution

$$Y_T(v_{it_a}) = \sum_{v_{it_b} \in N_\tau(v_{it_a})} W_T(L_T(v_{it_b}))\, Y_S(v_{it_b})$$


  • $Y_T$: output after applying temporal convolution to the $Y_S$ outputs of $\tau$ frames
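The temporal formula is a 1D convolution along the frame axis. Its kernel taps are selected by $L_T$, and the same $W_T$ is shared by all joints. Here is a NumPy sketch, assuming $\tau$ is odd and skipping neighbors beyond the sequence ends (equivalent to zero padding):

```python
import numpy as np


def temporal_conv(Y_S, W_T):
    """Y_T(v_{i t_a}) = sum over v_{i t_b} in N_tau(v_{i t_a}) of W_T(L_T(v_{i t_b})) Y_S(v_{i t_b}).

    Y_S: (T, V, C) aggregated spatial output
    W_T: (tau, C_out, C) temporal kernel, one matrix per temporal label (tau odd)
    """
    T, V, _ = Y_S.shape
    tau = W_T.shape[0]
    half = tau // 2
    Y_T = np.zeros((T, V, W_T.shape[1]))
    for t_a in range(T):
        for t_b in range(max(0, t_a - half), min(T, t_a + half + 1)):
            label = (t_b - t_a) + half           # L_T(v_{i t_b})
            Y_T[t_a] += Y_S[t_b] @ W_T[label].T  # applied to every joint i at once
    return Y_T


y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # one joint, one channel, T = 5
w = np.array([1.0, 2.0, 3.0])            # tau = 3, indexed by label 0, 1, 2
out = temporal_conv(y.reshape(5, 1, 1), w.reshape(3, 1, 1))[:, 0, 0]
print(out)
print(np.correlate(y, w, mode="same"))
```

For a single channel, the label-indexed sum coincides with NumPy's cross-correlation in "same" mode. This makes the role of $L_T$ explicit: label $l$ weights frame $t_a + l - \lfloor \tau/2 \rfloor$.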
