Action Recognition Paper Notes - ARTNet - Appearance-and-Relation Networks for Video Classification

Wang, Limin, et al. “Appearance-and-relation networks for video classification.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

Motivation

Three kinds of architectures exist for video classification: (1) two-stream CNNs (time-consuming, because optical flow must be computed in advance); (2) 3D CNNs (worse than two-stream); and (3) 2D CNNs with temporal models on top, such as LSTM, temporal convolution, sparse sampling and aggregation, and attention modeling (weaker at local spatiotemporal representation).

Multiplicative interactions model the relation between different views: gated Boltzmann machines, energy models, and Independent Subspace Analysis (ISA, similar to the energy model but with weights trained from data). They have been applied to optical flow estimation and person re-identification.

Some energy models:

Derpanis, Konstantinos G., et al. “Efficient action spotting based on a spacetime oriented structure representation.” 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010.

Wang, LiMin, Yu Qiao, and Xiaoou Tang. “Motionlets: Mid-level 3d parts for human motion recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2013. (earlier work by the ARTNet author himself)

Solutions

ARTNet is built by stacking multiple SMART blocks.

Each SMART block has two branches: (1) an appearance branch for spatial modeling and (2) a relation branch for temporal modeling.

Flexible implementations:

reduce the computational cost of 3D CNNs while preserving accuracy

strengthen the local representations fed to long-term models

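  • Relation branch: square-pooling architecture (similar to ISA) that learns an appearance-independent relation between frames

    • Square function: a hidden unit Z_k computed from two patches x and y taken from consecutive frames (the forward and backward passes of this module probably have to be implemented by hand; to be studied in the code later. In the experiments, stride = 2 is presumably there to model the relation between the two patches and express local features)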

    • Cross channel pooling: 1 x 1 x 1 conv

  • Appearance branch: 2D CNN
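To make the relation branch concrete, here is a minimal pure-Python sketch of the square-pooling unit. It assumes the energy-model form z_k = Σ_f wz[k][f] · (wx[f]·x + wy[f]·y)², which is consistent with the shorthand z = (x+y)^2 used in these notes but should be checked against the paper and code. The names `relation_units`, `wx`, `wy`, `wz` and the sizes `D`, `F`, `K` are made up for illustration, and the patches are flattened to plain lists.

```python
import random

def dot(w, v):
    """Inner product of two equal-length lists."""
    return sum(wi * vi for wi, vi in zip(w, v))

def relation_units(x, y, wx, wy, wz):
    """Square-pooling hidden units for two flattened patches x and y.

    wx, wy: F filters each, applied to x and y respectively (assumed form).
    wz: K rows of F weights, playing the role of the 1 x 1 x 1 cross-channel pooling.
    """
    # Step 1: linear responses on the patch pair, one per filter
    # (what a 3D conv with temporal extent 2 would compute).
    linear = [dot(fx, x) + dot(fy, y) for fx, fy in zip(wx, wy)]
    # Step 2: square function.
    squared = [a * a for a in linear]
    # Step 3: cross-channel pooling, a weighted sum over the F squared responses.
    return [dot(row, squared) for row in wz]

random.seed(0)
D, F, K = 4, 3, 2  # hypothetical patch size, filter count, hidden-unit count
x = [random.gauss(0, 1) for _ in range(D)]
y = [random.gauss(0, 1) for _ in range(D)]
wx = [[random.gauss(0, 1) for _ in range(D)] for _ in range(F)]
wy = [[random.gauss(0, 1) for _ in range(D)] for _ in range(F)]
wz = [[random.random() for _ in range(F)] for _ in range(K)]  # non-negative pooling weights

z = relation_units(x, y, wx, wy, wz)
z_flipped = relation_units([-v for v in x], [-v for v in y], wx, wy, wz)
print(z)
print(z_flipped)
```

Negating both patches leaves z unchanged, because every response is squared before pooling; a purely linear response would flip sign instead. This is only a toy check of one invariance, not a demonstration of full appearance independence.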

Experiments

Trained on Kinetics; tested on UCF101 and HMDB51.

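The SMART block increases both the parameter count and the computational cost. This may partly be due to the lack of GPU-specific optimization, but the formulas point the same way: a 3D CNN computes z = x + y, while the square function computes z = (x + y)^2, which is slower.

x + y provides linearity; x * y provides independence.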
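A rough parameter count illustrates why a SMART block can be heavier than a plain 3D convolution. The layout below is an assumption made for this sketch (a 1 x 3 x 3 appearance conv, a 3 x 3 x 3 relation conv, and a 1 x 1 x 1 cross-channel pooling conv, all with the same channel width and no biases); the actual channel sizes and any extra reduction layers should be taken from the paper or code.

```python
def conv_params(c_in, c_out, kt, kh, kw):
    """Weight count of a convolution with a kt x kh x kw kernel (bias ignored)."""
    return c_in * c_out * kt * kh * kw

def c3d_block_params(c_in, c_out):
    """A plain 3 x 3 x 3 convolution."""
    return conv_params(c_in, c_out, 3, 3, 3)

def smart_block_params(c_in, c_out):
    """Assumed SMART layout: appearance conv + relation conv + cross-channel pooling."""
    appearance = conv_params(c_in, c_out, 1, 3, 3)  # 2D conv on each frame
    relation = conv_params(c_in, c_out, 3, 3, 3)    # 3D conv applied before squaring
    pooling = conv_params(c_out, c_out, 1, 1, 1)    # 1 x 1 x 1 cross-channel pooling
    return appearance + relation + pooling

c = 64  # hypothetical channel width
print(c3d_block_params(c, c), smart_block_params(c, c))
```

Under these assumptions the SMART block always has more weights than the 3D conv alone, since it contains that conv plus two extra layers; the squaring step adds further elementwise work on top.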
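Expanding the square spells out the remark about x + y and x * y. Writing a and b for the filter responses on the two patches (symbols introduced here only for illustration):

```latex
\begin{aligned}
z_{\mathrm{3D}} &= a + b, \\
z_{\mathrm{sq}} &= (a + b)^2 = a^2 + b^2 + 2ab .
\end{aligned}
```

The linear response contains only the additive terms, while the squared response also contains the cross term 2ab, which is the multiplicative interaction between the two frames.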

English Expression

Assuming the independence between appearance and relation, it is reasonable to decouple these two kinds of information when designing learning modules.

Advantages and Drawbacks

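  1. The authors devise their own form of energy model, built on patches from consecutive frames (a local feature representation between 2 frames; whether multiple frames or frame skipping are supported is unknown and needs a check of the code).
  2. For temporal modeling, it is not necessarily more efficient or lighter in parameters than a 3D CNN. On interpretability, it is also hard to argue that the hand-designed Z_k formula is more interpretable than a 3D CNN. A 3D CNN certainly supports multi-frame temporal modeling; does the square-pooling here only support modeling between 2 consecutive frames? This needs a check of the code. If anyone has checked, please let me know.
  3. It focuses more on local features than long-term models do, which is good.
  4. ARTNet's stacked blocks do not use a residual structure, so the deeper the network, the weaker the temporal information.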
