Contents

Contributions

Method

1、Model

2、Three sampling strategies.

3、Video frame encoding.

Results

More Reference to Follow


Paper title: Self-Supervised Video Representation Learning With Odd-One-Out Networks (CVPR 2017)

Authors: Basura Fernando, Hakan Bilen, Efstratios Gavves, Stephen Gould

Download link: https://openaccess.thecvf.com/content_cvpr_2017/html/Fernando_Self-Supervised_Video_Representation_CVPR_2017_paper.html


Contributions

We propose a new self-supervised CNN pre-training technique based on a novel auxiliary task called odd-one-out learning. In this task, we sample subsequences from videos and ask the network to learn to predict the odd video subsequence. The odd video subsequence is sampled such that it has the wrong temporal order of frames, while the even ones have the correct temporal order. Our learning machine is implemented as a multi-stream convolutional neural network, which is learned end-to-end. Using odd-one-out networks, we learn temporal representations for videos that generalize to other related tasks such as action recognition.


Method

1、Model

O3N is composed of (N + 1) input branches. Each branch contains five convolutional layers, and the weights are shared across the branches. The configuration of each input branch is identical to the AlexNet architecture up to the first fully connected layer. We then introduce a fusion layer which merges the information from the (N + 1) branches after the first fully connected layer. We experiment with two fusion models, the concatenation model and the sum of difference model, leading to two different network architectures as shown in Fig. 2.

  1. Concatenation model: The first fully connected layers from each branch are concatenated to give an (N + 1) × d dimensional vector, where d is the dimensionality of the first fully connected layer.
  2. Sum of difference model: The first fully connected layers from each branch are summed after taking the pair-wise activation differences, leading to a d dimensional vector, where d is the dimensionality of the first fully connected layer. Mathematically, let $v_i$ be the activation vector of the i-th branch of the network. The output of the sum of difference layer is given by

     $$o = \sum_{j > i} (v_j - v_i) \qquad (2)$$
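
The two fusion layers can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the pairwise differences are taken as v_j − v_i for j > i, and the function names, the branch count, the value of d, and the random activations are all stand-ins chosen for the example.

```python
import numpy as np


def concat_fusion(acts):
    """Concatenate branch activations.

    acts has shape (N + 1, d); the result has length (N + 1) * d.
    """
    return acts.reshape(-1)


def sum_of_difference_fusion(acts):
    """Sum v_j - v_i over all branch pairs with j > i.

    acts has shape (N + 1, d); the result has length d.
    """
    n = acts.shape[0]
    out = np.zeros(acts.shape[1], dtype=acts.dtype)
    for i in range(n):
        for j in range(i + 1, n):
            out += acts[j] - acts[i]
    return out


# Illustrative values only: N + 1 = 6 branches, d = 128.
rng = np.random.default_rng(0)
acts = rng.standard_normal((6, 128))
print(concat_fusion(acts).shape)
print(sum_of_difference_fusion(acts).shape)
```

Note that the concatenated vector grows with the number of branches, while the sum of difference output stays at d dimensions regardless of N.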

2、Three sampling strategies.

  1. Consecutive sampling: We sample W consecutive frames N times from video X to generate N even (related) elements. Each sampled even element of the odd-one-out question is a valid video sub-clip consisting of W consecutive frames from the original video. The odd video sequence of length W, however, is constructed by randomly ordering frames and therefore does not satisfy the order constraints.
  2. Random sampling: We randomly sample W frames N times from the video X to generate N even (related) elements. Each of these N elements is a sequence that has the correct temporal order and satisfies the original order constraints of X, but its frames are not consecutive as they are in consecutive sampling. The odd video sequence of length W is also constructed by randomly sampling frames. As in consecutive sampling, the odd sequence does not satisfy the order constraints: we randomly shuffle the frames of the odd element (sequence).
  3. Constrained consecutive sampling: First we select a video clip of size 1.5 × W from the original video, which we denote by $\hat{X}$. We randomly sample W consecutive frames N times from $\hat{X}$ to generate N even (related) elements. Each of these N elements is a subsequence that has the correct temporal order and satisfies the original order constraints of X. At the same time, any two of the sampled even clips of size W overlap by more than 50%. The odd video sequence of length W is also constructed by randomly sampling frames from $\hat{X}$. As in the other sampling strategies, the odd sequence does not satisfy the order constraints: we randomly shuffle the frames of the odd element (sequence).

3、Video frame encoding.

Each element (video clip or subsequence) in an odd-one-out question is encoded to extract temporal information before it is presented to the first convolutional filters of the network. There are several ways to capture the temporal structure of a video sequence. For example, one can use 3D convolutions, recurrent encoders, rank-pooling encoders, or simply concatenate frames. Odd-one-out networks can use any of these methods to learn video representations in a self-supervised manner from video data. Next, we discuss the three techniques used in our experiments to encode video clips, using the differences of RGB frames, into a single tensor $X_d$.

  • Sum of differences of frames video-clip encoder: We take the differences of frames and then sum them to obtain a single image $X_d$, which captures the structure of the sequence. This is exactly equation 2, now applied over frames instead of branch activations: $X_d = \sum_{j > i} (X_j - X_i)$. Because the frame at index t appears with a plus sign in t − 1 pairs and with a minus sign in W − t pairs, the equation boils down to a weighted sum of frames, $X_d = \sum_t w_t X_t$, where the weight of the frame at index t is given by

    $$w_t = 2t - W - 1$$

  • Dynamic image encoder: This method is the same as the sum of differences of frames method, except that the input sequence is first pre-processed to obtain a smoothed sequence $M = \langle M_1, M_2, \cdots, M_W \rangle$. Smoothing is obtained using the mean of the frames up to index t. The smoothed frame at index t, denoted by $M_t$, is given by

    $$M_t = \frac{1}{t} \sum_{j=1}^{t} X_j$$

where $X_j$ is the frame at index j of the sub-video.

  • Stack of differences of frames video-clip encoder: We also stack the differences of frames. The resulting image is then no longer a standard RGB image with three channels; instead, we obtain an (N − 1) × 3 channel image.
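
A minimal NumPy sketch of the three encoders follows. It rests on assumptions that should be checked against the paper: the sum of differences is taken over all frame pairs j > i (which gives frame weights 2t − W − 1 for a clip of W frames), the dynamic image smooths with a running mean over frames 1 to t, and the stacked encoder uses differences between consecutive frames, so a clip of W frames yields one fewer difference image than frames. The function names and frame sizes are illustrative.

```python
import numpy as np


def sum_of_differences(frames):
    """Sum X_j - X_i over all pairs j > i, computed as a weighted sum of frames.

    frames has shape (W, height, width, 3); the result has shape (height, width, 3).
    """
    W = frames.shape[0]
    t = np.arange(1, W + 1)
    weights = 2 * t - W - 1  # frame t: plus sign (t - 1) times, minus sign (W - t) times
    return np.tensordot(weights, frames, axes=(0, 0))


def dynamic_image(frames):
    """Smooth with the running mean M_t = mean(X_1 .. X_t), then sum the differences."""
    W = frames.shape[0]
    counts = np.arange(1, W + 1).reshape(-1, 1, 1, 1)
    smoothed = np.cumsum(frames, axis=0) / counts
    return sum_of_differences(smoothed)


def stack_of_differences(frames):
    """Stack consecutive differences X_{t+1} - X_t along the channel axis.

    The result has shape (height, width, (W - 1) * 3).
    """
    diffs = np.diff(frames, axis=0)  # shape (W - 1, height, width, 3)
    return np.concatenate(list(diffs), axis=-1)


# Illustrative clip: W = 6 random frames of size 4 x 5.
rng = np.random.default_rng(0)
frames = rng.random((6, 4, 5, 3))
print(sum_of_differences(frames).shape)
print(dynamic_image(frames).shape)
print(stack_of_differences(frames).shape)
```

The first two encoders keep the standard three-channel input shape, so the first convolutional layer can stay unchanged; the stacked encoder needs a first layer that accepts more input channels.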

Results


More Reference to Follow

  1. traditionally unsupervised feature learning (e.g. [6, 20]) [to check: how unsupervised learning is done for video]
  2. There have also been unsupervised temporal feature encoding methods to capture the structure of videos for action classification [13, 14, 15, 29, 36].
