Table of Contents

  • 1. Paper overview
  • 2. Another approach to CNN-based optical flow estimation
  • 3. Where the idea of stacked refinement for optical flow comes from
  • 4. FlyingThings3D (Things3D) dataset
  • 5. The order of presenting training data with different properties matters.
  • 6. Stacking two identical networks (FlowNetS)
  • 7. Stacking different architectures (FlowNetC+FlowNetS, or a 3/8-channel FlowNetS)
  • 8. Training FlowNet-CSS seems cumbersome
  • 9. Small displacements are handled well by traditional methods
  • 10. Adding a dedicated network for small displacements
  • 11. Comparison of results across models

1. Paper overview

This paper is the evolution of FlowNet. As the pioneering work on CNN-based optical flow estimation, FlowNet naturally left plenty of room for improvement, and FlowNet 2.0 improves on it in three respects:

  • (1) Data: enlarge the training data with FlyingThings3D, plus ChairsSDHom, a dataset focused on small displacements. Experiments also show that the order in which datasets with different properties are presented strongly affects model performance; it is better to learn on the simpler dataset first and the harder one afterwards.
  • (2) Architecture: enlarge the model by stacking FlowNetS networks on top of FlowNetC, so that each sub-network refines the flow estimate of the previous one. The model also introduces warping: the intermediate flow estimate is used to warp the second image (or its features) back toward the first, and the remaining residual is then refined, gradually shrinking the gap between the prediction and the ground truth. This resembles the idea behind GBDT in machine learning; in traditional flow methods (e.g., the actual implementation of DIS), the iterative refinement embodies the same shrink-the-residual idea.
  • (3) Small displacements: design a dedicated network for small-displacement scenes, and synthesize ChairsSDHom, a small-displacement dataset.

First, we evaluate the influence of dataset schedules. Interestingly, the more sophisticated training data provided by Mayer et al. [18] leads to inferior results if used in isolation. However, a learning schedule consisting of multiple datasets improves results significantly. In this scope, we also found that the FlowNet version with an explicit correlation layer outperforms the version without such layer. This is in contrast to the results reported in Dosovitskiy et al. [10].

As a second contribution, we introduce a warping operation and show how stacking multiple networks using this operation can significantly improve the results. By varying the depth of the stack and the size of individual components we obtain many network variants with different size and runtime. This allows us to control the trade-off between accuracy and computational resources. We provide networks for the spectrum between 8fps and 140fps.

Finally, we focus on small, subpixel motion and real-world data. To this end, we created a special training dataset and a specialized network. We show that the architecture trained with this dataset performs well on small motions typical for real-world videos. To reach optimal performance on arbitrary displacements, we add a network that learns to fuse the former stacked network with the small displacement network in an optimal manner.
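The warping operation introduced here is just backward warping with bilinear interpolation: the current flow estimate tells each pixel of the first image where to sample in the second. Below is a minimal pure-Python sketch of the idea (a hypothetical `warp` helper, not the authors' CUDA layer; images are plain nested lists):

```python
def warp(img2, flow):
    """Backward-warp img2 toward img1 using an estimated flow field.

    img2: H x W nested list of grayscale values.
    flow: H x W nested list of (u, v) displacements, meaning that
          pixel (x, y) of img1 corresponds to (x + u, y + v) in img2.
    Samples img2 with bilinear interpolation; out-of-bounds
    coordinates are clamped to the image border.
    """
    H, W = len(img2), len(img2[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            u, v = flow[y][x]
            sx, sy = x + u, y + v                   # sampling position in img2
            x0 = max(0, min(W - 1, int(sx // 1)))   # floor, clamped
            y0 = max(0, min(H - 1, int(sy // 1)))
            x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
            ax = max(0.0, min(1.0, sx - x0))        # bilinear weights
            ay = max(0.0, min(1.0, sy - y0))
            top = (1 - ax) * img2[y0][x0] + ax * img2[y0][x1]
            bot = (1 - ax) * img2[y1][x0] + ax * img2[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bot
    return out
```

With `flow` set to the true motion, `warp(img2, flow)` should look like `img1`; the brightness difference between the two is the residual the next stacked network gets to fix.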

2. Another approach to CNN-based optical flow estimation

An alternative approach to learning-based optical flow estimation is to use CNNs to match image patches. Thewlis et al. [29] formulate Deep Matching [31] as a CNN and optimize it end-to-end. Gadot & Wolf [12] and Bailer et al. [3] learn image patch descriptors using Siamese network architectures. These methods can reach good accuracy, but require exhaustive matching of patches. Thus, they are restrictively slow for most practical applications. Moreover, methods based on (small) patches are inherently unable to use the larger whole-image context.

Using CNNs to match patches has two problems: (1) it is too slow; (2) because the patches are local, the whole-image context cannot be exploited (in my view this fails to tap the full potential of CNNs).

3. Where the idea of stacked refinement for optical flow comes from

CNNs trained for per-pixel prediction tasks often produce noisy or blurry results. As a remedy, off-the-shelf optimization can be applied to the network predictions (e.g., optical flow can be postprocessed with a variational approach [10]). In some cases, this refinement can be approximated by neural networks: Chen & Pock [9] formulate their reaction diffusion model as a CNN and apply it to image denoising, deblocking and superresolution. Recently, it has been shown that similar refinement can be obtained by stacking several CNNs on top of each other. This led to improved results in human pose estimation [17, 8] and semantic instance segmentation [22]. In this paper we adapt the idea of stacking networks to optical flow estimation.

Per-pixel prediction with CNNs does indeed tend to produce lots of noise or blurry boundaries.

4. FlyingThings3D (Things3D) dataset

The FlyingThings3D (Things3D) dataset proposed by Mayer et al. [18] can be seen as a three-dimensional version of Chairs: 22k renderings of random scenes show 3D models from the ShapeNet dataset [23] moving in front of static 3D backgrounds. In contrast to Chairs, the images show true 3D motion and lighting effects and there is more variety among the object models.

5. The order of presenting training data with different properties matters.

Also: FlowNetC outperforms FlowNetS.
This overturns the conclusion drawn in FlowNet v1.
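The dataset-schedule finding can be expressed as a tiny training driver: learn on the simpler FlyingChairs data first, then continue on the harder Things3D data with a reduced learning rate. The iteration counts and rates below are illustrative placeholders, not the paper's S_long/S_fine numbers, and `train` is a hypothetical stand-in:

```python
# Hypothetical training driver illustrating the Chairs -> Things3D schedule.
def make_schedule():
    # (dataset, iterations, initial learning rate); the numbers are
    # placeholders for illustration, not the paper's exact values.
    return [
        ("FlyingChairs", 600_000, 1e-4),  # easy 2D data first
        ("Things3D", 250_000, 1e-5),      # harder 3D data second, lower lr
    ]

def run(schedule, train):
    """Run each training phase in order; `train` is any callable
    taking (dataset, iterations, learning_rate)."""
    for dataset, iterations, lr in schedule:
        train(dataset, iterations, lr)
```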

6. Stacking two identical networks (FlowNetS)

We make the following observations: (1) Just stacking networks without warping yields better results on Chairs, but worse on Sintel; the stacked network is over-fitting. (2) Stacking with warping always improves results. (3) Adding an intermediate loss after Net1 is advantageous when training the stacked network end-to-end. (4) The best results are obtained by keeping the first network fixed and only training the second network after the warping operation.

Clearly, since the stacked network is twice as big as the single network, over-fitting is an issue. The positive effect of flow refinement after warping can counteract this problem, yet the best of both is obtained when the stacked networks are trained one after the other, since this avoids over-fitting while having the benefit of flow refinement.

When stacking two identical networks (FlowNetS), over-fitting comes easily. Adding warping, or training the networks one after the other (freezing the first network's weights while updating the second), both mitigate over-fitting to some extent.

The paper also mentions that earlier work trained on FlyingChairs and tested on Sintel; to check for over-fitting, the authors also used part of the FlyingChairs data at test time.
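The benefit of warping in the stack can be illustrated with a toy 1-D analogue: after warping, each stage only sees the remaining residual and predicts a correction, so the error shrinks stage by stage even though each stage is imperfect. This is a sketch of the principle only, not the CNN:

```python
def refine_stack(flow_gt, stages, gain=0.5):
    """Toy 1-D analogue of stacked refinement with warping.

    Each stage observes the residual between the ground-truth flow and
    the current estimate (which is what warping exposes to the next
    network) and corrects a fraction `gain` of it, mimicking an
    imperfect sub-network. Returns the estimate after each stage.
    """
    estimate = 0.0
    history = []
    for _ in range(stages):
        residual = flow_gt - estimate  # revealed by warping
        estimate += gain * residual    # imperfect per-stage correction
        history.append(estimate)
    return history
```

With `gain=0.5` the remaining error halves at every stage: `refine_stack(8.0, 3)` gives `[4.0, 6.0, 7.0]`.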

7. Stacking different architectures (FlowNetC+FlowNetS, or a 3/8-channel FlowNetS)

The lowercase s denotes a FlowNetS with 3/8 of the channels.

Two small networks can be both faster and better than one large network.
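Shrinking a network by a channel fraction is straightforward: multiply every layer's channel count by the factor and round. A sketch, with illustrative (not the paper's exact) FlowNetS-style channel counts:

```python
def scale_channels(channels, factor=3 / 8):
    """Shrink every layer's channel count by `factor` (the paper's small
    variants use 3/8), keeping at least one channel per layer."""
    return [max(1, round(c * factor)) for c in channels]

# Illustrative contracting-part channel counts (not the exact architecture):
full = [64, 128, 256, 512, 512, 1024]
small = scale_channels(full)  # -> [24, 48, 96, 192, 192, 384]
```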

8. Training FlowNet-CSS seems cumbersome

As also done in [17, 9], we therefore add networks with different weights to the stack. Compared to identical weights, stacking networks with different weights increases the memory footprint, but does not increase the runtime. In this case the top networks are not constrained to a general improvement of their input, but can perform different tasks at different stages and the stack can be trained in smaller pieces by fixing existing networks and adding new networks one-by-one. We do so by using the Chairs→Things3D schedule from Section 3 for every new network and the best configuration with warping from Section 4.1. Furthermore, we experiment with different network sizes and alternatively use FlowNetS or FlowNetC as a bootstrapping network. We use FlowNetC only in case of the bootstrap network, as the input to the next network is too diverse to be properly handled by the Siamese structure of FlowNetC. Smaller size versions of the networks were created by taking only a fraction of the number of channels for every layer in the network. Figure 4 shows the network accuracy and runtime for different network sizes of a single FlowNetS. Factor 3/8 yields a good trade-off between speed and accuracy when aiming for faster networks.

9. Small displacements are handled well by traditional methods

While the original FlowNet [10] performed well on the Sintel benchmark, limitations in real-world applications have become apparent. In particular, the network cannot reliably estimate small motions (see Figure 1). This is counter-intuitive, since small motions are easier for traditional methods, and there is no obvious reason why networks should not reach the same performance in this setting. Thus, we compared the training data to the UCF101 dataset [25] as one example of real-world data. While Chairs are similar to Sintel, UCF101 is fundamentally different (we refer to our supplemental material for the analysis): Sintel is an action movie and as such contains many fast movements that are difficult for traditional methods, while the displacements we see in the UCF101 dataset are much smaller, mostly smaller than 1 pixel. Thus, we created a dataset in the visual style of Chairs but with very small displacements and a displacement histogram much more like UCF101. We also added cases with a background that is homogeneous or just consists of color gradients. We call this dataset ChairsSDHom.

So the authors suspected the training data and synthesized a dedicated small-displacement dataset, ChairsSDHom (with a displacement histogram matching UCF101).

We fine-tuned our FlowNet2-CSS network for smaller displacements by further training the whole network stack on a mixture of Things3D and ChairsSDHom and by applying a non-linearity to the error to down-weight large displacements. We denote this network by FlowNet2-CSS-ft-sd. This improves results on small displacements and we found that this particular mixture does not sacrifice performance on large displacements. However, in case of subpixel motion, noise still remains a problem and we conjecture that the FlowNet architecture might in general not be perfect for such motion.

FlowNet2-CSS-ft-sd only denotes fine-tuning on small-displacement data; the model architecture is unchanged.

The accuracy improves, but not enough, so the architecture itself also has to change.
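The paper only says that a non-linearity is applied to the error; one plausible form is raising the per-pixel endpoint error to a power q < 1, which compresses large errors so that small, sub-pixel motions dominate the loss. The exponent and epsilon below are assumed values for illustration:

```python
def downweighted_epe(pred, gt, q=0.4, eps=0.01):
    """Mean endpoint error passed through a concave non-linearity.

    Raising the EPE to a power q < 1 compresses large errors, so pixels
    with large displacements contribute relatively less and training
    focuses on small, sub-pixel motion. q and eps are assumed values.
    """
    total = 0.0
    for (pu, pv), (gu, gv) in zip(pred, gt):
        epe = ((pu - gu) ** 2 + (pv - gv) ** 2) ** 0.5  # endpoint error
        total += (epe + eps) ** q                        # concave re-weighting
    return total / len(pred)
```

A 16x larger endpoint error contributes only about 3x more to this loss, instead of 16x more under a plain L1 penalty.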

10. Adding a dedicated network for small displacements

Therefore, we slightly modified the original FlowNetS architecture and removed the stride 2 in the first layer. We made the beginning of the network deeper by exchanging the 7×7 and 5×5 kernels in the beginning with multiple 3×3 kernels. Because noise tends to be a problem with small displacements, we add convolutions between the upconvolutions to obtain smoother estimates like in [18]. We denote the resulting architecture by FlowNet2-SD; see Figure 2.
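The kernel exchange preserves the receptive field: three stacked 3×3 convolutions cover the same 7×7 window while adding depth and non-linearity. A quick check, assuming stride-1 convolutions:

```python
def receptive_field(kernels):
    """Receptive field of a stack of stride-1 convolutions with the
    given square kernel sizes: each k x k layer widens it by k - 1."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf

# Three 3x3 layers see the same 7x7 window as one 7x7 layer.
assert receptive_field([3, 3, 3]) == receptive_field([7])
```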

Note: the modifications are made on top of FlowNetS.

Note: in small-displacement scenes, noise is a big problem!

With FlowNet2-SD in place, the two networks need to be fused, which gives the final architecture shown in Figure 2.

Finally, we created a small network that fuses FlowNet2-CSS-ft-sd and FlowNet2-SD (see Figure 2). The fusion network receives the flows, the flow magnitudes and the errors in brightness after warping as input. It contracts the resolution twice by a factor of 2 and expands again. Contrary to the original FlowNet architecture it expands to the full resolution (I don't fully understand this part). We find that this produces crisp motion boundaries (so does this method handle motion boundaries better?) and performs well on small as well as on large displacements. We denote the final network as FlowNet2.
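The fusion network's per-pixel inputs, as described above, can be assembled directly from the flows and images. This sketch builds them for flattened single-channel inputs (a hypothetical helper illustrating the input construction only, not the fusion CNN itself):

```python
def fusion_inputs(flow, img1, img2_warped):
    """Assemble the per-pixel inputs of the fusion network: the flow,
    its magnitude, and the brightness error between the first image and
    the flow-warped second image. All arguments are flattened per-pixel
    lists; flow entries are (u, v) tuples.
    """
    feats = []
    for (u, v), a, b in zip(flow, img1, img2_warped):
        magnitude = (u * u + v * v) ** 0.5   # flow magnitude
        brightness_error = abs(a - b)        # photometric residual after warping
        feats.append((u, v, magnitude, brightness_error))
    return feats
```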

11. Comparison of results across models

