I. Deep Ordinal Regression Network for Monocular Depth Estimation

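CVPR 2018
Monocular depth estimation is an important and difficult task in 3D visual perception. Existing methods have achieved good results, but they generally ignore the inherent ordinal relationship among depth values. To address this, we introduce a ranking mechanism into the model to estimate image depth more accurately. Specifically, we first pre-divide the ground-truth depth into many depth sub-intervals using spacing-increasing discretization (SID), and then design a pixel-to-pixel ordinal regression loss that models the ordinal relationship among these sub-intervals. On the architecture side, unlike conventional encoder-decoder depth estimation networks, we use a dilated convolution network to better extract multi-scale features and obtain high-resolution depth maps. In addition, drawing on global pooling and fully connected operations, we propose an effective global information learner. Our method achieves state-of-the-art results on the KITTI, NYU Depth V2, and Make3D datasets, and on the newly opened KITTI test server it scores 30% to 70% higher than the official baseline.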

Abstract

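DCNNs model depth estimation as a regression problem and train the regression network by minimizing mean squared error. Such networks converge slowly and reach unsatisfactory local solutions. In addition, repeated spatial pooling operations lead to undesirable low-resolution feature maps (problems with current DCNNs).

To obtain high-resolution feature maps, skip connections or multi-layer deconvolution networks are used, which complicates network training and greatly increases computation (also a problem).

To eliminate or greatly reduce these problems, the paper [1] proposes a spacing-increasing discretization (SID) strategy to discretize depth and recasts depth network learning as an ordinal regression problem; by training the network with an ordinary regression loss, the method achieves higher accuracy and faster convergence in sync. [2] It adopts a multi-scale network structure that avoids unnecessary spatial pooling and captures multi-scale information in parallel.

Experiments are run on three datasets, KITTI, Make3D, and NYU Depth V2. The method achieves state-of-the-art results and outperforms existing methods by a large margin.

Deep Ordinal Regression Network is abbreviated DORN.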

1. Introduction

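Introduces the importance of MDE (Monocular Depth Estimation): it is a key step in scene reconstruction and understanding tasks.

Compared with depth estimation from stereo images or video sequences, MDE is an ill-posed problem: a single 2D image can be produced by infinitely many different 3D scenes.

Recent work: [38, 55, 46, 9, 28, 31, 33, 3] build on DCNN models and have greatly improved MDE performance, but convergence is slow and the final solutions are far from satisfactory.

The repeated spatial pooling operations in [9, 15, 31, 33, 38, 57] (stride=32) reduce the resolution of the feature maps, which is undesirable for depth estimation. Multi-layer deconvolution networks [33, 15, 31], multi-scale networks [38, 9], or skip connections [57] are used to obtain high-resolution depth maps, but they require extra computation and memory and also complicate the network architecture and the training procedure.

Ours: this part mainly sketches the idea; see Section 3 for details.

Why the SID strategy is used: the uncertainty of depth prediction increases with the ground-truth depth, so it is better to allow larger errors when predicting large depth values, which keeps large depth values from having an over-strengthened influence on training.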

2. Related Work

Depth Estimation

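Depth estimation is essential for understanding the 3D structure of a scene from 2D images.

geometry-based algorithms [50, 12, 11]:

[48] uses hand-crafted features. Because hand-crafted features capture only local information, probabilistic graphical models such as Markov random fields (MRFs) are usually built on top of them to incorporate long-range and global cues [49, 63, 39].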

DCNNs [18, 61, 35, 40, 52, 56, 46, 38, 27]. multi-scale networks [10, 9]

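skip-connection [57]: fuses deeper feature maps with shallower ones

multi-layer deconvolutional networks [15, 31, 33]

Conditional random fields are added to further improve the quality of the estimated depth maps [55, 38].

To improve efficiency, Roy and Todorovic [46] proposed the neural regression forest method, which allows "shallow" CNNs to be trained in parallel.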

Ordinal Regression

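A new SVM formulation was developed to handle multiple thresholds. Crammer and Singer [8] generalized the online perceptron algorithm with multiple thresholds to ordinal regression. Another approach is to formulate ordinal regression as a set of binary classification subproblems. For example, Frank and Hall [13] used decision trees as the binary classifiers for ordinal regression. In computer vision, ordinal regression has been combined with DCNNs to address age estimation [42].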

3. Method

3.1 Network architecture

Two parts: a dense feature extractor and a scene understanding modular. Given an image, the network outputs multi-channel dense ordinal labels.

3.1.1 dense feature extractor

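Inspired by scene parsing networks [60, 4, 62], the authors remove the last few downsampling operations of the DCNN and add dilated (atrous) convolutions, which enlarge the receptive field of the filters without reducing spatial resolution or increasing the number of parameters. The code uses ResNet101.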

3.1.2 scene understanding modular

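Three parallel parts: atrous spatial pyramid pooling (ASPP) module + a cross-channel learner (1x1 conv) + a full-image encoder.

The ASPP module enlarges the receptive field through dilated convolutions (multi-scale), with dilation rates 6, 12, and 18, each followed by a 1x1 conv (to learn complex cross-channel interactions).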
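The effect of the three dilation rates can be checked with the standard convolution size arithmetic. The sketch below is illustrative only: the helper names are made up, and the 3x3 kernel with padding equal to the dilation rate is an assumption (a common ASPP setup), not something stated in these notes.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_output_size(size: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output length of a convolution along one axis (floor division)."""
    span = effective_kernel(kernel_size, dilation)
    return (size + 2 * padding - span) // stride + 1


if __name__ == "__main__":
    height = 33  # assumed feature-map height, taken from the "NYU 33X45" comment
    for rate in (6, 12, 18):
        span = effective_kernel(3, rate)
        out = conv_output_size(height, 3, padding=rate, dilation=rate)
        # Still 9 weights per channel pair, but a much wider span per branch.
        print(f"rate={rate}: span={span}, output height={out}")
```

Under these assumptions each branch sees a different context size while the output resolution stays the same, which is what lets the three branches be concatenated.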

full-image encoder captures global contextual information and can greatly clarify local confusions in depth estimation

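top (fc-fashion): many parameters, hard to train, high memory consumption

bottom: 1. First apply average pooling with a small stride to obtain a feature map (mainly to reduce the spatial dimensions)

2. Follow it with an fc layer to obtain a C-dimensional feature vector, treat that vector as the C channels of a feature map with spatial dimensions 1x1, and then add a 1x1 conv (a cross-channel parametric pooling structure)

3. Copy the feature vector along the spatial dimensions to F, so that every location of F shares the same understanding of the entire image. As for how the copy is done: in the code it is an upsampling step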

    import torch.nn as nn


    class FullImageEncoder(nn.Module):
        def __init__(self):
            super(FullImageEncoder, self).__init__()
            self.global_pooling = nn.AvgPool2d(8, stride=8, padding=(4, 2))  # KITTI 16 16
            self.dropout = nn.Dropout2d(p=0.5)
            self.global_fc = nn.Linear(2048 * 6 * 5, 512)
            self.relu = nn.ReLU(inplace=True)
            self.conv1 = nn.Conv2d(512, 512, 1)  # 1x1 convolution
            self.upsample = nn.UpsamplingBilinear2d(size=(33, 45))  # KITTI 49X65 NYU 33X45

            # weights_init is not defined in this snippet; it comes from the rest of the code base
            weights_init(self.modules(), 'xavier')

        def forward(self, x):
            x1 = self.global_pooling(x)  # step 1: shrink the spatial dimensions
            # print('# x1 size:', x1.size())
            x2 = self.dropout(x1)
            x3 = x2.view(-1, 2048 * 6 * 5)
            x4 = self.relu(self.global_fc(x3))  # step 2: fc to a 512-d vector
            # print('# x4 size:', x4.size())
            x4 = x4.view(-1, 512, 1, 1)  # treat the vector as a 1x1 feature map
            # print('# x4 size:', x4.size())
            x5 = self.conv1(x4)
            out = self.upsample(x5)  # step 3: "copy" to every spatial location
            return out
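The hard-coded `2048 * 6 * 5` in the fully connected layer can be cross-checked against the pooling settings. This is a minimal sketch, assuming the encoder receives a 2048-channel 33x45 feature map in the NYU setting (inferred from the upsample target size in the code comment) and that the pooling uses floor rounding; `pool_output_size` is a hypothetical helper, not part of the repository.

```python
def pool_output_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output length of a pooling layer along one axis, with floor rounding."""
    return (size + 2 * padding - kernel_size) // stride + 1


# AvgPool2d(8, stride=8, padding=(4, 2)): padding is given as (height, width).
pooled_h = pool_output_size(33, kernel_size=8, stride=8, padding=4)
pooled_w = pool_output_size(45, kernel_size=8, stride=8, padding=2)

# Number of inputs the fc layer must accept after flattening.
flattened = 2048 * pooled_h * pooled_w

if __name__ == "__main__":
    print(pooled_h, pooled_w, flattened)
```

A different input resolution, such as the KITTI setting, changes the pooled size, so the pooling parameters and the `nn.Linear` input size have to be updated together.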

Two 1x1 convs: the first reduces the feature dimension and learns cross-channel interactions; the second transforms the features into multi-channel dense ordinal labels.

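3.2 SID strategy

Uniform discretization (UD)

Spacing-increasing discretization (SID)

The motivation for SID is that in practice the larger the depth, the more estimation error can be tolerated, so uniformly quantizing the log of the depth serves this purpose.

In depth estimation, less information is available for estimating depth as the depth value grows, which means the estimation error for large depth values is usually larger.

UD: large depth values lead to an over-strengthened loss.

SID: uniformly discretizes the given depth interval in log space.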

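Suppose a depth interval [α, β] needs to be discretized into K sub-intervals. The formulas are:

UD: $t_i = \alpha + (\beta - \alpha) \cdot i / K$

SID: $t_i = e^{\log(\alpha) + \frac{\log(\beta / \alpha) \cdot i}{K}}$

where $t_i \in \{t_0, t_1, \ldots, t_K\}$ are the discretization thresholds.

α*? Does it ensure that the interval starts at 1? What is it for? (In the paper, a shift ξ is added to both α and β so that α* = α + ξ = 1.0, and SID is applied on [α*, β*]. A plausible reason is that log α* = 0, which keeps the log-space discretization well behaved when α is close to 0.)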
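The two discretization schemes can be compared numerically. The sketch below follows the paper's threshold definitions, UD as t_i = α + (β − α)·i/K and SID as t_i = exp(log α + log(β/α)·i/K). The function names and the toy interval [1, 81] with K = 4 are illustrative assumptions, not values from the paper.

```python
import math


def ud_thresholds(alpha: float, beta: float, K: int) -> list:
    """Uniform discretization: K + 1 equally spaced thresholds."""
    return [alpha + (beta - alpha) * i / K for i in range(K + 1)]


def sid_thresholds(alpha: float, beta: float, K: int) -> list:
    """Spacing-increasing discretization: thresholds equally spaced in log space."""
    log_ratio = math.log(beta / alpha)
    return [math.exp(math.log(alpha) + log_ratio * i / K) for i in range(K + 1)]


def sid_label(depth: float, alpha: float, beta: float, K: int) -> int:
    """Index of the SID bin containing depth, clipped to [0, K - 1]."""
    label = math.floor(K * math.log(depth / alpha) / math.log(beta / alpha))
    return min(max(label, 0), K - 1)


if __name__ == "__main__":
    # Toy interval (hypothetical): alpha* = 1, beta* = 81, K = 4.
    print("UD :", ud_thresholds(1.0, 81.0, 4))
    print("SID:", [round(t, 6) for t in sid_thresholds(1.0, 81.0, 4)])
    print("label of depth 10:", sid_label(10.0, 1.0, 81.0, 4))
```

On this toy interval the UD bins all have width 20, while the SID bins widen by a factor of 3 each time, so far-away depths are quantized more coarsely than nearby ones.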

3.3 Learning and Inference

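After obtaining the discrete depth values, one can directly cast the standard regression problem as a multi-class classification problem and use a softmax regression loss to learn the parameters of the depth estimation network. However, typical multi-class classification losses ignore the ordinal information among the discrete labels, whereas depth values form a well-ordered set and therefore have strong ordinal correlation. We therefore cast depth estimation as an ordinal regression problem and develop an ordinal loss to learn the network parameters.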
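One way to see what ordering information a plain multi-class target throws away is to compare one-hot targets with ordinal targets, where entry k answers "is the true label greater than k?". This is a minimal sketch with hypothetical helper names, not code from the DORN repository.

```python
def one_hot_targets(label: int, K: int) -> list:
    """Standard multi-class target: a single 1 at the label index."""
    return [1 if k == label else 0 for k in range(K)]


def ordinal_targets(label: int, K: int) -> list:
    """K binary targets; entry k is 1 when the label is greater than k."""
    return [1 if label > k else 0 for k in range(K)]


def hamming(a: list, b: list) -> int:
    """Number of positions where two target vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    K = 5
    for other in (1, 4):
        d_onehot = hamming(one_hot_targets(0, K), one_hot_targets(other, K))
        d_ordinal = hamming(ordinal_targets(0, K), ordinal_targets(other, K))
        print(f"label 0 vs {other}: one-hot distance={d_onehot}, ordinal distance={d_ordinal}")
```

With one-hot targets any two different labels are equally far apart, while with ordinal targets the distance equals the gap between the labels, so a prediction that is off by one bin is penalized less than one that is off by four.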

Loss design:

$\chi = \varphi(I, \Phi)$, where $I$ is the input image and $\Phi$ denotes the parameters of the dense feature extractor + scene understanding modular

$Y = \psi(\chi, \Theta)$, where $\Theta = (\theta_0, \theta_1, \ldots, \theta_{2K-1})$ is the weight vector

$l_{(w,h)} \in \{0, 1, \ldots, K-1\}$ is the discrete label produced by the SID strategy at location $(w,h)$

$\Psi(h, w, \chi, \Theta)$ is the pixelwise ordinal loss (a set of binary classification tasks, in cross-entropy form?)

$L(\chi, \Theta)$ is the loss over the whole image, i.e. the negative of the average of $\Psi(h, w, \chi, \Theta)$ over all pixels

$\hat{l}_{(w,h)}$ is the estimated discrete value decoded from $y_{(w,h)}$

$P_{(w,h)}$ denotes the ord_labels at the ordinal layer.

The outputs are DORN's two return values: depth_labels, ord_labels (see main.py)
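The per-pixel quantities above can be traced in plain Python for a single pixel. This is a sketch under stated assumptions rather than the repository's implementation: the 2K outputs are paired as (y_2k, y_2k+1) with the odd-indexed one scoring "label > k", the decoded depth is the midpoint of the predicted bin minus the shift ξ, and clipping the decoded label to K − 1 is an added safeguard. All function names are hypothetical.

```python
import math


def ordinal_probs(logits: list) -> list:
    """Turn 2K logits into K probabilities P^k = P(label > k) via a pairwise softmax."""
    probs = []
    for k in range(len(logits) // 2):
        a, b = logits[2 * k], logits[2 * k + 1]
        m = max(a, b)  # subtract the max for numerical stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        probs.append(eb / (ea + eb))
    return probs


def pixel_ordinal_loglik(probs: list, label: int, eps: float = 1e-8) -> float:
    """Psi for one pixel: sum_{k < l} log P^k + sum_{k >= l} log(1 - P^k)."""
    total = 0.0
    for k, p in enumerate(probs):
        q = p if k < label else 1.0 - p
        total += math.log(max(q, eps))
    return total


def decode_label(probs: list) -> int:
    """Estimated label: the number of binary classifiers that answer 'greater'."""
    return sum(1 for p in probs if p >= 0.5)


def decode_depth(probs: list, thresholds: list, shift: float = 0.0) -> float:
    """Midpoint of the predicted bin, with the shift removed again."""
    label = min(decode_label(probs), len(thresholds) - 2)  # clip to K - 1 (assumption)
    return (thresholds[label] + thresholds[label + 1]) / 2.0 - shift


if __name__ == "__main__":
    logits = [0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0]  # K = 4, toy values
    probs = ordinal_probs(logits)
    print("probs:", [round(p, 4) for p in probs])
    print("decoded label:", decode_label(probs))
    print("depth:", decode_depth(probs, [1.0, 3.0, 9.0, 27.0, 81.0], shift=1.0))
```

The image-level loss is then the negative mean of `pixel_ordinal_loglik` over all pixels, so a label that agrees with the binary decisions yields a smaller loss than one that contradicts them.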

    import torch
    import torch.nn as nn


    class ordLoss(nn.Module):
        """
        Ordinal loss is defined as the average of pixelwise ordinal loss F(h, w, X, O)
        over the entire image domain:
        """

        def __init__(self):
            super(ordLoss, self).__init__()
            self.loss = 0.0

        def forward(self, ord_labels, target):
            """
            :param ord_labels: ordinal labels for each position of Image I.
            :param target:     the ground_truth discreted using SID strategy.
            :return: ordinal loss
            """
            # assert pred.dim() == target.dim()
            # invalid_mask = target < 0
            # target[invalid_mask] = 0

            N, C, H, W = ord_labels.size()
            ord_num = C
            # print('ord_num = ', ord_num)

            self.loss = 0.0

            # faster version: K[:, i] holds the constant i, i.e. the bin index per channel
            if torch.cuda.is_available():
                K = torch.zeros((N, C, H, W), dtype=torch.int).cuda()
                for i in range(ord_num):
                    K[:, i, :, :] = K[:, i, :, :] + i * torch.ones((N, H, W), dtype=torch.int).cuda()
            else:
                K = torch.zeros((N, C, H, W), dtype=torch.int)
                for i in range(ord_num):
                    K[:, i, :, :] = K[:, i, :, :] + i * torch.ones((N, H, W), dtype=torch.int)

            # Note: this code uses k <= target for the first term, while the
            # paper's first sum runs over k < l.
            mask_0 = (K <= target).detach()
            mask_1 = (K > target).detach()

            one = torch.ones(ord_labels[mask_1].size())
            if torch.cuda.is_available():
                one = one.cuda()

            # log P^k where k <= target, log(1 - P^k) elsewhere; clamp avoids log(0)
            self.loss += torch.sum(torch.log(torch.clamp(ord_labels[mask_0], min=1e-8, max=1e8))) \
                         + torch.sum(torch.log(torch.clamp(one - ord_labels[mask_1], min=1e-8, max=1e8)))

            N = N * H * W
            self.loss /= (-N)  # negative
            return self.loss

Experiments

NYU Depth v2

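The authors run ablation studies that follow their contributions: [1] recasting the regression task as ordinal regression (an ordered multi-class classification task; DORN performs better);

[2] designing the ordinal regression loss: ordinal regression loss vs. regression loss (i.e. BerHu);

[3] additionally comparing the SID and UD strategies on this dataset.

As for the number of intervals K in SID, the score and $\delta$ are not very sensitive to K within the range [40, 120], but K must be neither too large nor too small.

Too few intervals cause a large quantization error, while too many lose the advantage of discretization.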
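The quantization-error half of this trade-off can be made concrete. With SID the last bin is the widest, and if the decoded depth is the bin midpoint, the worst-case error from discretization alone is half that width. The sketch below uses a hypothetical helper and a toy depth interval [1, 81]; it says nothing about why very large K hurts, which depends on training.

```python
def sid_max_quantization_error(alpha: float, beta: float, K: int) -> float:
    """Half the width of the widest (last) SID bin on [alpha, beta] with K bins."""
    ratio = (beta / alpha) ** (1.0 / K)  # t_{i+1} / t_i, constant under SID
    last_bin_width = beta - beta / ratio  # t_K - t_{K-1}
    return last_bin_width / 2.0


if __name__ == "__main__":
    for K in (4, 40, 80, 120):
        err = sid_max_quantization_error(1.0, 81.0, K)
        print(f"K={K}: worst-case quantization error = {err:.3f}")
```

The worst-case error shrinks steadily as K grows, which matches the observation that a small K is limited by quantization rather than by the network.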
