Reference code: bts

1. Overview

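Summary: Estimating depth from a single 2D image is an ill-posed problem with multiple valid solutions. To address this, the paper adds implicit constraints at multiple stages of the decoder, guiding it to produce features suited to depth estimation and thereby yielding better depth predictions. The implicit constraint is the LPG (local planar guidance) layer. However, the layer and its supporting operations are fairly complex to implement, which makes direct deployment difficult; using the network as a teacher to guide a student network is a good alternative.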

Extracting the outputs of the LPG layers gives the results in the second row of the figure below:

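At stride = 8 the layer produces the coarse overall structure of the image, and as the stride decreases, progressively more detail appears. These intermediate outputs are not supervised explicitly; supervision is applied only to the depth obtained by combining the LPG outputs from the different strides. The LPG layers can therefore be viewed as implicit constraints on the components of the depth map at different strides.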

2. Method design

2.1 Network architecture

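The depth estimation network used in the paper is a U-shaped (encoder-decoder) network, shown in the figure below. The main improvement lies in the decoder, which is detailed on the right side of the figure.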

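After the backbone outputs its features, the paper uses shortcut connections plus upsampling to obtain a feature map at stride = 8, then applies an ASPP module to enlarge the receptive field, with dilation rates $r=[3, 6, 12, 18, 24]$. These features then pass through the LPG layers for the different strides, which finally produce the depth output.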
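To get a feel for what these dilation rates mean, the following small sketch computes the spatial extent covered by a single dilated kernel. It assumes 3x3 kernels, which is an assumption made here for illustration rather than something stated above, and the helper name `effective_kernel_size` is hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent (in pixels) spanned by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Dilation rates listed in the text; the 3x3 kernel size is an assumption.
rates = [3, 6, 12, 18, 24]
extents = {r: effective_kernel_size(3, r) for r in rates}

for rate, extent in extents.items():
    print(f"dilation {rate:2d} -> kernel spans {extent} x {extent} feature-map pixels")
```

Under that assumption the largest rate already spans 49 positions on the stride-8 feature map, which is why this kind of module is a cheap way to add context.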

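2.2 LPG layer

The structure of the LPG layer proposed in the paper is shown in the figure below:

The input is a feature map at stride = k (with spatial size $\frac{H}{k}$). A reduction operation first encodes it into a 3-channel feature (the input part of the figure above). From these channels the parameters of the local plane are derived, namely the $(\theta,\phi)$ mentioned in the paper together with a plane distance taken from the third channel (the middle part of the figure above). See: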

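# pytorch/bts.py#L110
def forward(self, net):
    net = self.reduc.forward(net)
    if not self.is_final:
        # Squash the three channels into a polar angle in (0, pi/3),
        # an azimuthal angle in (0, 2*pi) and a distance in (0, max_depth).
        theta = self.sigmoid(net[:, 0, :, :]) * math.pi / 3
        phi = self.sigmoid(net[:, 1, :, :]) * math.pi * 2
        dist = self.sigmoid(net[:, 2, :, :]) * self.max_depth
        # Convert the two angles into a unit normal (n1, n2, n3).
        n1 = torch.mul(torch.sin(theta), torch.cos(phi)).unsqueeze(1)
        n2 = torch.mul(torch.sin(theta), torch.sin(phi)).unsqueeze(1)
        n3 = torch.cos(theta).unsqueeze(1)
        n4 = dist.unsqueeze(1)
        # Stack the four plane coefficients along the channel dimension.
        net = torch.cat([n1, n2, n3, n4], dim=1)
    return net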

This implementation follows Equation 2 of the paper:
$$n_1=\sin(\theta)\cos(\phi),\quad n_2=\sin(\theta)\sin(\phi),\quad n_3=\cos(\theta)$$
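As a quick sanity check on this conversion, here is a minimal single-pixel sketch in plain Python. It mirrors the scaling used in the snippet above (sigmoid, then $\pi/3$ and $2\pi$), but the function name `plane_coefficients`, the sample input values and the default `max_depth=10.0` are assumptions made purely for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def plane_coefficients(raw, max_depth=10.0):
    """Map the three raw channel values of one pixel to (n1, n2, n3, n4).

    max_depth is an illustrative default, not a value taken from the text.
    """
    theta = sigmoid(raw[0]) * math.pi / 3   # polar angle in (0, pi/3)
    phi = sigmoid(raw[1]) * math.pi * 2     # azimuthal angle in (0, 2*pi)
    dist = sigmoid(raw[2]) * max_depth      # plane distance in (0, max_depth)
    n1 = math.sin(theta) * math.cos(phi)
    n2 = math.sin(theta) * math.sin(phi)
    n3 = math.cos(theta)
    return n1, n2, n3, dist


# Arbitrary raw activations for a single pixel.
n1, n2, n3, n4 = plane_coefficients((0.3, -1.2, 0.5))
norm = math.sqrt(n1 * n1 + n2 * n2 + n3 * n3)
print(f"normal = ({n1:.4f}, {n2:.4f}, {n3:.4f}), |n| = {norm:.6f}, n4 = {n4:.4f}")
```

Two properties follow directly from the parameterization: $(n_1, n_2, n_3)$ always has unit length, and because $\theta < \pi/3$ the component $n_3$ stays above 0.5, so the plane never tilts too far from facing the camera.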
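Next, the local plane representation has to be brought to the resolution of the original image. To resolve the mismatch between the current stride and the full resolution, the plane coefficients are replicated with torch.repeat_interleave() and then evaluated against the local coordinates inside the corresponding block (of size $k \times k$) at the current stride. The implementation is: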

# pytorch/bts.py#L32
def forward(self, plane_eq, focal):
    # Replicate every plane coefficient over its k x k block (k = upratio).
    plane_eq_expanded = torch.repeat_interleave(plane_eq, int(self.upratio), 2)
    plane_eq_expanded = torch.repeat_interleave(plane_eq_expanded, int(self.upratio), 3)
    n1 = plane_eq_expanded[:, 0, :, :]
    n2 = plane_eq_expanded[:, 1, :, :]
    n3 = plane_eq_expanded[:, 2, :, :]
    n4 = plane_eq_expanded[:, 3, :, :]
    # Local coordinates inside each block, centered and normalized by k.
    u = self.u.repeat(plane_eq.size(0), plane_eq.size(2) * int(self.upratio), plane_eq.size(3)).cuda()
    u = (u - (self.upratio - 1) * 0.5) / self.upratio
    v = self.v.repeat(plane_eq.size(0), plane_eq.size(2), plane_eq.size(3) * int(self.upratio)).cuda()
    v = (v - (self.upratio - 1) * 0.5) / self.upratio
    # Ray-plane intersection gives the depth cue for every full-resolution pixel.
    return n4 / (n1 * u + n2 * v + n3)

This corresponds to Equation 1 of the paper:
$$\bar{c}_i=\frac{n_4}{n_1u_i+n_2v_i+n_3}$$
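The replicate-then-evaluate step can be sketched without any tensor library. The version below is only an illustration on nested lists; the name `lpg_upsample` and the two example planes are assumptions, and the local coordinates follow the centering and normalization used in the snippet above.

```python
import math


def lpg_upsample(plane_eq, upratio):
    """Expand an H x W grid of (n1, n2, n3, n4) tuples to (H*k) x (W*k) depths."""
    k = upratio
    height, width = len(plane_eq), len(plane_eq[0])
    out = []
    for row in range(height * k):
        out_row = []
        for col in range(width * k):
            # Every pixel of a k x k block shares the coefficients of its cell.
            n1, n2, n3, n4 = plane_eq[row // k][col // k]
            # Local coordinates inside the block, centered and divided by k.
            u = ((col % k) - (k - 1) * 0.5) / k
            v = ((row % k) - (k - 1) * 0.5) / k
            out_row.append(n4 / (n1 * u + n2 * v + n3))
        out.append(out_row)
    return out


# A plane facing the camera: every pixel of the block gets the same depth.
flat = lpg_upsample([[(0.0, 0.0, 1.0, 2.0)]], 4)

# A plane tilted around the vertical axis: depth changes along each row.
tilted = lpg_upsample([[(0.5, 0.0, math.sqrt(0.75), 2.0)]], 4)

print(flat[0])
print([round(d, 3) for d in tilted[0]])
```

A single coarse cell thus yields a constant patch for a fronto-parallel plane and a smooth depth ramp for a tilted one, which is the sense in which each coarse feature "guides" a whole block of full-resolution pixels.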
This operation is carried out at strides $[1,2,4,8]$, giving representations of the depth map at the different stages. These representations are finally concatenated and passed through a convolutional layer to produce the final depth map, which the paper describes as:
$$\bar{d}=f(W_1\bar{c}^{1\times1}+W_2\bar{c}^{2\times2}+W_3\bar{c}^{4\times4}+W_4\bar{c}^{8\times8})$$

Ablation experiments for the modules described above:

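2.3 Loss function

The loss used here is the scale-invariant loss (in log space):
$$D(g)=\frac{1}{T}\sum_i g_i^2-\lambda\left(\frac{1}{T}\sum_i g_i\right)^2$$
Here $g_i=\log\bar{d}_i-\log d_i$ is the log-space error at pixel $i$ with ground-truth depth $d_i$, $T$ is the number of pixels with valid ground truth, and $\lambda$ controls how strongly the loss focuses on the variance of the depth prediction error. The final training loss is:
$$L=\alpha\sqrt{D(g)}$$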
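The two formulas can be checked numerically with a few lines of plain Python. This is a minimal sketch, not the reference implementation: the function name `silog_loss`, the rule of skipping pixels whose ground truth is not positive, and the defaults `lam=0.85` and `alpha=10.0` are assumptions chosen for illustration.

```python
import math


def silog_loss(pred, target, lam=0.85, alpha=10.0):
    """Scale-invariant log loss L = alpha * sqrt(D(g)) over valid pixels.

    lam and alpha are illustrative defaults; pixels with target <= 0 are
    treated as having no ground truth and are skipped.
    """
    g = [math.log(p) - math.log(t) for p, t in zip(pred, target) if t > 0]
    n = len(g)
    mean_sq = sum(x * x for x in g) / n
    mean = sum(g) / n
    d = mean_sq - lam * mean * mean
    # Guard against tiny negative values caused by floating-point rounding.
    return alpha * math.sqrt(max(d, 0.0))


target = [1.0, 2.0, 4.0, 0.0]   # last pixel has no ground truth
exact = [1.0, 2.0, 4.0, 3.0]    # perfect prediction on valid pixels
doubled = [2.0, 4.0, 8.0, 3.0]  # every valid pixel is off by a factor of 2

print(silog_loss(exact, target))
print(silog_loss(doubled, target))           # lam = 0.85: global scale error is still penalized
print(silog_loss(doubled, target, lam=1.0))  # lam = 1: fully scale-invariant, close to 0
```

With $\lambda=1$ a prediction that is wrong only by a global scale factor costs essentially nothing, while $\lambda<1$ keeps part of the mean error in the loss; this is the trade-off that $\lambda$ controls.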
References:

paper: Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
Paper notes

3. Experimental results

Paper notes: "From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation"