Notes on the paper "CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus"

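We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. Applications include finding multiple vanishing points in a scene, fitting planes to architectural imagery, and estimating multiple rigid motions within the same sequence.
Although much existing work concentrates on fitting a single parametric model, we focus on the setting where multiple models of the same form must be fitted to the data. Estimation becomes more challenging when several models are present: the inliers of one model act as outliers for all the others, and existing outlier filters cannot handle such pseudo-outliers.
Early multi-model fitting methods work sequentially: they repeatedly apply an estimator such as RANSAC and, in each iteration, remove the data points associated with the currently predicted model.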

Main contributions

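CONSAC, the first learning-based method for robust multi-model fitting. It is based on a neural network that sequentially updates the conditional sampling probabilities for the hypothesis selection process.
A new dataset for vanishing point estimation, which we call NYU-VP.
We achieve state-of-the-art vanishing point estimation results on the new NYU-VP and YUD+ datasets.
We achieve the best results for multi-model homography estimation on the AdelaideRMF dataset.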

CONSAC

Consider a set of noisy, outlier-contaminated observations $\mathbf{y}\in\mathcal{Y}$. We want to fit a geometric model $h$ to the data and produce $M$ model instances, which together form $\mathcal{M}=\{h_1,\dots,h_M\}$. CONSAC estimates $\mathcal{M}$ with three nested loops:
1 Guided by a neural network, generate a single model instance $\hat{h}$ (the best hypothesis) by RANSAC-style sampling: hypothesis, hypothesis pool, best hypothesis.
2 Repeat the single-instance generation while updating the sampling weights. The resulting model instances form a multi-hypothesis $\mathcal{M}$.
3 Repeat steps 1 and 2 to build a pool of multi-hypotheses, and select the best one as the final multi-hypothesis $\hat{\mathcal{M}}$.
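The three nested loops can be sketched on a toy problem. This is only an illustration under loud assumptions: observations are scalars, a "model" is a single scalar (so the minimal set has size $C=1$), and a hand-written weighting that down-weights already explained observations stands in for the neural network. The names `toy_consac` and `inlier` are hypothetical and are not taken from the paper.

```python
import random


def inlier(y, h, tau):
    """Hard inlier indicator for the toy model: |y - h| < tau."""
    return 1.0 if abs(y - h) < tau else 0.0


def toy_consac(ys, M, S, P, tau, rng):
    """Toy version of the three nested loops.

    ASSUMPTION: the weights below are a hand-written stand-in for the
    network output p(y | s; w); nothing here is learned.
    """
    best_multi, best_score = None, -1.0
    for _ in range(P):                      # loop 3: pool of multi-hypotheses
        instances = []
        state = [0.0] * len(ys)             # s_{m,i}: is y_i already explained?
        for _ in range(M):                  # loop 2: build one multi-hypothesis
            weights = [1.0 - 0.99 * s for s in state]
            best_h, best_count = None, -1.0
            for _ in range(S):              # loop 1: hypothesis pool for one instance
                h = rng.choices(ys, weights=weights, k=1)[0]  # minimal set, C = 1
                # Score the hypothesis jointly with the instances chosen so far.
                count = sum(max(s, inlier(y, h, tau)) for y, s in zip(ys, state))
                if count > best_count:
                    best_h, best_count = h, count
            instances.append(best_h)
            # The state is updated only after an instance has been selected.
            state = [max(s, inlier(y, best_h, tau)) for y, s in zip(ys, state)]
        score = sum(state)                  # observations explained by any instance
        if score > best_score:
            best_multi, best_score = instances, score
    return best_multi, best_score
```

Calling `toy_consac(ys, M=2, S=20, P=5, tau=0.5, rng=random.Random(0))` on data with two tight clusters and a few gross outliers should return one instance per cluster.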

Generating a Model Instance (hypothesis, hypothesis pool, best hypothesis)

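We obtain the parameters of a single model by sampling a minimal set. For vanishing point (VP) estimation, for example, we need at least $C$ observations (here, two line segments); these $C$ observations form the minimal set. Passing them to a solver $f_S$ yields the model parameters. In RANSAC, a model obtained this way is called a hypothesis.
Repeatedly sampling $S$ minimal sets from all observations yields a hypothesis pool $\mathcal{H}=\{h_1,\dots,h_S\}$. The best hypothesis $\hat{h}$ is selected according to a scoring function $g_s$, namely the inlier count, which is determined by a residual function $r(y,h)$ and a threshold $\tau$.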
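As a concrete instance of "minimal set, solver, hypothesis pool, best hypothesis", here is a minimal sketch for 2D line fitting, used as a simpler stand-in for vanishing point estimation. It assumes $C=2$ points per minimal set, point-to-line distance as the residual $r(y,h)$, and uniform sampling. The function names are hypothetical.

```python
import math
import random


def fit_line(p, q):
    """Minimal solver (plays the role of f_S): the line a*x + b*y + c = 0
    through two points, with (a, b) normalised to unit length."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None  # degenerate minimal set (identical points)
    a, b = a / norm, b / norm
    return (a, b, -(a * p[0] + b * p[1]))


def residual(pt, line):
    """Residual r(y, h): point-to-line distance."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c)


def inlier_count(line, pts, tau):
    """Score: number of observations with residual below tau."""
    return sum(1 for pt in pts if residual(pt, line) < tau)


def ransac_line(pts, S, tau, rng):
    """Sample S minimal sets uniformly and keep the best-scoring hypothesis."""
    best, best_score = None, -1
    for _ in range(S):
        p, q = rng.sample(pts, 2)        # minimal set of C = 2 observations
        h = fit_line(p, q)
        if h is None:
            continue
        score = inlier_count(h, pts, tau)
        if score > best_score:
            best, best_score = h, score
    return best, best_score
```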

Multi-Hypothesis Generation

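We repeat the above process $M$ times to generate $M$ model instances, which together form a multi-hypothesis $\mathcal{M}$. In the VP problem, this corresponds to the set of vanishing points of an image.
Within a multi-hypothesis, the $m$-th model instance $\hat{h}_m$ is selected from the $m$-th hypothesis pool $\mathcal{H}_m$ according to the scoring function $g_s$:
$$\hat{\mathbf{h}}_{m}=\underset{\mathbf{h} \in \mathcal{H}_{m}}{\arg \max }\; g_{\mathbf{s}}\left(\mathbf{h}, \mathcal{Y}, \hat{\mathbf{h}}_{1:(m-1)}\right)$$
The score takes the previously selected instances $\hat{\mathbf{h}}_{1:(m-1)}$ as an argument because conditional sampling is added later and the procedure is sequential: the current sampling step depends on the model instances that have already been determined.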

Multi-Hypothesis Pool

Repeating the previous process $P$ times yields a pool of multi-hypotheses $\mathcal{P}=\{\mathcal{M}_1,\dots,\mathcal{M}_P\}$. We select the best multi-hypothesis according to another scoring function $g_m$:
$$\hat{\mathcal{M}}=\underset{\mathcal{M} \in \mathcal{P}}{\arg \max }\; g_{\mathrm{m}}(\mathcal{M}, \mathcal{Y})$$
$g_m$ measures the inlier count over all model instances in $\mathcal{M}$.

Conditional Sampling

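RANSAC samples minimal sets uniformly from all observations $\mathcal{Y}$. If the outlier ratio is high, the number of samples needed to draw an outlier-free minimal set grows exponentially in the minimal set size.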
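The standard RANSAC iteration bound makes this growth concrete. With outlier ratio $\varepsilon$ and minimal set size $C$, a uniformly drawn minimal set is outlier-free with probability $(1-\varepsilon)^C$, so reaching a confidence $p$ of at least one clean set takes about $\log(1-p)/\log(1-(1-\varepsilon)^C)$ draws. The sketch below assumes independent uniform draws; the function name is hypothetical.

```python
import math


def required_samples(outlier_ratio, minimal_set_size, confidence=0.99):
    """Number of uniform draws needed to hit at least one outlier-free
    minimal set with the given confidence (independent draws assumed)."""
    p_clean = (1.0 - outlier_ratio) ** minimal_set_size
    if p_clean >= 1.0:
        return 1            # no outliers: the first draw is already clean
    if p_clean <= 0.0:
        return math.inf     # only outliers: no clean set can ever be drawn
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))
```

At a 50% outlier ratio, for instance, this gives 17 draws for $C=2$ and 72 draws for $C=4$. In the multi-model setting the effective outlier ratio for each instance also includes the pseudo-outliers, which makes uniform sampling even less efficient.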
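NG-RANSAC instead samples from a categorical distribution $\mathbf{y} \sim p(\mathbf{y} ; \mathbf{w})$, parametrised by a neural network with parameters $\mathbf{w}$, to increase the chance of drawing an outlier-free minimal set. This is effective at high outlier ratios, but it is not suited to problems with multiple model instances, where the inliers of instance A are outliers for instance B (we call these pseudo-outliers).
Sequential RANSAC removes the inliers of each model instance from the observations $\mathcal{Y}$ as soon as that instance has been determined, and then samples from the remaining observations. Although this reduces the pseudo-outliers for subsequent instances, it handles neither the pseudo-outliers in the first few sampling rounds nor the gross outliers overall. Instead, we parametrise the conditional distribution by a neural network with parameters $\mathbf{w}$, conditioned on a state $\mathbf{s}$: $\mathbf{y} \sim p(\mathbf{y} | \mathbf{s} ; \mathbf{w})$.
When the $m$-th model instance is generated, the state vector $\mathbf{s}_m$ encodes the previously determined model instances $\hat{\mathbf{h}}_{1:(m-1)}$: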

$$s_{m, i}=\max _{j \in[1, m)} g_{y}\left(\mathbf{y}_{i}, \hat{\mathbf{h}}_{j}\right)$$
That is, $s_{m,i}$ records how well observation $\mathbf{y}_i$ is explained by the best-matching of the previously selected model instances, where the function $g_y$ measures whether an observation $\mathbf{y}$ belongs to a model $\mathbf{h}$.
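The state update is a per-observation maximum over the instances selected so far. A minimal sketch follows, assuming a caller-supplied scoring function `g_y` and assuming that the state is all zeros before any instance has been selected (the formula leaves the empty maximum for $m=1$ undefined, so this is a choice made here).

```python
def state_vector(observations, instances, g_y):
    """s_{m,i} = max over previously selected instances j of g_y(y_i, h_j).

    ASSUMPTION: with no instance selected yet (m = 1) the state is all zeros.
    """
    if not instances:
        return [0.0] * len(observations)
    return [max(g_y(y, h) for h in instances) for y in observations]


def hard_inlier(y, h, tau=0.5):
    """Toy scalar g_y (hypothetical): 1 if y lies within tau of model h, else 0."""
    return 1.0 if abs(y - h) < tau else 0.0
```

With observations `[0.0, 0.1, 5.0, 9.0]` and instances `[0.0, 5.2]`, the first three observations are marked as explained and the last one is not, which is the information the network needs to steer sampling towards the remaining structure.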
The multi-hypotheses in the pool are sampled independently:
$$p(\mathcal{P} ; \mathbf{w})=\prod_{i=1}^{P} p\left(\mathcal{M}_{i} ; \mathbf{w}\right)$$
Similarly:
$$\begin{aligned} p(\mathcal{M} ; \mathbf{w}) &=\prod_{m=1}^{M} p\left(\mathcal{H}_{m} | \mathbf{s}_{m} ; \mathbf{w}\right) \\ p(\mathcal{H} | \mathbf{s} ; \mathbf{w}) &=\prod_{s=1}^{S} p\left(\mathbf{h}_{s} | \mathbf{s} ; \mathbf{w}\right) \\ p(\mathbf{h} | \mathbf{s} ; \mathbf{w}) &=\prod_{c=1}^{C} p\left(\mathbf{y}_{c} | \mathbf{s} ; \mathbf{w}\right) \end{aligned}$$
The state $\mathbf{s}$ is not updated while a hypothesis pool $\mathcal{H}$ is being sampled; it is updated only between model instances, while the multi-hypothesis $\mathcal{M}$ is being generated.
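Taking logarithms turns these nested products into a plain sum of per-observation log-probabilities, which is the quantity a training loop would differentiate. The sketch below assumes that the sampling probabilities for each step $m$ are already available as a list (in CONSAC they would come from the network given $\mathbf{s}_m$); the data layout and the function name are hypothetical.

```python
import math


def log_prob_multi_hypothesis(sampled_sets, probs_per_step):
    """log p(M; w) as a sum over instances m, hypotheses s and observations c.

    sampled_sets[m][s] is a tuple of C observation indices (one minimal set);
    probs_per_step[m][i] is p(y_i | s_m; w), fixed for the whole m-th pool.
    """
    total = 0.0
    for minimal_sets, probs in zip(sampled_sets, probs_per_step):
        for minimal_set in minimal_sets:
            for i in minimal_set:
                total += math.log(probs[i])
    return total
```

Note that a single probability vector is used for all $S$ minimal sets of step $m$, mirroring the fact that the state is not updated inside a hypothesis pool.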

Neural Network Training

The parameters $\mathbf{w}$ are trained to increase the chance of sampling outlier-free minimal sets. We define the task loss as an expectation over sampled pools:
$$\mathcal{L}(\mathbf{w})=\mathbb{E}_{\mathcal{P} \sim p(\mathcal{P} ; \mathbf{w})}[\ell(\hat{\mathcal{M}})]$$
and approximate its gradient with respect to the parameters using $K$ sampled pools:
$$\frac{\partial}{\partial \mathbf{w}} \mathcal{L}(\mathbf{w}) \approx \frac{1}{K} \sum_{k=1}^{K}\left[\ell\left(\hat{\mathcal{M}}_{k}\right) \frac{\partial}{\partial \mathbf{w}} \log p\left(\mathcal{P}_{k} ; \mathbf{w}\right)\right]$$
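The approximation follows from the log-derivative (score-function) trick. The short derivation below assumes that $\hat{\mathcal{M}}$ is a deterministic function of the sampled pool $\mathcal{P}$ (it is chosen by an arg max), so that $\ell(\hat{\mathcal{M}})$ depends on $\mathbf{w}$ only through the sampling distribution:

```latex
\begin{aligned}
\frac{\partial}{\partial \mathbf{w}} \mathcal{L}(\mathbf{w})
&= \frac{\partial}{\partial \mathbf{w}} \sum_{\mathcal{P}} p(\mathcal{P} ; \mathbf{w})\, \ell(\hat{\mathcal{M}}) \\
&= \sum_{\mathcal{P}} \ell(\hat{\mathcal{M}})\, p(\mathcal{P} ; \mathbf{w})\, \frac{\partial}{\partial \mathbf{w}} \log p(\mathcal{P} ; \mathbf{w}) \\
&= \mathbb{E}_{\mathcal{P} \sim p(\mathcal{P} ; \mathbf{w})}\left[\ell(\hat{\mathcal{M}})\, \frac{\partial}{\partial \mathbf{w}} \log p(\mathcal{P} ; \mathbf{w})\right] \\
&\approx \frac{1}{K} \sum_{k=1}^{K} \ell(\hat{\mathcal{M}}_{k})\, \frac{\partial}{\partial \mathbf{w}} \log p(\mathcal{P}_{k} ; \mathbf{w})
\end{aligned}
```

The second line uses $\partial p = p\, \partial \log p$, and the last line replaces the expectation by an average over $K$ sampled pools $\mathcal{P}_k$. Because only $\log p$ is differentiated, the hypothesis selection itself does not need to be differentiable.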

Supervised Training

If ground truth $\mathcal{M}^{gt}=\{h_1^{gt},\dots,h_G^{gt}\}$ is available, we can use a loss $\ell_s(\hat{h},h^{gt})$ to measure the error of a single model instance. The multi-instance loss is then $\ell(\hat{\mathcal{M}},\mathcal{M}^{gt})=f_H(\mathbf{C}_{1:\min(M,G)})$, where $\mathbf{C}_{ij}=\ell_s(\hat{h}_i,h^{gt}_j)$.
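These notes do not spell out $f_H$. The sketch below assumes it is the cost of the minimum-cost one-to-one assignment between estimates and ground-truth instances (the Hungarian method, as the subscript suggests), that only the first $\min(M,G)$ estimates are matched, and that the matched costs are summed. For readability it brute-forces all assignments, which is only feasible for a handful of instances; in practice one would use a proper solver such as SciPy's `linear_sum_assignment`.

```python
from itertools import permutations


def min_assignment_cost(cost):
    """Minimum total cost of matching estimates to distinct ground-truth models.

    cost[i][j] plays the role of C_ij = l_s(h_i_hat, h_j_gt).
    ASSUMPTIONS: brute-force stand-in for the Hungarian method; only the
    first min(M, G) estimates (rows) are matched; matched costs are summed.
    """
    n_est, n_gt = len(cost), len(cost[0])
    k = min(n_est, n_gt)
    best = float("inf")
    # Assign each of the first k estimates to a different ground-truth model.
    for assignment in permutations(range(n_gt), k):
        total = sum(cost[i][j] for i, j in enumerate(assignment))
        best = min(best, total)
    return best
```

Matching before measuring the error makes the loss independent of the order in which ground-truth instances are listed.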

Unsupervised Training

Without ground truth, we instead maximise the joint inlier count of the model instances:
$$g_{\mathrm{ci}}\left(\hat{\mathbf{h}}_{m}, \mathcal{Y}\right)=\frac{1}{|\mathcal{Y}|} \sum_{i=1}^{|\mathcal{Y}|} \max _{j \in[1, m]} g_{\mathrm{i}}\left(\mathbf{y}_{i}, \hat{\mathbf{h}}_{j}\right)$$
The loss can then be defined as:
$$\ell_{\text{self}}(\hat{\mathcal{M}})=-\frac{1}{M} \sum_{m=1}^{M} g_{\text{ci}}\left(\hat{\mathbf{h}}_{m}, \mathcal{Y}\right)$$
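The two formulas translate directly into a running maximum over the instances in selection order. In this minimal sketch the inlier score `g_i` is supplied by the caller; the scalar indicator `hard_inlier` is a hypothetical toy choice, not the scoring function used in the paper.

```python
def self_supervised_loss(observations, instances, g_i):
    """l_self = -(1/M) * sum over m of g_ci(h_m, Y), where g_ci averages,
    over all observations, the best score among instances 1..m."""
    n = len(observations)
    best = [float("-inf")] * n     # running max over the instances seen so far
    total = 0.0
    for h in instances:            # instances in the order they were selected
        best = [max(b, g_i(y, h)) for b, y in zip(best, observations)]
        total += sum(best) / n     # g_ci for the current prefix 1..m
    return -total / len(instances)


def hard_inlier(y, h, tau=0.5):
    """Toy scalar inlier score (hypothetical): 1 if |y - h| < tau, else 0."""
    return 1.0 if abs(y - h) < tau else 0.0
```

Because every prefix $1..m$ contributes a term, instances that explain many observations early in the sequence lower the loss more than the same instances selected late.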