CVPR2020 | Code


Existing approaches align features at different levels: image- and instance-level alignment [7], strong-local and weak-global alignment [44], local-region alignment based on region proposals [62], and multi-level feature alignment with a prediction-guided instance-level constraint.


Transferability refers to the invariance of the learned representations across domains,
and discriminability refers to the ability of the detector to localize and distinguish different instances.

The paper therefore proposes the Hierarchical Transferability Calibration Network (HTCN) to balance the two.

Importance Weighted Adversarial Training with Input Interpolation (IWAT-I)

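First, CycleGAN is used to generate synthetic images for the source domain and the target domain respectively; the generated images are then assigned weights according to their cross-domain similarity.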

Our key insight is that not all images are created equally in terms of transferability, especially after interpolation.

The output of the domain classifier $D_2$, $d_i = D_2(G_1 \cdot G_2(x_i))$, is substituted into the information entropy function:

$$v_i = H(d_i) = -d_i \cdot \log(d_i) - (1 - d_i) \cdot \log(1 - d_i)$$


$$g_i = f_i \times (1 + v_i)$$
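The entropy weight and the feature re-weighting above can be sketched in plain Python. This is only an illustration under stated assumptions: the natural logarithm, the clamping constant `eps`, the helper names `binary_entropy` and `reweight_features`, and the toy feature vectors are not from the paper.

```python
import math


def binary_entropy(d, eps=1e-12):
    """H(d) = -d*log(d) - (1-d)*log(1-d) for a domain probability d in [0, 1].

    Assumption: natural log; d is clamped to [eps, 1-eps] to avoid log(0).
    """
    d = min(max(d, eps), 1.0 - eps)
    return -d * math.log(d) - (1.0 - d) * math.log(1.0 - d)


def reweight_features(features, domain_probs):
    """Return g_i = f_i * (1 + v_i) with v_i = H(d_i), one vector per image."""
    weighted = []
    for f_i, d_i in zip(features, domain_probs):
        v_i = binary_entropy(d_i)
        weighted.append([x * (1.0 + v_i) for x in f_i])
    return weighted


# Toy inputs (hypothetical): two images with identical features but
# different domain-classifier outputs d_i.
features = [[1.0, 2.0], [1.0, 2.0]]
domain_probs = [0.5, 0.99]
weighted = reweight_features(features, domain_probs)
```

In this toy case the image with $d_i = 0.5$, which the domain classifier cannot tell apart, receives the largest weight, while the confidently classified image is scaled by a factor close to 1.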


$$\mathcal{L}_{ga} = \mathbb{E}[\log(D_3(G_3(g_i^s)))] + \mathbb{E}[1 - \log(D_3(G_3(g_i^t)))]$$


Context-Aware Instance-Level Alignment (CILA)


$$f_{fus} = [f_c^1, f_c^2, f_c^3] \otimes f_{ins}$$

Unlike strong-weak alignment, however, this tensor product operation can cause a dimension explosion, so random sampling is applied first, followed by a Hadamard product.

$$f_{fus} = \frac{1}{\sqrt{d}} (R_1 f_c) \odot (R_2 f_{ins})$$
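A minimal sketch of the randomized fusion above, in plain Python. The Gaussian sampling of $R_1$ and $R_2$, the fixed seed, the toy dimensions, and the helper names (`random_projection`, `matvec`, `fuse`) are assumptions for illustration; the notes only state that a random projection is followed by a Hadamard product scaled by $1/\sqrt{d}$.

```python
import math
import random


def random_projection(d, in_dim, rng):
    """Fixed d x in_dim random matrix (assumption: standard Gaussian entries)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(d)]


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def fuse(f_c, f_ins, r1, r2):
    """f_fus = (1 / sqrt(d)) * (R1 f_c) elementwise-times (R2 f_ins)."""
    d = len(r1)
    p_c = matvec(r1, f_c)      # project context vector to d dims
    p_ins = matvec(r2, f_ins)  # project instance feature to d dims
    scale = 1.0 / math.sqrt(d)
    return [scale * a * b for a, b in zip(p_c, p_ins)]


rng = random.Random(0)
# Toy context vectors from three levels, concatenated into f_c (hypothetical sizes).
f_c1, f_c2, f_c3 = [0.2, 0.1], [0.5], [0.3, 0.4, 0.9]
f_c = f_c1 + f_c2 + f_c3
f_ins = [1.0, -0.5, 0.25, 0.0]

d = 8
r1 = random_projection(d, len(f_c), rng)
r2 = random_projection(d, len(f_ins), rng)
f_fus = fuse(f_c, f_ins, r1, r2)
```

The output length is fixed at `d` regardless of the input sizes, whereas the full tensor product grows with `len(f_c) * len(f_ins)`, which is the dimension explosion the randomized version avoids.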

Local Feature Mask for Semantic Consistency


We assume that some local regions of the whole image are more descriptive and dominant than others.


For each position $k$ on the feature map, $m_f^k = 2 - H(d_i^k)$ is computed to obtain the corresponding mask (regions with lower uncertainty are more transferable), which serves as an attention map.
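The mask computation can be sketched as follows, under a few assumptions: the natural logarithm, a per-position domain probability map given as a nested list, and element-wise multiplication as the way the attention map is applied (the notes do not spell that step out). The helper names `local_feature_mask` and `apply_mask` are hypothetical.

```python
import math


def binary_entropy(d, eps=1e-12):
    """H(d) = -d*log(d) - (1-d)*log(1-d); d is clamped to avoid log(0)."""
    d = min(max(d, eps), 1.0 - eps)
    return -d * math.log(d) - (1.0 - d) * math.log(1.0 - d)


def local_feature_mask(domain_prob_map):
    """m_f^k = 2 - H(d_i^k) for every spatial position k of an H x W map."""
    return [[2.0 - binary_entropy(d) for d in row] for row in domain_prob_map]


def apply_mask(feature_map, mask):
    """Scale a C x H x W feature map by the H x W mask (assumed usage)."""
    return [
        [[v * m for v, m in zip(f_row, m_row)] for f_row, m_row in zip(channel, mask)]
        for channel in feature_map
    ]


# Toy 2 x 2 pixel-level domain-classifier output (hypothetical values).
domain_prob_map = [[0.5, 0.999], [0.01, 0.5]]
mask = local_feature_mask(domain_prob_map)

# One-channel feature map of ones, so the masked result equals the mask.
feature_map = [[[1.0, 1.0], [1.0, 1.0]]]
masked = apply_mask(feature_map, mask)
```

With the natural log, $H$ lies in $[0, \ln 2]$, so every mask value falls between $2 - \ln 2$ and 2; positions where the classifier is confident (low entropy) get values near 2.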


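The authors use information entropy as a criterion for evaluating the domain classifier's output, and through it identify hard-to-transfer images (IWAT-I) and easy-to-transfer regions (Local Feature Mask).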

