Universal Domain Adaptation

SUMMARY@2020/3/27


Table of Contents

  • Motivation
  • Related Work
  • Challenges / Aims / Contribution
  • Method Proposed
    • Feature extractor $F$
    • Label classifier $G$
    • Non-adversarial domain discriminator $D'$
    • Adversarial domain discriminator $D$
  • Training
  • Testing
  • Experiment

Motivation

This paper focuses on the special setting of universal domain adaptation, where

  • no prior information about the target label set is provided.
  • only the source domain comes with labeled data.

The following figure illustrates the motivation of this setting.

and the following shows the settings that universal domain adaptation subsumes:

Related Work

This work partly builds on earlier work on partial domain adaptation from Mingsheng Long's group, such as:

  • SAN (Partial Transfer Learning with Selective Adversarial Networks)

    • utilizes multiple domain discriminators with class-level and instance-level weighting mechanisms to achieve per-class adversarial distribution matching.
  • PADA (Partial adversarial domain adaptation)
    • only one adversarial network and jointly applying class-level weighting on the source classifier
    • haven’t yet read

and on related work from other groups:

  • IWAN (Importance weighted adversarial nets for partial domain adaptation)

    • constructs an auxiliary domain discriminator to quantify the probability of a source sample being similar to the target domain.
    • haven’t yet read

All of these works partly apply the idea of adversarial networks (GAN) and its domain adaptation counterpart:

  • GAN (Generative Adversarial Nets)
  • DANN (Domain-Adversarial Training of Neural Networks)
    • adversarial-based deep domain adaptation method

Challenges / Aims / Contribution

Under the universal domain adaptation setting, the goal is to match the common categories of the source and target domains. The main challenges are:

  • how to handle $\bar{C_s}$, the part of the source domain unrelated to the target, so as to circumvent negative transfer to the target domain

  • how to achieve effective domain adaptation between the related part of the source domain and the target domain

  • how to learn a model (feature extractor & classifier) that minimizes the target risk on the common label set $C$

Method Proposed

UAN (Universal Adaptation Network) is composed of 4 parts in the training phase, as the following figure shows.

Feature extractor $F$

  • find good features that match source and target
  • good features to be used by classifier

Label classifier $G$

  • compute the predicted label $\hat y = G(F(x)) \in C_s$ (the source domain label set)

  • the classification loss is minimized over the parameters of $F$ and $G$:
    $$E_G = \mathbb{E}_{(x,y)\sim p}\, L(y, G(F(x)))$$

Non-adversarial domain discriminator $D'$

  • compute the similarity of each $x$ to the source domain

    • $\hat d' = D'(z) \in [0,1]$
    • $\hat d' \rightarrow 1$ if $x$ is more similar to the source
  • its domain classification loss is minimized, yielding a good $\hat d'$ for every sample from both the source and target domains:
    $$E_{D'} = -\mathbb{E}_{x\sim p}\log D'(F(x)) - \mathbb{E}_{x\sim q}\log\big(1 - D'(F(x))\big)$$

  • hypothesis: the expected similarity under the different label-set distributions is ordered as below, which will later be used to weight the adversarial domain discriminator $D$:
    $$\mathbb{E}_{x\sim p_{\bar{C_s}}}\hat d' > \mathbb{E}_{x\sim p_{C}}\hat d' > \mathbb{E}_{x\sim q_{C}}\hat d' > \mathbb{E}_{x\sim q_{\bar{C_t}}}\hat d'$$

  • $D'$ is not trained adversarially: doing so would reduce to DANN, which matches exactly identical source and target label spaces, and may cause negative transfer in the universal setting.
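A minimal sketch of $E_{D'}$ as plain binary cross-entropy over the discriminator's $[0,1]$ outputs (function name and batch shapes are illustrative assumptions):

```python
import numpy as np

def non_adv_domain_loss(d_src, d_tgt):
    """E_{D'}: push D'(F(x)) toward 1 on source samples (x ~ p)
    and toward 0 on target samples (x ~ q)."""
    d_src, d_tgt = np.asarray(d_src), np.asarray(d_tgt)
    return float(-np.mean(np.log(d_src)) - np.mean(np.log(1.0 - d_tgt)))
```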

Adversarial domain discriminator $D$

  • aims to discriminate source from target within the common label set $C$

  • the adversarial domain loss: minimized w.r.t. $D$ (a good discriminator) and maximized w.r.t. $F$ (a domain-confusing feature representation):
    $$E_{D} = -\mathbb{E}_{x\sim p}\, w^s(x)\log D(F(x)) - \mathbb{E}_{x\sim q}\, w^t(x)\log\big(1 - D(F(x))\big)$$

  • large weights are given to samples from the common label set in both domains, so that the adversarial matching concentrates on the common label set.

  • the weights (called the "sample-level transferability criterion") should satisfy:
    $$\mathbb{E}_{x\sim p_{C}} w^s(x) > \mathbb{E}_{x\sim p_{\bar{C_s}}} w^s(x), \qquad \mathbb{E}_{x\sim q_{C}} w^t(x) > \mathbb{E}_{x\sim q_{\bar{C_t}}} w^t(x)$$

  • the entropy of the predicted vector measures prediction uncertainty:
    $$\mathbb{E}_{x\sim q_{\bar{C_t}}} H(\hat y) > \mathbb{E}_{x\sim q_{C}} H(\hat y) > \mathbb{E}_{x\sim p_{C}} H(\hat y) > \mathbb{E}_{x\sim p_{\bar{C_s}}} H(\hat y)$$

  • combining the domain similarity and the prediction uncertainty of each sample gives a weighting mechanism that discovers the label set shared by both domains and promotes common-class adaptation:
    $$w^s(x) = \frac{H(\hat y)}{\log|C_s|} - \hat d'(x), \qquad w^t(x) = \hat d'(x) - \frac{H(\hat y)}{\log|C_s|}$$

    • $H$ is normalized by $\log|C_s|$ into $[0,1]$
    • the weights are normalized over each batch during training
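The two signals above combine into the sample-level transferability criterion; a per-sample sketch (the `eps` term and function names are my simplifications, and the per-batch normalization is omitted):

```python
import numpy as np

def normalized_entropy(probs, eps=1e-12):
    """H(y_hat) / log|C_s|: prediction uncertainty scaled into [0, 1]."""
    probs = np.asarray(probs)
    return float(-np.sum(probs * np.log(probs + eps)) / np.log(len(probs)))

def transferability(probs, d_prime):
    """Sample-level transferability criterion for one sample.

    w_s = H_norm - d' is larger for common-set source samples (more
    uncertain, less source-like than source-private ones);
    w_t = d' - H_norm is the mirror image for target samples.
    """
    h = normalized_entropy(probs)
    return h - d_prime, d_prime - h
```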

Training

  • the objective can be written as a GAN-style two-stage max-min game, but the network is implemented end-to-end by using the gradient reversal layer from DANN:

$$\max_{D}\min_{F,G}\; E_G - \lambda E_D, \qquad \min_{D'} E_{D'}$$
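The gradient reversal layer borrowed from DANN is the identity in the forward pass and multiplies the gradient by $-\lambda$ in the backward pass, which lets one optimizer step serve both sides of the max-min game. A framework-free sketch (function names are my own):

```python
import numpy as np

def grl_forward(features):
    """Forward: identity, so the discriminator D sees F(x) unchanged."""
    return features

def grl_backward(grad_from_d, lam=1.0):
    """Backward: multiply the incoming gradient by -lambda, so a step
    that minimizes E_D w.r.t. D simultaneously maximizes E_D w.r.t. F."""
    return -lam * np.asarray(grad_from_d)
```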

Testing

see the figure below:

  • the adversarial discriminator $D$ is not used at test time
  • calculate the weight $w^t(x)$ for each target sample $x$
  • a threshold chosen on validation data decides whether $x$ comes from the common label set
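The test-time rule above can be sketched as follows (`w0` is a hypothetical name for the validated threshold, and the function name is mine):

```python
import numpy as np

def predict(probs, d_prime, w0):
    """If the target weight w^t(x) falls below the threshold w0, reject
    x as 'unknown'; otherwise output the argmax class from C_s."""
    probs = np.asarray(probs)
    h = float(-np.sum(probs * np.log(probs + 1e-12)) / np.log(len(probs)))
    w_t = d_prime - h
    return "unknown" if w_t < w0 else int(np.argmax(probs))
```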

Experiment

  • $F$ is a pretrained ResNet-50
  • all target samples outside the source label set are grouped into a single "unknown" class
  • UAN outperforms methods designed for the earlier, more restricted settings
