Paper Notes: Cross-Domain Image Translation Based on GAN
1 GAN
1.1 Introduction
To learn the generator's distribution $p_g$ over data $x$, we define a prior on input noise variables $p_z(z)$, then represent a mapping to data space as $G(z; \theta_g)$, where $G$ is a differentiable function represented by a multilayer perceptron with parameters $\theta_g$. We also define a second multilayer perceptron $D(x; \theta_d)$ that outputs a single scalar. $D(x)$ represents the probability that $x$ came from the data rather than $p_g$. We train $D$ to maximize the probability of assigning the correct label to both training examples and samples from $G$. We simultaneously train $G$ to minimize $\log(1 - D(G(z)))$. In other words, $D$ and $G$ play the following two-player minimax game with value function $V(D, G)$:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
1.2 Theory Analysis
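The sketch below is a minimal plain-Python illustration of the objective. It estimates $V(D, G)$ from a minibatch of discriminator outputs. It also evaluates the optimal discriminator for a fixed $G$ that the original GAN paper derives in its theoretical analysis, $D^*_G(x) = p_{data}(x) / (p_{data}(x) + p_g(x))$. The function names and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import math


def gan_value(d_real, d_fake):
    """Minibatch estimate of V(D, G).

    d_real: discriminator outputs D(x) for real samples x ~ p_data
    d_fake: discriminator outputs D(G(z)) for generated samples, z ~ p_z
    All values must lie strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def optimal_discriminator(p_data_x, p_g_x):
    """D*(x) for a fixed G, given the two densities evaluated at the same x."""
    return p_data_x / (p_data_x + p_g_x)


# A discriminator that outputs 0.5 everywhere gives V = -log 4 (about -1.386),
# the value of the game when p_g = p_data.
print(gan_value([0.5, 0.5], [0.5, 0.5]))
# Where real data is three times as dense as generated data, D* = 0.75.
print(optimal_discriminator(3.0, 1.0))
```

$D$ is updated to increase this estimate. $G$ is updated to decrease it, and $G$ only affects the fake term.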
2 CycleGAN (ICCV 2017)
2.1 Task: Cross Domain Image Translation
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image.
In this paper, we present a method that learns to capture the special characteristics of one image collection and to figure out how these characteristics could be translated into the other image collection, all in the absence of any paired training examples.
2.2 Model: Cycle Consistency
Loss:
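A small sketch helps show the cycle-consistency idea. There are two mappings, $G: X \to Y$ and $F: Y \to X$. The loss penalizes $\|F(G(x)) - x\|_1$ and $\|G(F(y)) - y\|_1$. The full CycleGAN objective adds $\lambda L_{cyc}$ to the two adversarial losses, and the paper uses $\lambda = 10$. In this sketch, images are flat lists of pixel values. The "brighten/darken" functions are hypothetical stand-ins for the two generator networks.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, xs, ys):
    """L_cyc(G, F) = E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def cyclegan_objective(gan_loss_g, gan_loss_f, cyc_loss, lam=10.0):
    """L_GAN(G, D_Y) + L_GAN(F, D_X) + lam * L_cyc, GAN terms precomputed."""
    return gan_loss_g + gan_loss_f + lam * cyc_loss


# Hypothetical toy mappings between a "dark" domain X and a "bright" domain Y.
def brighten(img):
    return [v + 0.5 for v in img]


def darken(img):
    return [v - 0.5 for v in img]


def darken_too_little(img):
    return [v - 0.25 for v in img]


xs = [[0.0, 0.25]]
ys = [[0.5, 1.0]]
print(cycle_consistency_loss(brighten, darken, xs, ys))             # 0.0
print(cycle_consistency_loss(brighten, darken_too_little, xs, ys))  # 0.5
```

The cycle loss is zero whenever $F$ undoes $G$, whether or not $G$'s outputs actually look like domain $Y$. For that reason it is only used together with the adversarial terms.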
2.3 Experiment
Dataset: the Cityscapes dataset; map ↔ aerial photo data scraped from Google Maps
Metrics: AMT perceptual studies, FCN score, semantic segmentation metrics
Result:
2.4 Limitations
- On translation tasks that involve color and texture changes, like many of those reported in the paper, the method often succeeds. Tasks that require geometric changes have also been explored, with little success.
- Some failure cases are caused by the distribution characteristics of the training datasets.
- We also observe a lingering gap between the results achievable with paired training data and those achieved by our unpaired method.
3 DIAT: Deep Identity-aware Transfer of Facial Attributes
3.1 Task: Identity-aware Transfer of Facial Attributes
Our DIAT and DIAT-A models can provide a unified solution for several representative facial attribute transfer tasks, such as expression transfer, accessory removal, age progression, and gender transfer.
3.2 Model
In this section, a two-stage scheme is developed to tackle the identity-aware attribute transfer task.
Face Transform Network
Loss:
Face Enhancement Network
Loss:
3.3 DIAT-A
In DIAT, the perceptual identity loss is defined on the pre-trained VGG-Face network. It may be more effective to define this loss on a CNN trained for attribute transfer. Here we treat identity preservation and attribute transfer as two related tasks, and define the perceptual identity loss on the convolutional features of the discriminator. In this way, the network parameters used by the identity loss change along with the updates of the discriminator, hence the name adaptive perceptual identity loss.
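The toy sketch below illustrates the adaptive idea. It assumes a squared-L2 distance between feature maps, summed over layers. It does not reproduce the exact layers and weighting used by DIAT-A. The "discriminator trunk" is a hypothetical stack of 1-D convolutions with ReLU, standing in for the discriminator's convolutional layers. The identity loss reads the same weights as the discriminator, so updating the discriminator also changes the loss.

```python
def conv1d_relu(signal, kernel):
    """Valid 1-D convolution followed by ReLU (toy stand-in for a conv layer)."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


def discriminator_features(image, kernels):
    """Feature maps of every layer of a hypothetical discriminator trunk."""
    feats, h = [], image
    for kernel in kernels:
        h = conv1d_relu(h, kernel)
        feats.append(h)
    return feats


def perceptual_identity_loss(x, y, kernels):
    """Sum over layers of the squared L2 distance between features of x and y."""
    loss = 0.0
    for fx, fy in zip(discriminator_features(x, kernels),
                      discriminator_features(y, kernels)):
        loss += sum((a - b) ** 2 for a, b in zip(fx, fy))
    return loss


x = [1.0, 2.0, 3.0, 4.0]  # input face (toy, flattened)
y = [1.0, 2.0, 3.0, 5.0]  # transferred face
d_weights_before = [[1.0, 1.0]]
d_weights_after = [[2.0, 2.0]]  # same layer after a discriminator update
print(perceptual_identity_loss(x, y, d_weights_before))  # 1.0
print(perceptual_identity_loss(x, y, d_weights_after))   # 4.0
```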
3.4 Experiments
Dataset: a subset of the aligned CelebA dataset
4 Unsupervised Cross-Domain Image Generation (ICLR 2017)
4.1 Task
Recent achievements replicate, to some degree, some of the human capabilities for making analogies between distinct domains: Generative Adversarial Networks (GANs) are able to convincingly generate novel samples that match a given training set; style transfer methods are able to alter the visual style of images; domain adaptation methods are able to generalize learned functions to new domains even without labeled samples in the target domain; and transfer learning is now commonly used to import existing knowledge and make learning much more efficient.
These capabilities, however, do not address the general analogy synthesis problem that we tackle in this work. Namely, given separated but otherwise unlabeled samples from domains $S$ and $T$ and a perceptual function $f$, learn a mapping $G: S \to T$ such that $f(x) \sim f(G(x))$.
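The f-constancy requirement can be written as a loss averaged over source samples. This is a minimal sketch that assumes a squared Euclidean distance for comparing $f(x)$ with $f(G(x))$. The perceptual function `mean_intensity` is a hypothetical toy stand-in for a pre-trained network $f$.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def f_constancy_loss(f, G, source_samples):
    """Average over x in S of d(f(x), f(G(x))), with d = squared L2 distance."""
    total = sum(squared_distance(f(x), f(G(x))) for x in source_samples)
    return total / len(source_samples)


def mean_intensity(img):
    """Hypothetical perceptual function returning a one-dimensional 'feature'."""
    return [sum(img) / len(img)]


def mirror(img):      # changes the image but leaves f unchanged
    return img[::-1]


def add_offset(img):  # changes exactly what f measures
    return [v + 1.0 for v in img]


S = [[0.0, 1.0], [2.0, 4.0]]
print(f_constancy_loss(mean_intensity, mirror, S))      # 0.0
print(f_constancy_loss(mean_intensity, add_offset, S))  # 1.0
```

The loss only requires the features $f(G(x))$ to match. $G$ is free to change everything that $f$ ignores, which lets a photo become an emoji while the facial features captured by $f$ are kept.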
As a main application challenge, we tackle the problem of emoji generation for a given facial image. Despite a growing interest in emoji and the hurdle of creating such personal emoji manually, no system has been proposed, to our knowledge, that can solve this problem. Our method is able to produce face emoji that are visually appealing and capture much more of the facial characteristics than the emoji created by well-trained human annotators who use the conventional tools.
4.2 Model
Loss:
- $D$ is a ternary classification function from the domain $T$ to $\{1, 2, 3\}$, and $D_i(x)$ is the probability it assigns to class $i = 1, 2, 3$ for an input sample $x$.
- During optimization, $L_G$ is minimized over $g$ (the mapping is composed as $G = g \circ f$, so only $g$ is learned) and $L_D$ is minimized over $D$.
- $L_{CONST}$ enforces f-constancy for $x \in S$, while $L_{TID}$ enforces that $G$ stays close to the identity mapping for samples $x \in T$.
- $L_{TV}$ is an anisotropic total variation loss, which is added in order to slightly smooth the resulting image.
- $f$ is pre-trained on other datasets before this model is trained.
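Below is a minimal sketch of an anisotropic total variation penalty on a 2-D image. It assumes the common form that sums the absolute differences between horizontal and vertical neighbors. The exact exponent and normalization used in the DTN paper may differ from this.

```python
def anisotropic_tv(z):
    """Sum of absolute horizontal and vertical neighbour differences of a 2-D image."""
    rows, cols = len(z), len(z[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                total += abs(z[i][j + 1] - z[i][j])  # horizontal difference
            if i + 1 < rows:
                total += abs(z[i + 1][j] - z[i][j])  # vertical difference
    return total


checkerboard = [[0.0, 1.0], [1.0, 0.0]]
ramp = [[0.0, 0.5], [0.5, 1.0]]
print(anisotropic_tv(checkerboard))  # 4.0
print(anisotropic_tv(ramp))          # 2.0
```

The smoother ramp is penalized less than the checkerboard. With a small weight, this term therefore nudges the generated image toward smoothness without dominating the other losses.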
4.3 Experiments
Dataset:
- Digits: from the Street View House Numbers (SVHN) dataset to the domain of the MNIST dataset
- Faces: from photos to emoji
Metrics: MNIST accuracy (accuracy of an MNIST classifier on the translated digits)
5 StarGAN: Multi-Domain Image-to-Image Translation
5.1 Introduction
Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since a separate model must be built independently for every pair of image domains.
To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model.
StarGAN can further be extended to train on multiple domains drawn from different datasets.
5.2 Model
Loss:
- A domain classification loss on real images ($L_{cls}^{r}$), used to optimize $D$, and a domain classification loss on fake images ($L_{cls}^{f}$), used to optimize $G$.
- A reconstruction loss $L_{rec}$, which guarantees that translated images preserve the content of their input images while changing only the domain-related part of the inputs.
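This hedged sketch shows how these terms combine into the generator objective $L_G = L_{adv} + \lambda_{cls} L_{cls}^{f} + \lambda_{rec} L_{rec}$. The StarGAN paper reports $\lambda_{cls} = 1$ and $\lambda_{rec} = 10$. The sketch makes three simplifying assumptions:

- The classification term uses binary attributes, each scored by a probability strictly between 0 and 1.
- The reconstruction term uses a per-pixel mean L1 distance on flat images.
- The adversarial term is passed in already computed.

```python
import math


def domain_cls_loss(probs, target):
    """-log D_cls(c | image) for binary attributes (sum of binary cross-entropies).

    probs[k]: D's predicted probability that attribute k is present (0 < p < 1)
    target[k]: 1 or 0, the desired label c
    """
    loss = 0.0
    for p, t in zip(probs, target):
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return loss


def rec_loss(x, x_rec):
    """Mean L1 distance between x and G(G(x, c), c') on flat images."""
    return sum(abs(a - b) for a, b in zip(x, x_rec)) / len(x)


def generator_objective(adv, cls_fake, rec, lambda_cls=1.0, lambda_rec=10.0):
    """L_G = L_adv + lambda_cls * L_cls^f + lambda_rec * L_rec."""
    return adv + lambda_cls * cls_fake + lambda_rec * rec


x = [0.0, 1.0]      # input image (toy)
x_rec = [0.0, 0.5]  # image translated to c and back to its original label c'
print(generator_objective(adv=1.0,
                          cls_fake=domain_cls_loss([0.5], [1]),
                          rec=rec_loss(x, x_rec)))
```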
5.3 Training with Multiple Datasets
5.3.1 Mask Vector
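StarGAN introduces a mask vector $m$ that lets the model ignore unspecified labels and focus on the explicitly known label provided by a particular dataset. We use an $n$-dimensional one-hot vector to represent $m$, with $n$ being the number of datasets, and $c_i$ represents a vector for the labels of the $i$-th dataset. The vector of the known label $c_i$ can be represented as either a binary vector for binary attributes or a one-hot vector for categorical attributes. The unified label is $\tilde{c} = [c_1, \ldots, c_n, m]$, where the remaining $n - 1$ unknown label vectors are set to zero.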
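This small sketch assembles the unified label $\tilde{c} = [c_1, \ldots, c_n, m]$ for one training image. The helper name is hypothetical. The label sizes are illustrative assumptions: 5 binary attributes for the first dataset and 8 one-hot expression classes for the second.

```python
def unified_label(dataset_index, known_label, label_sizes):
    """Build c~ = [c_1, ..., c_n, m] for an image from dataset `dataset_index`.

    known_label: c_i, the label vector of the image's own dataset
    label_sizes: length of each c_j; unknown labels are filled with zeros
    """
    n = len(label_sizes)
    if len(known_label) != label_sizes[dataset_index]:
        raise ValueError("label length does not match its dataset")
    parts = []
    for j, size in enumerate(label_sizes):
        parts.extend(known_label if j == dataset_index else [0] * size)
    mask = [1 if j == dataset_index else 0 for j in range(n)]  # one-hot m
    return parts + mask


# An image from the second dataset whose known (one-hot) label is class 3.
print(unified_label(1, [0, 0, 1, 0, 0, 0, 0, 0], [5, 8]))
# [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```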
5.3.2 Training Strategy
When training StarGAN with multiple datasets, we use the domain label $\tilde{c}$ defined above as input to the generator. By doing so, the generator learns to ignore the unspecified labels, which are zero vectors, and focus on the explicitly given label. The structure of the generator is exactly the same as in training with a single dataset, except for the dimension of the input label $\tilde{c}$.
5.3.3 CelebA and RaFD dataset demo
5.4 Experiments
Dataset: CelebA, RaFD
Metrics: AMT (human evaluation)
Dataset: RaFD (90%/10% split into training and test sets)
Metrics: classification error of facial expressions on synthesized images
6 Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks (uses paired data; CVPR 2017)
6.1 Introduction
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations.
We no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
(One architecture for many different tasks.)
6.2 Model
6.2.1 Generator with skips
6.2.2 Conditional GANs
6.2.3 PatchGAN
It is well known that the L2 and L1 losses produce blurry results on image generation problems. Although these losses fail to encourage high-frequency crispness, in many cases they nonetheless accurately capture the low frequencies.
In order to model high frequencies, it is sufficient to restrict our attention to the structure in local image patches. Therefore, we design a discriminator architecture, which we term a PatchGAN, that only penalizes structure at the scale of patches. This discriminator tries to classify whether each N × N patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.
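The "N × N patch" follows from the receptive field of a single output unit of the discriminator. This sketch computes that receptive field. It assumes the layer layout of the commonly used 70 × 70 PatchGAN, which is five 4 × 4 convolutions with strides 2, 2, 2, 1, 1. That layout is an assumption and is not given in these notes. The second function averages a grid of per-patch probabilities into D's final output.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of one output unit of a conv stack.

    layers: (kernel_size, stride) pairs ordered from input to output.
    Working backwards, each layer grows the field to (r - 1) * stride + kernel.
    """
    r = 1
    for kernel, stride in reversed(layers):
        r = (r - 1) * stride + kernel
    return r


def patchgan_output(patch_probs):
    """Average a grid of per-patch 'real' probabilities into one score."""
    flat = [p for row in patch_probs for p in row]
    return sum(flat) / len(flat)


layers_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(layers_70))                 # 70
print(patchgan_output([[1.0, 0.0], [0.5, 0.5]]))  # 0.5
```

Each output unit only sees a 70 × 70 region, so the discriminator can only judge local texture and structure. The L1 term is left to handle the low frequencies.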
6.2.4 Loss
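The pix2pix objective combines the conditional GAN loss with an L1 term: $G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G)$, where $\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\|y - G(x, z)\|_1]$. The paper uses $\lambda = 100$. This minimal sketch computes the generator's share of the objective for one paired sample. It assumes the discriminator's score on the fake pair, $D(x, G(x, z))$, is already computed, and that images are flat lists of pixel values.

```python
import math


def l1_loss(y, y_hat):
    """Mean absolute error between the target and the generated image."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def pix2pix_generator_loss(d_fake, y, y_hat, lam=100.0):
    """Generator's part of L_cGAN(G, D) + lam * L_L1(G) for one sample.

    d_fake: D(x, G(x, z)), the conditional discriminator's score on the fake pair
    y, y_hat: ground-truth target and generated output
    """
    adv = math.log(1.0 - d_fake)  # minimax form of the adversarial term
    return adv + lam * l1_loss(y, y_hat)


print(pix2pix_generator_loss(0.5, [0.0, 1.0], [0.0, 0.5]))  # about 24.31
```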
6.3 Experiments
Dataset:
- Semantic labels ↔ photo, trained on the Cityscapes dataset.
- Architectural labels → photo, trained on CMP Facades.
- Map → aerial photo, trained on data scraped from Google Maps.
- BW → color photos, trained on [50, ImageNet large scale visual recognition challenge].
- Edges → photo, trained on data from [64, Generative visual manipulation on the natural image manifold] and [59, Fine-grained visual comparisons with local learning]; binary edges generated using the HED edge detector [57, Holistically-nested edge detection] plus post-processing.
- Sketch → photo: tests edges → photo models on human-drawn sketches from [18, How do humans sketch objects?].
- Day → night, trained on [32, Transient attributes for high-level understanding and editing of outdoor scenes].
- Thermal → color photos, trained on data from [26, Multispectral pedestrian detection: Benchmark dataset and baseline].
- Photo with missing pixels → inpainted photo, trained on Paris StreetView from [13, What makes Paris look like Paris?].
Metrics: AMT, FCN-scores
7 Photo-Realistic Single Image Super-Resolution Using a GAN (trained on paired data; CVPR 2017)
7.1 Task
Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors?
Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution.
To our knowledge, the proposed framework is the first capable of inferring photo-realistic natural images for 4× upscaling factors. To achieve this, we propose a perceptual loss function that consists of an adversarial loss and a content loss.
7.2 Model
Loss:
- In $l^{SR}_{VGG/i,j}$, $\phi_{i,j}$ denotes the feature map obtained by the j-th convolution (after activation) before the i-th max-pooling layer within the VGG19 network.
- The D network is optimized by the min-max game.
- The G network is optimized by the perceptual loss $l^{SR}$.
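Here is a minimal sketch of the perceptual loss $l^{SR} = l^{SR}_{X} + 10^{-3} l^{SR}_{Gen}$, using the VGG content loss as $l^{SR}_{X}$. The two components are:

- $l^{SR}_{VGG/i,j}$: the squared Euclidean distance between $\phi_{i,j}(I^{HR})$ and $\phi_{i,j}(G(I^{LR}))$, divided by the feature-map size $W_{i,j} H_{i,j}$.
- $l^{SR}_{Gen}$: the sum over training samples of $-\log D(G(I^{LR}))$.

As a simplification, the sketch takes the feature maps directly as single-channel lists of rows and does not run an actual VGG19.

```python
import math


def vgg_content_loss(phi_hr, phi_sr):
    """Squared distance between two feature maps, divided by W * H.

    phi_hr, phi_sr: phi_{i,j}(I_HR) and phi_{i,j}(G(I_LR)) as lists of rows.
    """
    h, w = len(phi_hr), len(phi_hr[0])
    total = 0.0
    for row_hr, row_sr in zip(phi_hr, phi_sr):
        total += sum((a - b) ** 2 for a, b in zip(row_hr, row_sr))
    return total / (w * h)


def adversarial_loss(d_scores):
    """l_Gen: sum over samples of -log D(G(I_LR)); scores must be in (0, 1]."""
    return sum(-math.log(p) for p in d_scores)


def perceptual_loss(content, adversarial, adv_weight=1e-3):
    """l_SR = content loss + 10^-3 * adversarial loss."""
    return content + adv_weight * adversarial


phi_hr = [[1.0, 2.0], [3.0, 4.0]]
phi_sr = [[1.0, 2.0], [3.0, 2.0]]
content = vgg_content_loss(phi_hr, phi_sr)  # 4 / (2 * 2) = 1.0
print(perceptual_loss(content, adversarial_loss([0.5, 0.5])))
```

The small $10^{-3}$ weight keeps the content loss as the main training signal. The adversarial term pushes the output toward the manifold of natural images.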
7.3 Experiments
Dataset:
- Set5 [Low-complexity single-image super-resolution based on nonnegative neighbor embedding]
- Set14 [On single image scale-up using sparse-representations]
- BSD100, the testing set of BSD300
Metrics: mean opinion score (MOS) testing (human evaluation)
8 Conclusion
8.1 Reason for using GAN
Difficulties of traditional methods
- How to design an effective loss
- How to use unpaired data
GAN’s advantages
- No need for a task-specific loss; only a high-level goal has to be specified
- Able to handle unpaired data
GAN’s disadvantages
- The generator network often produces insensitive results
- Mode collapse: all inputs are mapped to the same output
8.2 Good ideas
- GAN loss: keeps high-level domain features
- Keep entity-specific features
- Given separated but otherwise unlabeled samples from domains $S$ and $T$ and a perceptual function $f$, learn a mapping $G: S \to T$ such that $f(x) \sim f(G(x))$
- Perceptual loss
- Pre-trained $f$
- Cycle consistency
- Enhancement network
- Translations for multiple domains using only a single model
8.3 Metrics
- Human evaluation: AMT, MOS
- Visualizing the generated results
- Use a model trained on the target domain as the evaluator: FCN scores, MNIST classifiers, VGG-Face classifiers