A one-to-many conditional generative adversarial network framework for multiple image-to-image translations

Chai, Chunlei; Liao, Jing; Zou, Ning; Sun, Lingyun

doi:10.1007/s11042-018-5968-7

A one-to-many conditional generative adversarial network framework for multiple image-to-image translations

Published: 30 April 2018

Volume 77, pages 22339–22366, (2018)
Cite this article

Multimedia Tools and Applications Aims and scope Submit manuscript

Chunlei Chai¹,
Jing Liao¹,
Ning Zou¹ &
…
Lingyun Sun¹

1637 Accesses
Explore all metrics

Abstract

Image-to-Image translation was proposed as a general form of many image learning problems. While generative adversarial networks were successfully applied on many image-to-image translations, many models were limited to specific translation tasks and were difficult to satisfy practical needs. In this work, we introduce a One-to-Many conditional generative adversarial network, which could learn from heterogeneous sources of images. This is achieved by training multiple generators against a discriminator in synthesized learning way. This framework supports generative models to generate images in each source, so output images follow corresponding target patterns. Two implementations, hybrid fake and cascading learning, of the synthesized adversarial training scheme are also proposed, and experimented on two benchmark datasets, UTZap50K and MVOD5K, as well as a new high-quality dataset BehTex7K. We consider five challenging image-to-image translation tasks: edges-to-photo, edges-to-similar-photo translation on UTZap50K, cross-view translation on MVOD5K, and grey-to-color, grey-to-Oil-Paint on BehTex7K. We show that both implementations are able to faithfully translate from an image to another image in edges-to-photo, edges-to-similar-photo, grey-to-color, and grey-to-Oil-Paint translation tasks. The quality of output images in cross-view translation need to be further boosted.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

AugGAN: Cross Domain Adaptation with GAN-Based Data Augmentation

An Overview of Image-to-Image Translation Using Generative Adversarial Networks

SingleGAN: Image-to-Image Translation by a Single-Generator Network Using Multiple Generative Adversarial Learning

References

Cai B, Xu X, Jia K, Qing C, Tao D (2016) DehazeNet: an end-to-end system for single image haze removal. IEEE Trans Image Process 25(11):5187–5198
Article MathSciNet Google Scholar
Çalışır F, Baştan M, Ulusoy Ö, Güdükbay U (2017) Mobile multi-view object image search. Multimedia Tools & Applications 76(10):12433–12456
Article Google Scholar
Chen M, Denoyer L (2016) Multi-view Generative Adversarial Networks arXiv eprint arXiv:1611.02019
Elgammal A, Liu B, Elhoseiny M, Mazzone M (2017) CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms. arXiv eprint arXiv:1706.07068
Gao Z, Zhang H, Xu GP, Xue YB, Hauptmannc AG (2015) Multi-view discriminative and structured dictionary learning with group sparsity for human action recognition. Signal Process 112:83–97
Article Google Scholar
Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 2414–2423
Ghosh A, Kulharia V, Namboodiri V, Torr PHS, Dokania PK (2017). Multi-Agent Diverse Generative Adversarial Networks. arXiv eprint arXiv:1606.07536
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: International Conference on Neural Information Processing Systems, pp 2672–2680
Isola P, Zhu JY, Zhou TH, Efros, AA (2016) Image-to-Image Translation with Conditional Adversarial Networks arXiv eprint arXiv:1611.07004
Jacob VG, Gupta S (2009) Colorization of grayscale images and videos using a semiautomatic approach. In: 2009 16th IEEE International Conference on Image Processing, pp 1653–1656. doi:10.1109/ICIP.2009.5413392
Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. arXiv eprint arXiv:1703.05192
Kwak H, Zhang BT (2016) Ways of Conditioning Generative Adversarial Networks. arXiv eprint arXiv:1611.01455
Liu MY, Tuzel O (2016) Coupled generative adversarial networks. arXiv preprint arXiv:
Liu A-A, Su Y-T, Jia P-P, Gao Z, Hao T, Yang Z-X (Jun. 2015) (2015) Multipe/single-view human action recognition via part-induced multitask structural learning. IEEE Transactions on Cybernetics 45(6):1194–1208
Article Google Scholar
Liu Y, Qin Z, Luo Z, Wang H (2017) Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks. arXiv eprint arXiv:1705.01908
Liu Z et al. (2017) Multiview and multimodal pervasive indoor localization. ACM on Multimedia Conference ACM: 109–117
Luan F, Paris S, Bala K (2017) Deep Photo Style Transfer. arXiv eprint arXiv:1703.07511
Mirza M, Osindero S (2014) Conditional generative adversarial nets. Computer Science 2672–2680
Nie L, Wang M, Zha Z, et al (2011) Multimedia answering: enriching text QA with media information: 695–704
Perarnau G, Weijer JVD, Raducanu B, Álvarez JM (2016) Invertible Conditional GANs for image editing. In Conference and Workshop on Neural Information Processing Systems 2016. arXiv eprint arXiv:1611.06355
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved Techniques for Training GANs. arXiv eprint arXiv:1606.03498
Sheikh HR, Bovik AC (2006) Image information and visual quality. IEEE Trans Image Process 15(2):430–444. https://doi.org/10.1109/TIP.2005.859378
Article Google Scholar
Vedran V, Raymond C, Gravier G (2017) Generative adversarial networks for multimodal representation learning in video hyperlinking. In: ACM on International Conference on Multimedia Retrieval, pp 416–419
Wang X, Gupta A (2016) Generative Image Modeling Using Style and Structure Adversarial Networks. arXiv eprint arXiv:1603.05631
Wang Y, Zhang L, Weijer JVD (2016) Ensembles of Generative Adversarial Networks. arXiv eprint arXiv:1612.00991
Wang C, Xu C, Tao D (2017) Perceptual Adversarial Networks for Image-to-Image Transformation. arXiv eprint arXiv:1706.09138
Xie S, Tu Z (2017) Holistically-nested edge detection. Int J Comput Vis 125:3–18
Article MathSciNet Google Scholar
Yang Y, Ma Z, Hauptmann AG, Sebe N (2013) Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Transactions on Multimedia 15(3):661–669
Article Google Scholar
Yi Z, Zhang H, Tan P, Gong M (2017) DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. arXiv eprint arXiv:1704.02510
Yu A, Grauman K (2014) Fine-grained visual comparisons with local learning. In: Computer Vision and Pattern Recognition, pp 192–199
Zhang L, Zhang L, Mou X, Zhang D (2012) A comprehensive evaluation of full reference image quality assessment algorithms. In: 2012 19th IEEE International Conference on Image Processing, pp 1477–1480. doi:10.1109/ICIP.2012.6467150
Zhang R, Isola P, Efros AA (2016). Colorful Image Colorization. arXiv eprint arXiv:1603.08511
Zhang H et al (2016) Online collaborative learning for open-vocabulary visual classifiers. IEEE Computer Vision and Pattern Recognition: 2809–2817
Zhang H, Sindagi V, Patel VM (2017) Image De-raining Using a Conditional Generative Adversarial Network. arXiv eprint arXiv:1701.05957
Zhou W, Bovik AC (2002) A universal image quality index. IEEE Signal Processing Letters 9(3):81–84. https://doi.org/10.1109/97.995823
Article Google Scholar
Zhou W, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. https://doi.org/10.1109/TIP.2003.819861
Article Google Scholar
Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv eprint arXiv:1703.10593

Download references

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (61303137), the National Science and Technology Support Program (2015BAH21F01) and the Art Project for National Social-Science Foundation (15BG084). We thank Dr. Preben Hansen from Stockholm University, Department of Computer Science, for assistance in proofreading and technical editing of the manuscript.

Author information

Authors and Affiliations

Laboratory of CAD&CG, Zhejiang University, Hangzhou, China
Chunlei Chai, Jing Liao, Ning Zou & Lingyun Sun

Authors

Chunlei Chai
View author publications
You can also search for this author inPubMed Google Scholar
Jing Liao
View author publications
You can also search for this author inPubMed Google Scholar
Ning Zou
View author publications
You can also search for this author inPubMed Google Scholar
Lingyun Sun
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Ning Zou.

Appendix

1.1 Derivations and Proofs

1.1.1 Derivation of the objective of a set of generators

As for optimizing over multiple generators, Ghosh et al. [7] modified the objective function of the discriminator where along with finding the fakes, the discriminator has to find the generator that produced the given fake.

$$ \underset{D}{\max }{\mathbb{E}}_{x\sim {p}_{data}(x)}\left[\log {D}_{k+1}(x)\right]+\sum \limits_{i=1}^k{\mathbb{E}}_{x_i\sim {p}_{g_i}(x)}\left[\log {D}_i\left({x}_i\right)\right] $$

(15)

For a fixed generator, the objective is to minimize:

$$ {\mathbb{E}}_{x\sim {p}_d}\log {D}_{k+1}(x)+\sum \limits_{i=1}^k{\mathbb{E}}_{x\sim {p}_{g_i}}\log \left(1-{D}_{k+1}(x)\right) $$

(16)

For a set of generators, the objective is:

$$ \underset{G_1,{G}_2,\dots, {G}_k}{\min }{\mathbb{E}}_{x\sim {p}_{data}(x)}\left[\log {D}_{k+1}(x)\right]+\sum \limits_{i=1}^k{\mathbb{E}}_{x_i\sim {p}_{g_i}(x)}\left[\log \left(1-{D}_{k+1}(x)\right)\right] $$

(17)

where k is the number of generators, denoted as M, p_d the true data distribution, denoted as p_data, $ {p}_{g_i} $ the distribution learned by the i-th generator, denoted as $ {p}_{G_m} $ in this context.

Introducing the conditioned variable into (17), and replacing notations by those used in this paper, the objective of a set of conditional generators is:

$$ \underset{G_1,{G}_2,\dots, {G}_M}{\min }{\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log {D}^{M+1}\left(x,y\right)+\sum \limits_{m=1}^M{\mathbb{E}}_{x,y\sim {p}_{G_m}\left(x,y\right)}\log \left(1-{D}^{M+1}\left(x,y\right)\right) $$

(18)

Likely, the objective of conditional discriminator is:

$$ \underset{D}{\max }{\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log {D}^{M+1}\left(x,y\right)+\sum \limits_{m=1}^M{\mathbb{E}}_{x,y\sim {p}_{G_m}\left(x,y\right)}\log {D}^m\left(x,y\right) $$

(19)

We add L₁ regularization term to reduce blurs in image [9], yield the final objective (12), (13) and (14). Note that as for hybrid fake implementation, a hybrid instance $ \mathbb{I} $ is used by discriminator, rather than a individual instance x.

1.1.2 Proofs

[7] has provided detailed propositions and theorems about the objective of training a set of generators and a discriminator for an unconditional GAN. The proofs of the One-to-Many cGAN are inspired by these propositions and theorems. We introduce conditioned variable into the optimal distribution learned by the unconditional discriminator [7], and proposed a general format of the optimal distribution learned by a conditional discriminator:

$$ {D}^m\left(x,y\right)=\frac{p_{G_m}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)},\forall m\in \left\{1,2,\dots, M+1\right\} $$

(20)

Note that the unknown $ {p}_{G_{M+1}}:= {p}_{data} $ to avoid clutter [7].

Then, replacing D^m and D^M + 1 in (18) using (20), yields

$$ {\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log \left[\frac{p_{data}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}\right]+\sum \limits_{m=1}^M{\mathbb{E}}_{x,y\sim {p}_{G_m}\left(x,y\right)}\left[\log \left(1-\frac{p_{G_{M+1}}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}\right)\right] $$

(21)

Using $ {\sum}_{m=1}^{M+1}{D}^m=1 $,

$$ {\displaystyle \begin{array}{l}{\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log \left[\frac{p_{data}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}\right]+\sum \limits_{m=1}^M{\mathbb{E}}_{x,y\sim {p}_{G_m}\left(x,y\right)}\left[\log \left(\frac{\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}\right)\right]\\ {}:= {\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log \left[\frac{p_{data}\left(x,y\right)}{p_{avg}\left(x,y\right)}\right]+M{\mathbb{E}}_{x,y\sim {p}_G\left(x,y\right)}\log \left[\frac{p_G\left(x,y\right)}{p_{avg}\left(x,y\right)}\right]-\left(M+1\right)\log \left(M+1\right)+M\log M\end{array}} $$

(22)

where $ {p}_G=\frac{\sum_{m=1}^M{p}_{G_m}\left(x,y\right)}{M} $, $ {p}_{avg}\left(x,y\right)=\frac{p_{data}(x)+{\sum}_{m=1}^M{p}_{G_m}\left(x,y\right)}{M+1} $, and $ {\mathit{\sup}}_D\left({p}_G\right)={\bigcup}_{m=1}^M{\mathit{\sup}}_D\left({p}_{G_m}\right) $. The final term (22) obtains its minimum –(M + 1) log(M + 1) + M log M, when $ {p}_{data}=\frac{\sum_{m=1}^M{p}_{G_m}\left(x,y\right)}{M} $ [7]. When the number of generator M is equal to 1, the One-to-Many cGAN obtains the minimum value of log 4 of the Jensen-Shannon divergence based objective function in the original GAN [8].

The convergence of $ {p}_{G_m} $ can be shown by computing gradient descent update at the optimal D giving the corresponding G_m. Each $ {\mathit{\sup}}_D\left({p}_{G_m},D\right) $ forms convex in $ {p}_{G_m} $ with a unique global optimal value as proven in [7]. Therefore, with sufficiently small updates of $ {p}_{G_m} $, $ {p}_{G_m} $ converges to the corresponding p_data(x_m).

1.2 Architecture of generator and discriminator

We denote C(k) a Convolution-BatchNorm-ReLU layer with k filters, CD(k) a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%. All ReLUs in discriminator and the encoder of generator are leaky, with slop 0.2. All ReLUs in the decoder are not leaky. The generator is a modified encoder-decoder architecture called U-Net [9]:

Encoder: C(64)-C(128)-C(256)-C(512)-C(512)-C(512)-C(512)-C(512)
Decoder: CD(512)-CD(1024)-CD(1024)-CD(1024)-CD(1024)-C(512)-C(256)-C(128)

The discriminator is a 70 × 70 Markovian discriminator (PatchGAN) [9]: C(64)-C(128)-C(256)-C(512).

BatchNorm is not applied to the first layer C(64).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chai, C., Liao, J., Zou, N. et al. A one-to-many conditional generative adversarial network framework for multiple image-to-image translations. Multimed Tools Appl 77, 22339–22366 (2018). https://doi.org/10.1007/s11042-018-5968-7

Download citation

Received: 27 September 2017
Revised: 16 March 2018
Accepted: 03 April 2018
Published: 30 April 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-018-5968-7

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A one-to-many conditional generative adversarial network framework for multiple image-to-image translations

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

AugGAN: Cross Domain Adaptation with GAN-Based Data Augmentation

An Overview of Image-to-Image Translation Using Generative Adversarial Networks

SingleGAN: Image-to-Image Translation by a Single-Generator Network Using Multiple Generative Adversarial Learning

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendix

Appendix

1.1 Derivations and Proofs

1.1.1 Derivation of the objective of a set of generators

1.1.2 Proofs

1.2 Architecture of generator and discriminator

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now