A one-to-many conditional generative adversarial network framework for multiple image-to-image translations

Abstract

Image-to-image translation has been proposed as a general formulation of many image learning problems. Although generative adversarial networks have been applied successfully to many image-to-image translations, most models are limited to a specific translation task and are difficult to adapt to practical needs. In this work, we introduce a One-to-Many conditional generative adversarial network that can learn from heterogeneous sources of images. This is achieved by training multiple generators against a single discriminator in a synthesized adversarial learning scheme. The framework allows each generative model to generate images for its own source, so that the output images follow the corresponding target patterns. Two implementations of the synthesized adversarial training scheme, hybrid fake and cascading learning, are proposed and evaluated on two benchmark datasets, UTZap50K and MVOD5K, as well as a new high-quality dataset, BehTex7K. We consider five challenging image-to-image translation tasks: edges-to-photo and edges-to-similar-photo translation on UTZap50K, cross-view translation on MVOD5K, and grey-to-color and grey-to-oil-paint translation on BehTex7K. We show that both implementations faithfully translate images in the edges-to-photo, edges-to-similar-photo, grey-to-color, and grey-to-oil-paint tasks, whereas the quality of the output images in cross-view translation still needs to be improved.

References

  1. Cai B, Xu X, Jia K, Qing C, Tao D (2016) DehazeNet: an end-to-end system for single image haze removal. IEEE Trans Image Process 25(11):5187–5198

  2. Çalışır F, Baştan M, Ulusoy Ö, Güdükbay U (2017) Mobile multi-view object image search. Multimedia Tools & Applications 76(10):12433–12456

  3. Chen M, Denoyer L (2016) Multi-view Generative Adversarial Networks arXiv eprint arXiv:1611.02019

  4. Elgammal A, Liu B, Elhoseiny M, Mazzone M (2017) CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms. arXiv eprint arXiv:1706.07068

  5. Gao Z, Zhang H, Xu GP, Xue YB, Hauptmannc AG (2015) Multi-view discriminative and structured dictionary learning with group sparsity for human action recognition. Signal Process 112:83–97

  6. Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 2414–2423

  7. Ghosh A, Kulharia V, Namboodiri V, Torr PHS, Dokania PK (2017). Multi-Agent Diverse Generative Adversarial Networks. arXiv eprint arXiv:1606.07536

  8. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: International Conference on Neural Information Processing Systems, pp 2672–2680

  9. Isola P, Zhu JY, Zhou TH, Efros AA (2016) Image-to-Image Translation with Conditional Adversarial Networks. arXiv eprint arXiv:1611.07004

  10. Jacob VG, Gupta S (2009) Colorization of grayscale images and videos using a semiautomatic approach. In: 2009 16th IEEE International Conference on Image Processing, pp 1653–1656. doi:10.1109/ICIP.2009.5413392

  11. Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. arXiv eprint arXiv:1703.05192

  12. Kwak H, Zhang BT (2016) Ways of Conditioning Generative Adversarial Networks. arXiv eprint arXiv:1611.01455

  13. Liu MY, Tuzel O (2016) Coupled generative adversarial networks. arXiv preprint arXiv:

  14. Liu A-A, Su Y-T, Jia P-P, Gao Z, Hao T, Yang Z-X (2015) Multipe/single-view human action recognition via part-induced multitask structural learning. IEEE Transactions on Cybernetics 45(6):1194–1208

  15. Liu Y, Qin Z, Luo Z, Wang H (2017) Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks. arXiv eprint arXiv:1705.01908

  16. Liu Z et al (2017) Multiview and multimodal pervasive indoor localization. In: ACM on Multimedia Conference, pp 109–117

  17. Luan F, Paris S, Bala K (2017) Deep Photo Style Transfer. arXiv eprint arXiv:1703.07511

  18. Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv eprint arXiv:1411.1784

  19. Nie L, Wang M, Zha Z, et al (2011) Multimedia answering: enriching text QA with media information: 695–704

  20. Perarnau G, Weijer JVD, Raducanu B, Álvarez JM (2016) Invertible Conditional GANs for image editing. In Conference and Workshop on Neural Information Processing Systems 2016. arXiv eprint arXiv:1611.06355

  21. Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved Techniques for Training GANs. arXiv eprint arXiv:1606.03498

  22. Sheikh HR, Bovik AC (2006) Image information and visual quality. IEEE Trans Image Process 15(2):430–444. https://doi.org/10.1109/TIP.2005.859378

  23. Vedran V, Raymond C, Gravier G (2017) Generative adversarial networks for multimodal representation learning in video hyperlinking. In: ACM on International Conference on Multimedia Retrieval, pp 416–419

  24. Wang X, Gupta A (2016) Generative Image Modeling Using Style and Structure Adversarial Networks. arXiv eprint arXiv:1603.05631

  25. Wang Y, Zhang L, Weijer JVD (2016) Ensembles of Generative Adversarial Networks. arXiv eprint arXiv:1612.00991

  26. Wang C, Xu C, Tao D (2017) Perceptual Adversarial Networks for Image-to-Image Transformation. arXiv eprint arXiv:1706.09138

  27. Xie S, Tu Z (2017) Holistically-nested edge detection. Int J Comput Vis 125:3–18

  28. Yang Y, Ma Z, Hauptmann AG, Sebe N (2013) Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Transactions on Multimedia 15(3):661–669

  29. Yi Z, Zhang H, Tan P, Gong M (2017) DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. arXiv eprint arXiv:1704.02510

  30. Yu A, Grauman K (2014) Fine-grained visual comparisons with local learning. In: Computer Vision and Pattern Recognition, pp 192–199

  31. Zhang L, Zhang L, Mou X, Zhang D (2012) A comprehensive evaluation of full reference image quality assessment algorithms. In: 2012 19th IEEE International Conference on Image Processing, pp 1477–1480. doi:10.1109/ICIP.2012.6467150

  32. Zhang R, Isola P, Efros AA (2016). Colorful Image Colorization. arXiv eprint arXiv:1603.08511

  33. Zhang H et al (2016) Online collaborative learning for open-vocabulary visual classifiers. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 2809–2817

  34. Zhang H, Sindagi V, Patel VM (2017) Image De-raining Using a Conditional Generative Adversarial Network. arXiv eprint arXiv:1701.05957

  35. Zhou W, Bovik AC (2002) A universal image quality index. IEEE Signal Processing Letters 9(3):81–84. https://doi.org/10.1109/97.995823

  36. Zhou W, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. https://doi.org/10.1109/TIP.2003.819861

  37. Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv eprint arXiv:1703.10593

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (61303137), the National Science and Technology Support Program (2015BAH21F01) and the Art Project for National Social-Science Foundation (15BG084). We thank Dr. Preben Hansen from Stockholm University, Department of Computer Science, for assistance in proofreading and technical editing of the manuscript.

Author information

Corresponding author

Correspondence to Ning Zou.

Appendix

1.1 Derivations and Proofs

1.1.1 Derivation of the objective of a set of generators

For optimization over multiple generators, Ghosh et al. [7] modified the objective function of the discriminator so that, in addition to detecting fakes, the discriminator must identify which generator produced a given fake.

$$ \underset{D}{\max }{\mathbb{E}}_{x\sim {p}_{data}(x)}\left[\log {D}_{k+1}(x)\right]+\sum \limits_{i=1}^k{\mathbb{E}}_{x_i\sim {p}_{g_i}(x)}\left[\log {D}_i\left({x}_i\right)\right] $$
(15)

With the discriminator fixed, the objective to be minimized is:

$$ {\mathbb{E}}_{x\sim {p}_d}\log {D}_{k+1}(x)+\sum \limits_{i=1}^k{\mathbb{E}}_{x\sim {p}_{g_i}}\log \left(1-{D}_{k+1}(x)\right) $$
(16)

For a set of generators, the objective is:

$$ \underset{G_1,{G}_2,\dots, {G}_k}{\min }{\mathbb{E}}_{x\sim {p}_{data}(x)}\left[\log {D}_{k+1}(x)\right]+\sum \limits_{i=1}^k{\mathbb{E}}_{x_i\sim {p}_{g_i}(x)}\left[\log \left(1-{D}_{k+1}(x)\right)\right] $$
(17)

where k is the number of generators (denoted M in this paper), \( {p}_d \) is the true data distribution (denoted \( {p}_{data} \)), and \( {p}_{g_i} \) is the distribution learned by the i-th generator (denoted \( {p}_{G_m} \) in this paper).

Introducing the conditioning variable into (17) and replacing the notation with that used in this paper, the objective of the set of conditional generators is:

$$ \underset{G_1,{G}_2,\dots, {G}_M}{\min }{\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log {D}^{M+1}\left(x,y\right)+\sum \limits_{m=1}^M{\mathbb{E}}_{x,y\sim {p}_{G_m}\left(x,y\right)}\log \left(1-{D}^{M+1}\left(x,y\right)\right) $$
(18)

Likewise, the objective of the conditional discriminator is:

$$ \underset{D}{\max }{\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log {D}^{M+1}\left(x,y\right)+\sum \limits_{m=1}^M{\mathbb{E}}_{x,y\sim {p}_{G_m}\left(x,y\right)}\log {D}^m\left(x,y\right) $$
(19)

We add an L1 regularization term to reduce blur in the output images [9], yielding the final objectives (12), (13) and (14). Note that in the hybrid fake implementation, the discriminator receives a hybrid instance \( \mathbb{I} \) rather than an individual instance x.
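
To make the multi-generator objectives concrete, the following PyTorch sketch computes discriminator and generator losses for M generators trained against a single discriminator with M + 1 output classes, in the spirit of (18), (19) and the L1 term. It is a minimal illustration under assumed interfaces (the D(x, y) call returning class logits and the lambda_l1 weight are our placeholders, not the authors' implementation); the generator loss uses the standard non-saturating form in place of minimizing log(1 − D).

# A minimal sketch (PyTorch) of objectives (18), (19) plus the L1 term; module
# interfaces and lambda_l1 are assumptions for illustration, not the authors' code.
import torch
import torch.nn.functional as F

def discriminator_loss(D, x, y_real, fakes):
    # Objective (19): real pairs are assigned to class M + 1 (index M here),
    # and the fake produced by generator m to class m.
    M = len(fakes)
    real_target = torch.full((x.size(0),), M, dtype=torch.long, device=x.device)
    loss = F.cross_entropy(D(x, y_real), real_target)
    for m, y_fake in enumerate(fakes):
        fake_target = torch.full((x.size(0),), m, dtype=torch.long, device=x.device)
        loss = loss + F.cross_entropy(D(x, y_fake.detach()), fake_target)
    return loss

def generator_loss(D, x, y_target, fakes, lambda_l1=100.0):
    # Objective (18) in its non-saturating form: each generator maximizes
    # log D^{M+1}(x, y), plus an L1 term pulling its output towards the target.
    M = len(fakes)
    loss = 0.0
    for y_fake in fakes:
        log_real = F.log_softmax(D(x, y_fake), dim=1)[:, M]
        loss = loss - log_real.mean() + lambda_l1 * F.l1_loss(y_fake, y_target)
    return loss

In the hybrid fake implementation, the fakes list would be replaced by a single hybrid instance \( \mathbb{I} \) assembled from the generators' outputs before being passed to the discriminator.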

1.1.2 Proofs

Detailed propositions and theorems on the objective of training a set of generators against a single discriminator in an unconditional GAN are given in [7]. The proofs for the One-to-Many cGAN are inspired by these propositions and theorems. We introduce the conditioning variable into the optimal distribution learned by the unconditional discriminator [7], and propose a general form of the optimal distribution learned by a conditional discriminator:

$$ {D}^m\left(x,y\right)=\frac{p_{G_m}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)},\forall m\in \left\{1,2,\dots, M+1\right\} $$
(20)

Note that \( {p}_{G_{M+1}}:= {p}_{data} \) is defined to avoid notational clutter [7].
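
As a quick sanity check of (20), and of the identity \( {\sum}_{m=1}^{M+1}{D}^m=1 \) used below, the following short NumPy snippet evaluates the optimal outputs at a single point; the density values are made up, purely for illustration.

# Illustrative check of (20): with p_{G_{M+1}} := p_data, the optimal outputs
# D^m = p_{G_m} / (p_data + sum_m p_{G_m}) sum to 1. Density values are made up.
import numpy as np

p_data = 0.30                          # true density at some point (x, y)
p_gen = np.array([0.10, 0.25, 0.15])   # densities of M = 3 generators at the same point

densities = np.append(p_gen, p_data)   # last entry plays the role of p_{G_{M+1}}
D = densities / (p_data + p_gen.sum())
print(D, D.sum())                      # D.sum() is exactly 1.0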

Then, substituting the optimal discriminator (20) into (18) yields

$$ {\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log \left[\frac{p_{data}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}\right]+\sum \limits_{m=1}^M{\mathbb{E}}_{x,y\sim {p}_{G_m}\left(x,y\right)}\left[\log \left(1-\frac{p_{G_{M+1}}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}\right)\right] $$
(21)

Using \( {\sum}_{m=1}^{M+1}{D}^m=1 \), we obtain

$$ {\displaystyle \begin{array}{l}{\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log \left[\frac{p_{data}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}\right]+\sum \limits_{m=1}^M{\mathbb{E}}_{x,y\sim {p}_{G_m}\left(x,y\right)}\left[\log \left(\frac{\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}{p_{data}\left(x,y\right)+\sum \limits_{m=1}^M{p}_{G_m}\left(x,y\right)}\right)\right]\\ {}:= {\mathbb{E}}_{x,y\sim {p}_{data}\left(x,y\right)}\log \left[\frac{p_{data}\left(x,y\right)}{p_{avg}\left(x,y\right)}\right]+M{\mathbb{E}}_{x,y\sim {p}_G\left(x,y\right)}\log \left[\frac{p_G\left(x,y\right)}{p_{avg}\left(x,y\right)}\right]-\left(M+1\right)\log \left(M+1\right)+M\log M\end{array}} $$
(22)

where \( {p}_G=\frac{\sum_{m=1}^M{p}_{G_m}\left(x,y\right)}{M} \), \( {p}_{avg}\left(x,y\right)=\frac{p_{data}\left(x,y\right)+{\sum}_{m=1}^M{p}_{G_m}\left(x,y\right)}{M+1} \), and \( \operatorname{supp}\left({p}_G\right)={\bigcup}_{m=1}^M\operatorname{supp}\left({p}_{G_m}\right) \). The expression (22) attains its minimum \( -\left(M+1\right)\log \left(M+1\right)+M\log M \) when \( {p}_{data}=\frac{\sum_{m=1}^M{p}_{G_m}\left(x,y\right)}{M} \) [7]. When the number of generators M equals 1, the One-to-Many cGAN attains the minimum value \( -\log 4 \) of the Jensen-Shannon-divergence-based objective function of the original GAN [8].
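
For concreteness, substituting M = 1 into the minimum stated above recovers the optimum of the original GAN objective [8]:

$$ {\left[-\left(M+1\right)\log \left(M+1\right)+M\log M\right]}_{M=1}=-2\log 2+\log 1=-\log 4 $$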

The convergence of \( {p}_{G_m} \) can be shown by computing the gradient descent update at the optimal D given the corresponding \( {G}_m \). Each \( {\sup}_D\left({p}_{G_m},D\right) \) is convex in \( {p}_{G_m} \) with a unique global optimum, as proven in [7]. Therefore, with sufficiently small updates of \( {p}_{G_m} \), \( {p}_{G_m} \) converges to the corresponding \( {p}_{data}\left({x}_m\right) \).

1.2 Architecture of generator and discriminator

We denote by C(k) a Convolution-BatchNorm-ReLU layer with k filters, and by CD(k) a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%. All ReLUs in the discriminator and in the encoder of the generator are leaky, with slope 0.2; ReLUs in the decoder are not leaky. The generator is a modified encoder-decoder architecture, the U-Net [9]:

  • Encoder: C(64)-C(128)-C(256)-C(512)-C(512)-C(512)-C(512)-C(512)

  • Decoder: CD(512)-CD(1024)-CD(1024)-CD(1024)-CD(1024)-C(512)-C(256)-C(128)

The discriminator is a 70 × 70 Markovian discriminator (PatchGAN) [9]: C(64)-C(128)-C(256)-C(512).

BatchNorm is not applied to the first layer C(64).
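
The PatchGAN can be implemented compactly; the sketch below (PyTorch) follows the C(64)-C(128)-C(256)-C(512) layout above, with LeakyReLU(0.2), no BatchNorm on the first layer, and the conditioning image concatenated with the candidate output along the channel axis. Kernel sizes and strides are not stated above; we assume the 4 × 4 convolutions of the pix2pix reference design [9], with stride 1 in the last two layers to keep the 70 × 70 receptive field.

# Minimal 70 x 70 PatchGAN sketch (PyTorch). Layer widths follow the text above;
# kernel/stride choices are assumptions borrowed from the pix2pix design [9].
import torch
import torch.nn as nn

def C(in_ch, out_ch, stride=2, batchnorm=True):
    # Convolution-BatchNorm-LeakyReLU block with 4x4 kernels.
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1)]
    if batchnorm:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=6):             # condition image + output image
        super().__init__()
        self.body = nn.Sequential(
            C(in_channels, 64, batchnorm=False),    # C(64), no BatchNorm on the first layer
            C(64, 128),                             # C(128)
            C(128, 256),                            # C(256)
            C(256, 512, stride=1),                  # C(512), stride 1 for a 70x70 receptive field
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # per-patch score map
        )

    def forward(self, x, y):
        return self.body(torch.cat([x, y], dim=1))

d = PatchDiscriminator()
scores = d(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(scores.shape)                                 # torch.Size([1, 1, 30, 30])

For the multi-class objective (19), the final convolution would output M + 1 channels rather than one, so that each patch is classified as real or as produced by a particular generator.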

Cite this article

Chai, C., Liao, J., Zou, N. et al. A one-to-many conditional generative adversarial network framework for multiple image-to-image translations. Multimed Tools Appl 77, 22339–22366 (2018). https://doi.org/10.1007/s11042-018-5968-7
