Abstract
Image-to-image translation has been proposed as a general formulation of many image learning problems. Although generative adversarial networks have been applied successfully to many image-to-image translation tasks, most existing models are limited to a single, specific translation task and therefore fall short of practical needs. In this work, we introduce a One-to-Many conditional generative adversarial network that can learn from heterogeneous sources of images. This is achieved by training multiple generators against a single discriminator in a synthesized adversarial learning scheme. The framework enables each generator to produce images for its own source, so that the output images follow the corresponding target patterns. We propose two implementations of the synthesized adversarial training scheme, hybrid fake and cascading learning, and evaluate them on two benchmark datasets, UTZap50K and MVOD5K, as well as on a new high-quality dataset, BehTex7K. We consider five challenging image-to-image translation tasks: edges-to-photo and edges-to-similar-photo translation on UTZap50K, cross-view translation on MVOD5K, and grey-to-color and grey-to-oil-paint translation on BehTex7K. We show that both implementations faithfully translate images in the edges-to-photo, edges-to-similar-photo, grey-to-color, and grey-to-oil-paint tasks, while the quality of the output images in cross-view translation still needs to be improved.
References
Cai B, Xu X, Jia K, Qing C, Tao D (2016) DehazeNet: an end-to-end system for single image haze removal. IEEE Trans Image Process 25(11):5187–5198
Çalışır F, Baştan M, Ulusoy Ö, Güdükbay U (2017) Mobile multi-view object image search. Multimed Tools Appl 76(10):12433–12456
Chen M, Denoyer L (2016) Multi-view Generative Adversarial Networks arXiv eprint arXiv:1611.02019
Elgammal A, Liu B, Elhoseiny M, Mazzone M (2017) CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms. arXiv eprint arXiv:1706.07068
Gao Z, Zhang H, Xu GP, Xue YB, Hauptmannc AG (2015) Multi-view discriminative and structured dictionary learning with group sparsity for human action recognition. Signal Process 112:83–97
Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 2414–2423
Ghosh A, Kulharia V, Namboodiri V, Torr PHS, Dokania PK (2017) Multi-agent diverse generative adversarial networks. arXiv eprint arXiv:1606.07536
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: International Conference on Neural Information Processing Systems, pp 2672–2680
Isola P, Zhu JY, Zhou TH, Efros AA (2016) Image-to-image translation with conditional adversarial networks. arXiv eprint arXiv:1611.07004
Jacob VG, Gupta S (2009) Colorization of grayscale images and videos using a semiautomatic approach. In: 2009 16th IEEE International Conference on Image Processing, pp 1653–1656. doi:10.1109/ICIP.2009.5413392
Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. arXiv eprint arXiv:1703.05192
Kwak H, Zhang BT (2016) Ways of Conditioning Generative Adversarial Networks. arXiv eprint arXiv:1611.01455
Liu MY, Tuzel O (2016) Coupled generative adversarial networks. arXiv preprint
Liu A-A, Su Y-T, Jia P-P, Gao Z, Hao T, Yang Z-X (2015) Multipe/single-view human action recognition via part-induced multitask structural learning. IEEE Transactions on Cybernetics 45(6):1194–1208
Liu Y, Qin Z, Luo Z, Wang H (2017) Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks. arXiv eprint arXiv:1705.01908
Liu Z et al (2017) Multiview and multimodal pervasive indoor localization. In: ACM Multimedia Conference, pp 109–117
Luan F, Paris S, Bala K (2017) Deep Photo Style Transfer. arXiv eprint arXiv:1703.07511
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv eprint arXiv:1411.1784
Nie L, Wang M, Zha Z et al (2011) Multimedia answering: enriching text QA with media information, pp 695–704
Perarnau G, Weijer JVD, Raducanu B, Álvarez JM (2016) Invertible Conditional GANs for image editing. In Conference and Workshop on Neural Information Processing Systems 2016. arXiv eprint arXiv:1611.06355
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved Techniques for Training GANs. arXiv eprint arXiv:1606.03498
Sheikh HR, Bovik AC (2006) Image information and visual quality. IEEE Trans Image Process 15(2):430–444. https://doi.org/10.1109/TIP.2005.859378
Vedran V, Raymond C, Gravier G (2017) Generative adversarial networks for multimodal representation learning in video hyperlinking. In: ACM on International Conference on Multimedia Retrieval, pp 416–419
Wang X, Gupta A (2016) Generative Image Modeling Using Style and Structure Adversarial Networks. arXiv eprint arXiv:1603.05631
Wang Y, Zhang L, Weijer JVD (2016) Ensembles of Generative Adversarial Networks. arXiv eprint arXiv:1612.00991
Wang C, Xu C, Tao D (2017) Perceptual Adversarial Networks for Image-to-Image Transformation. arXiv eprint arXiv:1706.09138
Xie S, Tu Z (2017) Holistically-nested edge detection. Int J Comput Vis 125:3–18
Yang Y, Ma Z, Hauptmann AG, Sebe N (2013) Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Transactions on Multimedia 15(3):661–669
Yi Z, Zhang H, Tan P, Gong M (2017) DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. arXiv eprint arXiv:1704.02510
Yu A, Grauman K (2014) Fine-grained visual comparisons with local learning. In: Computer Vision and Pattern Recognition, pp 192–199
Zhang L, Zhang L, Mou X, Zhang D (2012) A comprehensive evaluation of full reference image quality assessment algorithms. In: 2012 19th IEEE International Conference on Image Processing, pp 1477–1480. doi:10.1109/ICIP.2012.6467150
Zhang R, Isola P, Efros AA (2016) Colorful image colorization. arXiv eprint arXiv:1603.08511
Zhang H et al (2016) Online collaborative learning for open-vocabulary visual classifiers. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 2809–2817
Zhang H, Sindagi V, Patel VM (2017) Image De-raining Using a Conditional Generative Adversarial Network. arXiv eprint arXiv:1701.05957
Zhou W, Bovik AC (2002) A universal image quality index. IEEE Signal Processing Letters 9(3):81–84. https://doi.org/10.1109/97.995823
Zhou W, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. https://doi.org/10.1109/TIP.2003.819861
Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv eprint arXiv:1703.10593
Acknowledgements
This paper is supported by the National Natural Science Foundation of China (61303137), the National Science and Technology Support Program (2015BAH21F01) and the Art Project for National Social-Science Foundation (15BG084). We thank Dr. Preben Hansen from Stockholm University, Department of Computer Science, for assistance in proofreading and technical editing of the manuscript.
Appendix
1.1 Derivations and Proofs
1.1.1 Derivation of the objective of a set of generators
To optimize over multiple generators, Ghosh et al. [7] modified the objective function of the discriminator: along with detecting fakes, the discriminator must also identify which generator produced a given fake.
For a fixed generator, the objective is to minimize:
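A plausible form of this objective, assuming the standard GAN discriminator loss of [8] written as a minimization, with \( p_g \) the distribution induced by the fixed generator:
\[ -\,\mathbb{E}_{x\sim p_d}\left[\log D(x)\right] \;-\; \mathbb{E}_{x\sim p_g}\left[\log\left(1-D(x)\right)\right] \]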
For a set of generators, the objective is:
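A sketch of this objective, assuming the (k + 1)-way discriminator of [7] that assigns a separate output \( D^{i} \) to each generator and \( D^{k+1} \) to real data:
\[ \min_{D}\; -\,\mathbb{E}_{x\sim p_d}\left[\log D^{k+1}(x)\right] \;-\; \sum_{i=1}^{k}\mathbb{E}_{x\sim p_{g_i}}\left[\log D^{i}(x)\right] \]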
where \( k \) is the number of generators (denoted \( M \) in this paper), \( p_d \) is the true data distribution (denoted \( p_{data} \)), and \( {p}_{g_i} \) is the distribution learned by the i-th generator (denoted \( {p}_{G_m} \) in this paper).
Introducing the conditioning variable into (17), and replacing the notation with that used in this paper, the objective of the set of conditional generators becomes:
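A sketch, assuming each conditional generator \( G_m \) is trained so that its outputs, given the conditioning input \( y \) and noise \( z \), are classified as real (output \( D^{M+1} \)):
\[ \min_{G_1,\dots,G_M}\; \sum_{m=1}^{M}\mathbb{E}_{y,\,z}\left[\log\left(1-D^{M+1}\!\left(G_m(y,z),\,y\right)\right)\right] \]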
Likewise, the objective of the conditional discriminator is:
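A sketch in the paper's notation (M generators, condition \( y \)):
\[ \min_{D}\; -\,\mathbb{E}_{(x,y)\sim p_{data}}\left[\log D^{M+1}(x,y)\right] \;-\; \sum_{m=1}^{M}\mathbb{E}_{(x,y)\sim p_{G_m}}\left[\log D^{m}(x,y)\right] \]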
We add an L1 regularization term to reduce blurring in the output images [9], yielding the final objectives (12), (13) and (14). Note that in the hybrid fake implementation, a hybrid instance \( \mathbb{I} \) is fed to the discriminator rather than an individual instance \( x \).
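For reference, the L1 term of [9] penalizes the pixel-wise distance between a generated image and its target; a sketch for the m-th generator, with \( x \) the target image, \( y \) the conditioning input and \( z \) the noise, is:
\[ \mathcal{L}_{L1}\left(G_m\right) = \mathbb{E}_{x,y,z}\left[\left\|x - G_m(y,z)\right\|_1\right] \]
Its weight relative to the adversarial term in (12)–(14) follows the main text.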
1.1.2 Proofs
Ghosh et al. [7] provide detailed propositions and theorems for the objective of training a set of generators against a single discriminator in an unconditional GAN. The proofs for the One-to-Many cGAN are inspired by these propositions and theorems. We introduce the conditioning variable into the optimal distribution learned by the unconditional discriminator [7] and propose a general form of the optimal distribution learned by a conditional discriminator:
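A plausible form, obtained by adding the condition \( y \) to the multi-class optimal discriminator of [7]:
\[ D^{m*}(x,y) = \frac{p_{G_m}(x,y)}{\sum_{j=1}^{M+1} p_{G_j}(x,y)}, \qquad m = 1,\dots,M+1 \]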
Note that the unknown true data distribution is written as \( {p}_{G_{M+1}}:= {p}_{data} \) to avoid clutter [7].
Then, replacing \( {D}^m \) and \( {D}^{M+1} \) in (18) using (20) yields
Using \( {\sum}_{m=1}^{M+1}{D}^m=1 \),
where \( {p}_G=\frac{\sum_{m=1}^M{p}_{G_m}\left(x,y\right)}{M} \), \( {p}_{avg}\left(x,y\right)=\frac{p_{data}\left(x,y\right)+{\sum}_{m=1}^M{p}_{G_m}\left(x,y\right)}{M+1} \), and \( \operatorname{supp}\left({p}_G\right)={\bigcup}_{m=1}^M\operatorname{supp}\left({p}_{G_m}\right) \). The final term (22) attains its minimum, \( -(M+1)\log(M+1)+M\log M \), when \( {p}_{data}=\frac{\sum_{m=1}^M{p}_{G_m}\left(x,y\right)}{M} \) [7]. When the number of generators M is equal to 1, this reduces to the minimum value \( -\log 4 \) of the Jensen-Shannon-divergence-based objective function of the original GAN [8].
The convergence of \( {p}_{G_m} \) can be shown by performing the gradient descent update of each \( {G}_m \) at the corresponding optimal D. Each \( {\mathit{\sup}}_D\left({p}_{G_m},D\right) \) is convex in \( {p}_{G_m} \) with a unique global optimum, as proven in [7]. Therefore, with sufficiently small updates of \( {p}_{G_m} \), \( {p}_{G_m} \) converges to the corresponding \( {p}_{data}\left({x}_m\right) \).
1.2 Architecture of generator and discriminator
We denote by C(k) a Convolution-BatchNorm-ReLU layer with k filters, and by CD(k) a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%. All ReLUs in the discriminator and in the encoder of the generator are leaky, with slope 0.2; the ReLUs in the decoder are not leaky. The generator is a modified encoder-decoder architecture, the U-Net [9]:
- Encoder: C(64)-C(128)-C(256)-C(512)-C(512)-C(512)-C(512)-C(512)
- Decoder: CD(512)-CD(1024)-CD(1024)-CD(1024)-CD(1024)-C(512)-C(256)-C(128)
The discriminator is a 70 × 70 Markovian discriminator (PatchGAN) [9]: C(64)-C(128)-C(256)-C(512).
BatchNorm is not applied to the first layer C(64).
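The layer stacks above can be assembled from a single reusable block. The following PyTorch sketch is illustrative only: the 4 × 4 kernel, stride 2, padding 1, and the final one-channel convolution of the PatchGAN are assumptions carried over from the pix2pix reference [9] rather than details stated in this appendix; in the One-to-Many setting the output head could instead produce M + 1 channels so that the discriminator can also identify which generator produced a fake.

```python
# Minimal sketch of the C(k)/CD(k) blocks and the 70x70 PatchGAN described above.
# Kernel size, stride, padding and the final projection layer are assumptions
# borrowed from pix2pix [9]; they are not specified in this appendix.
import torch
import torch.nn as nn


def C(in_ch, k, norm=True, leaky=True, dropout=False):
    """Convolution-BatchNorm-(Dropout-)ReLU block with k filters."""
    layers = [nn.Conv2d(in_ch, k, kernel_size=4, stride=2, padding=1)]
    if norm:                      # BatchNorm is skipped in the first C(64) layer
        layers.append(nn.BatchNorm2d(k))
    if dropout:                   # used by the CD(k) decoder blocks (rate 0.5)
        layers.append(nn.Dropout(0.5))
    layers.append(nn.LeakyReLU(0.2) if leaky else nn.ReLU())
    return nn.Sequential(*layers)


class PatchDiscriminator(nn.Module):
    """C(64)-C(128)-C(256)-C(512) followed by a per-patch score map."""

    def __init__(self, in_ch=6, out_ch=1):
        # in_ch=6: conditioning image and candidate image concatenated channel-wise.
        # out_ch=1 gives a plain real/fake PatchGAN; out_ch=M+1 would let the
        # discriminator also identify the producing generator (One-to-Many setting).
        super().__init__()
        self.model = nn.Sequential(
            C(in_ch, 64, norm=False),
            C(64, 128),
            C(128, 256),
            C(256, 512),
            nn.Conv2d(512, out_ch, kernel_size=4, padding=1),
        )

    def forward(self, x):
        return self.model(x)


if __name__ == "__main__":
    D = PatchDiscriminator()
    scores = D(torch.randn(1, 6, 256, 256))   # -> (1, 1, 15, 15) patch scores
    print(scores.shape)
```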
Cite this article
Chai, C., Liao, J., Zou, N. et al. A one-to-many conditional generative adversarial network framework for multiple image-to-image translations. Multimed Tools Appl 77, 22339–22366 (2018). https://doi.org/10.1007/s11042-018-5968-7