Disentangling latent space better for few-shot image-to-image translation

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

In unpaired image-to-image translation, the central idea is to learn an underlying mapping between the source and target domains. Previous approaches require large amounts of data from both domains to learn this mapping. Under the few-shot condition, however, that is, few-shot image-to-image translation, only one domain has enough data, and the underlying mapping becomes ill-conditioned owing to the limited data and the imbalanced distribution across the two domains. We argue that a model with a better-disentangled latent-space representation can better tackle this more challenging few-shot setting. Motivated by this, under a partially shared assumption, we propose a better disentanglement of the content and style latent spaces using a domain-specific style latent classifier and a domain-shared cross-content latent discriminator. Moreover, we design asymmetric weak/strong domain discriminators to achieve better translation performance with the limited data of the few-shot domain. Furthermore, our method can easily be embedded into any latent-space-disentangled image-to-image translation model for the few-shot setting. Both subjective and objective evaluations show that, compared with other state-of-the-art methods, the images synthesized by our method have higher fidelity while maintaining a degree of diversity.
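
To make the latent-space layout described above concrete, the following is a minimal PyTorch sketch of the components named in the abstract: partially shared content and style encoders, a domain-specific style latent classifier, a domain-shared cross-content latent discriminator, and asymmetric weak/strong image discriminators. All module names, dimensions, and wiring here are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (PyTorch) of the latent-space layout described in the abstract.
# All module names, sizes, and loss wiring are illustrative assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn


def conv_block(c_in, c_out):
    """Downsampling conv block: halves spatial resolution."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 4, 2, 1),
        nn.InstanceNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),
    )


class ContentEncoder(nn.Module):
    """Domain-shared encoder: maps an image to a content feature map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 64), conv_block(64, 128), conv_block(128, 256))

    def forward(self, x):
        return self.net(x)                      # (B, 256, H/8, W/8)


class StyleEncoder(nn.Module):
    """Maps an image to a low-dimensional style code."""
    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 64), conv_block(64, 128),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(128, style_dim))

    def forward(self, x):
        return self.net(x)                      # (B, style_dim)


class StyleClassifier(nn.Module):
    """Domain-specific style latent classifier: predicts which domain a style
    code came from, encouraging domain information to live in the style space."""
    def __init__(self, style_dim=8, n_domains=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(style_dim, 64), nn.ReLU(), nn.Linear(64, n_domains))

    def forward(self, s):
        return self.net(s)


class CrossContentDiscriminator(nn.Module):
    """Domain-shared cross-content latent discriminator: trained to tell which
    domain a content feature came from; the encoders are trained adversarially
    against it so the content space becomes domain-invariant."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(256, 256), nn.AdaptiveAvgPool2d(1),
                                 nn.Flatten(), nn.Linear(256, 1))

    def forward(self, c):
        return self.net(c)


def make_image_discriminator(n_layers):
    """Asymmetric image discriminators: a shallow ('weak') one for the few-shot
    domain and a deeper ('strong') one for the data-rich domain."""
    layers, c = [conv_block(3, 64)], 64
    for _ in range(n_layers - 1):
        layers.append(conv_block(c, c * 2))
        c *= 2
    layers.append(nn.Conv2d(c, 1, 3, 1, 1))
    return nn.Sequential(*layers)


if __name__ == "__main__":
    x = torch.randn(2, 3, 128, 128)
    content, style = ContentEncoder()(x), StyleEncoder()(x)
    print(content.shape, style.shape)
    print(StyleClassifier()(style).shape, CrossContentDiscriminator()(content).shape)
    weak_d, strong_d = make_image_discriminator(2), make_image_discriminator(4)
    print(weak_d(x).shape, strong_d(x).shape)
```

The intended division of labor in such a design would be that the style classifier pulls domain-specific information into the style code, adversarial training against the cross-content discriminator pushes the content features toward domain invariance, and the shallower (weak) discriminator limits overfitting on the few-shot domain while the deeper (strong) one exploits the data-rich domain.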

Acknowledgements

Peng Liu and Yueyue Wang contributed equally to this work. This work was supported by the Natural Science Foundation of Shandong Province under Grant ZR2021MF080, the National Natural Science Foundation of China under Grants 61771440 and 32073029, the Key Project of the Shandong Provincial Natural Science Foundation (ZR202010310016), and the Postgraduate Education Quality Improvement Project of Shandong Province (SDYJG19134).

Author information

Correspondence to Zhaorui Gu or Xiaodong Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, P., Wang, Y., Du, A. et al. Disentangling latent space better for few-shot image-to-image translation. Int. J. Mach. Learn. & Cyber. 14, 419–427 (2023). https://doi.org/10.1007/s13042-022-01552-4
