Truly Unsupervised Image-to-Image Translation with Contrastive Representation Learning

  • Conference paper
  • Conference: Computer Vision – ACCV 2022 (ACCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13843)

Abstract

Image-to-image translation is a classic image generation task that attempts to translate an image from the source domain to an analogous image in the target domain. Recent advances in deep generative networks have shown remarkable capabilities in translating images across different domains. Most of these models require either pixel-level supervision (paired input and output images) or domain-level supervision (image domain labels) for the translation task. However, in many practical situations the required supervisory information is difficult to collect, and one needs to perform truly unsupervised image translation on a large number of images without paired images or domain labels. In this paper, we present a truly unsupervised image-to-image translation model that performs the translation task without any extra supervision. The crux of our model is an embedding network that uses contrastive representation learning to extract the domain and style information of the input style (or reference) image, and that serves the translation module which actually carries out the translation. The embedding network and the translation module are trained jointly and benefit from each other. Extensive experimental evaluation has been performed on various datasets covering both cross-domain and multi-domain translation. The results demonstrate that our model outperforms the best truly unsupervised image-to-image translation model in the literature. In addition, our model can easily be adapted to take advantage of available domain labels, achieving performance comparable to the best supervised image translation methods when all domain labels are known, and superior performance when only some domain labels are known.
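For readers who want a concrete picture of the two components the abstract outlines, below is a minimal PyTorch sketch: an embedding network trained with a contrastive (InfoNCE-style) objective to extract a style/domain code from a reference image, and AdaIN-style conditioning that injects that code into a translation module. Every module name, layer size, and the exact loss wiring here are illustrative assumptions, not the authors' actual implementation, which is given in the full text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Hypothetical embedding network: maps a style (reference) image
    to a unit-norm style/domain code."""
    def __init__(self, style_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, style_dim)

    def forward(self, x):
        h = self.backbone(x).flatten(1)
        return F.normalize(self.head(h), dim=1)

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss: z1[i] and z2[i] embed two augmented views of the
    same reference image (positives); every other pairing in the batch
    acts as a negative."""
    logits = z1 @ z2.t() / temperature                    # (n, n) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)

def adain(content_feat, style_code, fc):
    """AdaIN-style conditioning: the style code predicts a per-channel
    scale and shift applied to instance-normalized content features.
    `fc` is a linear layer mapping style_dim -> 2 * num_channels."""
    b, c, _, _ = content_feat.shape
    gamma, beta = fc(style_code).chunk(2, dim=1)
    normed = F.instance_norm(content_feat)
    return normed * (1 + gamma.view(b, c, 1, 1)) + beta.view(b, c, 1, 1)

# Toy joint step: the style code both feeds the contrastive objective
# and conditions (stand-in) generator features.
emb = EmbeddingNet()
fc = nn.Linear(64, 2 * 128)                    # style_dim -> 2 * feature channels
view1, view2 = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
z1, z2 = emb(view1), emb(view2)
loss = info_nce(z1, z2)
features = torch.randn(8, 128, 16, 16)         # stand-in for generator features
styled = adain(features, z1, fc)
```

In the joint training the abstract describes, a contrastive loss of this kind would be optimized alongside the translation module's own objectives, so that the style codes and the generator improve together; that interplay is the paper's contribution and is only gestured at in this sketch.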



Acknowledgments

This work was supported in part by the National Key Research and Development Program of China grant 2018YFC0910404.

Author information

Corresponding author

Correspondence to Tao Jiang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 28330 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hong, Z., Feng, J., Jiang, T. (2023). Truly Unsupervised Image-to-Image Translation with Contrastive Representation Learning. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13843. Springer, Cham. https://doi.org/10.1007/978-3-031-26313-2_15

  • DOI: https://doi.org/10.1007/978-3-031-26313-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26312-5

  • Online ISBN: 978-3-031-26313-2
