Truly Unsupervised Image-to-Image Translation with Contrastive Representation Learning

Hong, Zhiwei; Feng, Jianxing; Jiang, Tao

doi:10.1007/978-3-031-26313-2_15

Zhiwei Hong¹²,
Jianxing Feng¹³ &
Tao Jiang^12,14

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13843))

Included in the following conference series:

Asian Conference on Computer Vision

689 Accesses

Abstract

Image-to-image translation is a classic image generation task that attempts to translate an image from the source domain to an analogous image in the target domain. Recent advances in deep generative networks have shown remarkable capabilities in translating images among different domains. Most of these models either require pixel-level (with paired input and output images) or domain-level (with image domain labels) supervision to help the translation task. However, there are practical situations where the required supervisory information is difficult to collect and one would need to perform truly unsupervised image translation on a large number of images without paired image information or domain labels. In this paper, we present a truly unsupervised image-to-image translation model that performs the image translation task without any extra supervision. The crux of our model is an embedding network that extracts the domain and style information of the input style (or reference) image with contrastive representation learning and serves the translation module that actually carries out the translation task. The embedding network and the translation module can be integrated together for training and benefit from each other. Extensive experimental evaluation has been performed on various datasets concerning both cross-domain and multi-domain translation. The results demonstrate that our model outperforms the best truly unsupervised image-to-image translation model in the literature. In addition, our model can be easily adapted to take advantage of available domain labels to achieve a performance comparable to the best supervised image translation methods when all domain labels are known or a superior performance when only some domain labels are known.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017)
Google Scholar
Baek, K., Choi, Y., Uh, Y., Yoo, J., Shim, H.: Rethinking the truly unsupervised image-to-image translation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14154–14163 (2021)
Google Scholar
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)
Google Scholar
Chen, X., He, K.: Exploring simple Siamese representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758 (2021)
Google Scholar
Choi, Y., Uh, Y., Yoo, J., Ha, J.W.: StarGAN v2: diverse image synthesis for multiple domains. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8188–8197 (2020)
Google Scholar
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
Google Scholar
Grill, J.B., et al.: Bootstrap your own latent: a new approach to self-supervised learning. arXiv preprint arXiv:2006.07733 (2020)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein GANs. In: Advances in Neural Information Processing Systems, pp. 5767–5777 (2017)
Google Scholar
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)
Google Scholar
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in Neural Information Processing Systems 30 (2017)
Google Scholar
Huang, X., Belongie, S.: Arbitrary style transfer in real-time with adaptive instance normalization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1501–1510 (2017)
Google Scholar
Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 179–196. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_11
Chapter Google Scholar
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)
Google Scholar
Ji, X., Henriques, J.F., Vedaldi, A.: Invariant information clustering for unsupervised image classification and segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9865–9874 (2019)
Google Scholar
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)
Google Scholar
Kim, K., Park, S., Jeon, E., Kim, T., Kim, D.: A style-aware discriminator for controllable image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18239–18248 (2022)
Google Scholar
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
Lin, J., Xia, Y., Liu, S., Zhao, S., Chen, Z.: ZstGAN: an adversarial approach for unsupervised zero-shot image-to-image translation. Neurocomputing 461, 327–335 (2021)
Article Google Scholar
Liu, M.Y., Breuel, T., Kautz, J.: Unsupervised image-to-image translation networks. In: Advances in neural information processing systems, pp. 700–708 (2017)
Google Scholar
Liu, M.Y., et al.: Few-shot unsupervised image-to-image translation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10551–10560 (2019)
Google Scholar
Mao, Q., Lee, H.Y., Tseng, H.Y., Ma, S., Yang, M.H.: Mode seeking generative adversarial networks for diverse image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1429–1437 (2019)
Google Scholar
Mescheder, L., Geiger, A., Nowozin, S.: Which training methods for GANs do actually converge? In: International Conference on Machine Learning, pp. 3481–3490. PMLR (2018)
Google Scholar
Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729. IEEE (2008)
Google Scholar
Pang, Y., Lin, J., Qin, T., Chen, Z.: Image-to-image translation: Methods and applications. arXiv preprint arXiv:2101.08629 (2021)
Park, T., Efros, A.A., Zhang, R., Zhu, J.Y.: Contrastive learning for unpaired image-to-image translation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12354, pp. 319–345 Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_19
Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2337–2346 (2019)
Google Scholar
Saito, K., Saenko, K., Liu, M.-Y.: COCO-FUNIT: few-shot unsupervised image translation with a content conditioned style encoder. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 382–398. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_23
Chapter Google Scholar
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. Adv. Neural. Inf. Process. Syst. 29, 2234–2242 (2016)
Google Scholar
Van Horn, G., et al.: Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 595–604 (2015)
Google Scholar
Wang, M., et al.: Example-guided style-consistent image synthesis from semantic labeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1495–1504 (2019)
Google Scholar
Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807 (2018)
Google Scholar
Wang, Y., Khan, S., Gonzalez-Garcia, A., van de Weijer, J., Khan, F.S.: Semi-supervised learning for few-shot image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4453–4462 (2020)
Google Scholar
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: Proceedings of the Ieee Conference On Computer Vision and Pattern Recognition, pp. 3733–3742 (2018)
Google Scholar
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
Google Scholar

Download references

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China grant 2018YFC0910404.

Author information

Authors and Affiliations

Tsinghua University, Beijing, 100084, China
Zhiwei Hong & Tao Jiang
Haohua Technology Co., Ltd., Shanghai, China
Jianxing Feng
University of California, Riverside, CA, 92521, USA
Tao Jiang

Authors

Zhiwei Hong
View author publications
You can also search for this author in PubMed Google Scholar
Jianxing Feng
View author publications
You can also search for this author in PubMed Google Scholar
Tao Jiang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tao Jiang .

Editor information

Editors and Affiliations

University of Wollongong, Wollongong, NSW, Australia
Lei Wang
University of Bonn, Bonn, Germany
Juergen Gall
University of Adelaide, Adelaide, SA, Australia
Tat-Jun Chin
National Institute of Informatics, Tokyo, Japan
Imari Sato
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 28330 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hong, Z., Feng, J., Jiang, T. (2023). Truly Unsupervised Image-to-Image Translation with Contrastive Representation Learning. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13843. Springer, Cham. https://doi.org/10.1007/978-3-031-26313-2_15

Download citation

DOI: https://doi.org/10.1007/978-3-031-26313-2_15
Published: 02 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26312-5
Online ISBN: 978-3-031-26313-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Truly Unsupervised Image-to-Image Translation with Contrastive Representation Learning