Equivariant Adversarial Network for Image-to-image Translation

Published: 14 June 2021

Abstract

Image-to-image translation aims to learn a mapping from a source domain to a target domain. Three main challenges are associated with this problem and need to be addressed: the lack of paired datasets, multimodality, and diversity. Convolutional neural networks (CNNs), despite their strong performance in many computer vision tasks, fail to capture the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representative model we seek. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer, which explicitly allows the spatial manipulation of data during training. This differentiable module can be integrated into the convolutional layers of the generative model, and it allows the generated distributions to be freely altered for image-to-image translation. To reap the benefits of the proposed module in the generative model, our architecture incorporates a new loss function that facilitates effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.
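
The abstract does not specify how the trainable transformer is built, so the following is only an illustrative sketch of the general idea of a differentiable spatial warp, not the article's module. It assumes an affine transform given by a 2x3 matrix `theta` in normalized [-1, 1] coordinates and applies it to a single-channel image with bilinear sampling and zero padding. The function names `affine_grid`, `bilinear_sample`, and `spatial_transform` are hypothetical and chosen for this example. In a trainable setting, `theta` would be predicted by a small network and learned end-to-end; here it is supplied by hand.

```python
import math


def affine_grid(theta, height, width):
    """For each output pixel, compute the source location (x, y) in
    normalized [-1, 1] coordinates under the 2x3 affine matrix theta."""
    grid = []
    for i in range(height):
        # Normalized y coordinate of output row i.
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            # Normalized x coordinate of output column j.
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2D image (list of rows) at the grid locations using
    bilinear interpolation; locations outside the image read as zero."""
    height, width = len(image), len(image[0])

    def pixel(r, c):
        if 0 <= r < height and 0 <= c < width:
            return image[r][c]
        return 0.0  # zero padding (an assumption of this sketch)

    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Convert normalized coordinates back to pixel coordinates.
            px = (xs + 1.0) * (width - 1) / 2.0
            py = (ys + 1.0) * (height - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            wx, wy = px - x0, py - y0
            # Weighted average of the four neighbouring pixels.
            value = ((1 - wx) * (1 - wy) * pixel(y0, x0)
                     + wx * (1 - wy) * pixel(y0, x0 + 1)
                     + (1 - wx) * wy * pixel(y0 + 1, x0)
                     + wx * wy * pixel(y0 + 1, x0 + 1))
            out_row.append(value)
        out.append(out_row)
    return out


def spatial_transform(image, theta):
    """Warp the image with the affine matrix theta (output has the same size)."""
    grid = affine_grid(theta, len(image), len(image[0]))
    return bilinear_sample(image, grid)
```

With this sketch, `spatial_transform(img, [[1, 0, 0], [0, 1, 0]])` leaves `img` unchanged, and `[[-1, 0, 0], [0, 1, 0]]` mirrors it horizontally. Because each output value is a weighted sum whose weights vary continuously with `theta`, gradients can flow back to the transform parameters almost everywhere, which is the property that makes such a module usable inside an end-to-end trained generator.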

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 2s
    June 2021
    349 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3465440

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 June 2021
    Accepted: 01 March 2021
    Revised: 01 March 2021
    Received: 01 August 2020

    Author Tags

    1. Stylistic image generation
    2. image-to-image translation
    3. generative model
    4. domain adaptation

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Key R&D Program of China
    • NSFC, China
    • Committee of Science and Technology, Shanghai, China
