research-article

Equivariant Adversarial Network for Image-to-image Translation

Authors: Masoumeh Zareapoor, Jie Yang

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 2s

Article No.: 73, Pages 1 - 14

https://doi.org/10.1145/3458280

Published: 14 June 2021

Abstract

Image-to-image translation aims to learn a mapping from images in a source domain to images in a target domain. However, three main challenges are associated with this problem and need to be addressed: the lack of paired datasets, multimodality, and diversity. Convolutional neural networks (CNNs), despite their strong performance in many computer vision tasks, fail to detect the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representative model we seek. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer, which explicitly allows the spatial manipulation of data during training. This differentiable module can be inserted into the convolutional layers of the generative model, and it allows the generated distributions to be freely altered for image-to-image translation. To reap the benefits of the proposed module in the generative model, our architecture incorporates a new loss function that facilitates effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.
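
The abstract does not specify the form of the trainable transformer, so the following is only a minimal sketch of one common way such a module manipulates data spatially: an affine sampling grid followed by bilinear interpolation, in the spirit of spatial transformer networks (Jaderberg et al., 2015). The function names (`affine_grid`, `bilinear_sample`, `spatial_transform`) and the fixed 2x3 matrix `theta` are assumptions made for illustration; in a trained model `theta` would be predicted by a small network, and the sampling would run on feature maps inside an autodiff framework rather than on Python lists.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]
Grid = List[List[Tuple[float, float]]]


def affine_grid(theta: Sequence[Sequence[float]], height: int, width: int) -> Grid:
    """For each output pixel, compute the source location in normalized [-1, 1] coordinates."""
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            # Apply the 2x3 affine matrix to the homogeneous point (x, y, 1).
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(image: Image, grid: Grid) -> Image:
    """Sample the image at the grid locations with bilinear interpolation."""
    height, width = len(image), len(image[0])

    def pixel(r: int, c: int) -> float:
        # Zero padding for locations outside the image.
        if 0 <= r < height and 0 <= c < width:
            return image[r][c]
        return 0.0

    out = []
    for grid_row in grid:
        out_row = []
        for xs, ys in grid_row:
            # Convert normalized coordinates back to pixel coordinates.
            px = (xs + 1.0) * (width - 1) / 2.0
            py = (ys + 1.0) * (height - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            wx, wy = px - x0, py - y0
            # Weighted average of the four neighbouring pixels; the weights
            # are piecewise linear in (xs, ys), which is what makes the
            # operation differentiable with respect to theta.
            value = ((1 - wx) * (1 - wy) * pixel(y0, x0)
                     + wx * (1 - wy) * pixel(y0, x0 + 1)
                     + (1 - wx) * wy * pixel(y0 + 1, x0)
                     + wx * wy * pixel(y0 + 1, x0 + 1))
            out_row.append(value)
        out.append(out_row)
    return out


def spatial_transform(image: Image, theta: Sequence[Sequence[float]]) -> Image:
    """Warp an image with the affine matrix theta (same output size as the input)."""
    return bilinear_sample(image, affine_grid(theta, len(image), len(image[0])))
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the image is returned unchanged, while `[[-1, 0, 0], [0, 1, 0]]` mirrors it horizontally. Because the output is a smooth function of `theta` almost everywhere, a module of this kind can sit between convolutional layers and be trained end to end with the rest of a generator.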

Index Terms

Equivariant Adversarial Network for Image-to-image Translation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 2s

June 2021

349 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3465440

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2021

Accepted: 01 March 2021

Revised: 01 March 2021

Received: 01 August 2020

Published in TOMM Volume 17, Issue 2s

Qualifiers

Research-article
Refereed

Funding Sources

National Key R&D Program of China
NSFC, China
Committee of Science and Technology, Shanghai, China

Bibliometrics

Total Citations: 7
Total Downloads: 285
Downloads (Last 12 months): 22
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Cited By

Niu Y, Wu L, Zhang Y, Zhu Y, Zhu G, Wang J. (2024). Multi-Model Style-Aware Diffusion Learning for Semantic Image Synthesis. ACM Transactions on Multimedia Computing, Communications, and Applications 20:11 (1-21). Online publication date: 2-Aug-2024.
https://dl.acm.org/doi/10.1145/3686155
Fu H, Liu J, Yu T, Wang X, Ma H. (2024). Multi-Domain Image-to-Image Translation with Cross-Granularity Contrastive Learning. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7 (1-21). Online publication date: 16-May-2024.
https://dl.acm.org/doi/10.1145/3656048
Fontanini T, Donati L, Bertozzi M, Prati A. (2023). Unsupervised Discovery and Manipulation of Continuous Disentangled Factors of Variation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-25). Online publication date: 6-Apr-2023.
https://dl.acm.org/doi/10.1145/3591358
Chen H, Zhou H, Zhang J, Chen D, Zhang W, Chen K, Hua G, Yu N. (2023). Perceptual Hashing of Deep Convolutional Neural Networks for Model Copy Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 19:3 (1-20). Online publication date: 2-Mar-2023.
https://dl.acm.org/doi/10.1145/3572777
Jaiswal R, Dubey R. (2023). CAQoE: A Novel No-Reference Context-aware Speech Quality Prediction Metric. ACM Transactions on Multimedia Computing, Communications, and Applications 19:1s (1-23). Online publication date: 3-Feb-2023.
https://dl.acm.org/doi/10.1145/3529394
Liu Y, Xiong Z, Li Y, Lu Y, Tian X, Zha Z. (2023). Category-Stitch Learning for Union Domain Generalization. ACM Transactions on Multimedia Computing, Communications, and Applications 19:1 (1-19). Online publication date: 5-Jan-2023.
https://dl.acm.org/doi/10.1145/3524136
Maqsood R, Abid F, Farooq G. Cycle Consistency and Fine-Grained Image to Image Translation in Augmentation: An Overview. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.4157023
