Abstract
Precise disentanglement of single-domain features, built on an established internal correlation between the source and target domains, is the key to high-fidelity image-to-image translation. To address the difficulty of disentangling cross-domain features and their weak correlation, this paper designs a feature regroup and redistribution (RR) module that performs hierarchical feature processing and feature interaction in a mutual space for controllable image-to-image translation. In the feature regroup unit, frequency pyramids with different frequency intervals are designed to extract content features such as multi-level spatial structure and global color semantics. The pyramid output is then mapped into a mutual pool for cross-domain feature-difference comparison and similarity learning, enabling accurate feature analysis. In the redistribution unit, the mutual-pool output and single-domain features are fused via spatial attention to correct content- and style-feature transmission errors. We further design a mutual learning generative adversarial network based on the RR module, which achieves minimum-error image-to-image translation in real scenes. Experimental results on the BDD100K and Sim10k datasets show substantial improvements in FID, IS, KID_mean, and KID_stddev.
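The redistribution step described above, fusing the mutual-pool output with single-domain features through spatial attention, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name, tensor shapes, and the mean-pool-plus-sigmoid attention map are all assumptions, and a residual path is added so unattended regions are not zeroed out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_fuse(domain_feat, mutual_feat):
    """Fuse mutual-pool output with a single-domain feature map via
    spatial attention (hypothetical sketch, not the paper's code).

    domain_feat, mutual_feat: arrays of shape (C, H, W).
    Returns a fused feature map of the same shape as domain_feat.
    """
    # Collapse the channels of the mutual features into one spatial map
    # (mean pooling), then squash to [0, 1] to obtain an attention mask.
    attn = sigmoid(mutual_feat.mean(axis=0, keepdims=True))  # (1, H, W)
    # Spatially reweight the domain features; the residual term keeps
    # low-attention regions from being suppressed entirely.
    return domain_feat * attn + domain_feat

C, H, W = 4, 8, 8
rng = np.random.default_rng(0)
fused = spatial_attention_fuse(rng.standard_normal((C, H, W)),
                               rng.standard_normal((C, H, W)))
print(fused.shape)  # (4, 8, 8)
```

Because the mask lies in [0, 1], the fused map scales each spatial location of the domain features by a factor between 1 and 2, so the correction is bounded.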
Data availability
(1) The BDD100K dataset is introduced by Yu et al. in "BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning" and can be downloaded from https://www.bdd100k.com/.
(2) The Sim10k dataset is introduced by Johnson-Roberson et al. in "Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?" and can be downloaded from https://fcav.engin.umich.edu/projects/driving-in-the-matrix.
Funding
This work is sponsored by the National Natural Science Foundation of China (grant no. 61673084) and the Natural Science Foundation of Liaoning Province (grant nos. 20170540192, 20180550866, and 2020-mzlh-24).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study's conception and design. Theoretical proposal and experimental analysis were performed by Lin Mao, Dawei Yang, and Meng Wang. The first draft of the manuscript was written by Lin Mao and Meng Wang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Compliance with Ethical Standards
All authors state the following:
(1) There are no potential conflicts of interest in this paper;
(2) No human or animal studies are involved;
(3) All authors approved the final manuscript and were aware of the submission.
Conflict of Interest
(1) The authors have no relevant financial or non-financial interests to disclose;
(2) The authors have no competing interests to declare that are relevant to the content of this article;
(3) All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript;
(4) The authors have no financial or proprietary interests in any material discussed in this article.
Authors are responsible for correctness of the statements provided in the manuscript. The Editor-in-Chief reserves the right to reject submissions that do not meet the guidelines described in this section.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mao, L., Wang, M., Yang, D. et al. Mutual learning generative adversarial network. Multimed Tools Appl 83, 7479–7503 (2024). https://doi.org/10.1007/s11042-023-15951-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-023-15951-4