
Mutual learning generative adversarial network

Multimedia Tools and Applications

Abstract

Precise disentangling of single-domain features, based on establishing the internal correlation between the source and target domains, is key to high-fidelity image-to-image translation. To address the difficulty of disentangling cross-domain features and their weak correlation, this paper designs a feature regroup and redistribution (RR) module that achieves hierarchical feature processing and feature interaction in a mutual space for controllable image-to-image translation. In the feature regroup unit, pyramids with different frequency intervals are designed to extract content features such as multi-level spatial structure and global color semantic information. The output of the frequency pyramid is then mapped into a mutual pool for cross-domain feature difference comparison and similarity learning, enabling accurate analysis. In the redistribution unit, the mutual-pool output and the single-domain features are fused through spatial attention to correct content and style feature transmission errors. We also design a mutual learning generative adversarial network based on the RR module, which achieves image-to-image translation with minimal error in real scenes. Experimental results on the BDD100K and Sim10k datasets show that FID, IS, KID_mean, and KID_stddev are all greatly improved.
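To make the regroup-and-redistribution idea concrete, the following is a minimal PyTorch sketch of the three components described above: a frequency pyramid that splits features into bands, a mutual pool that compares matching source and target bands, and a spatial-attention redistribution step. The class names, band counts, and fusion details are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch of the regroup-and-redistribution idea described above.
# Band counts, layer sizes, and fusion details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_blur(x, kernel_size=5, sigma=1.0):
    """Depthwise Gaussian blur used to split features into frequency bands."""
    coords = torch.arange(kernel_size, dtype=x.dtype, device=x.device) - kernel_size // 2
    g = torch.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    k1d = (g / g.sum()).view(1, 1, -1)              # (1, 1, k)
    k2d = (k1d.transpose(1, 2) @ k1d).unsqueeze(0)  # (1, 1, k, k)
    k2d = k2d.repeat(x.size(1), 1, 1, 1)            # (C, 1, k, k)
    return F.conv2d(x, k2d, padding=kernel_size // 2, groups=x.size(1))


class FeatureRegroup(nn.Module):
    """Regroup unit: separate a feature map into low/mid/high frequency bands
    (global color semantics vs. multi-level spatial structure)."""
    def forward(self, feat):
        blur_fine = gaussian_blur(feat, sigma=1.0)
        blur_coarse = gaussian_blur(feat, sigma=2.0)
        low = blur_coarse                # coarse band: global color / semantics
        mid = blur_fine - blur_coarse    # mid-frequency spatial structure
        high = feat - blur_fine          # fine spatial detail
        return low, mid, high


class MutualPool(nn.Module):
    """Mutual space: compare matching source/target bands and learn a shared
    representation gated by their cross-domain similarity."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, src_band, tgt_band):
        # Per-pixel cosine similarity acts as a soft gate on the shared projection.
        sim = F.cosine_similarity(src_band, tgt_band, dim=1).unsqueeze(1).clamp(min=0.0)
        shared = self.proj(torch.cat([src_band, tgt_band], dim=1))
        return shared * sim


class Redistribution(nn.Module):
    """Redistribution unit: fuse the mutual-pool output with a single-domain
    feature via spatial attention to correct content/style transmission errors."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=7, padding=3),
                                  nn.Sigmoid())

    def forward(self, domain_feat, mutual_feat):
        attention = self.attn(mutual_feat)   # where to trust the mutual-space cue
        return domain_feat * (1.0 - attention) + mutual_feat * attention
```

In such a design, each domain's encoder features would pass through FeatureRegroup, the matching bands of both domains through a shared MutualPool, and the result back through Redistribution before decoding; the exact wiring used in the paper may differ.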

Data availability

(1) The BDD100K dataset is introduced by Yu et al. in “BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning” and can be downloaded from https://www.bdd100k.com/.

(2) The Sim10k dataset is introduced by Johnson-Roberson et al. in “Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?” and can be downloaded from https://fcav.engin.umich.edu/projects/driving-in-the-matrix.

Funding

This work is sponsored by the National Natural Science Foundation of China (grant no. 61673084) and the Natural Science Foundation of Liaoning Province (grant nos. 20170540192, 20180550866, and 2020-mzlh-24).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study’s conception and design. Theoretical proposal and experimental analysis were performed by Lin Mao, Dawei Yang, and Meng Wang. The first draft of the manuscript was written by Lin Mao and Meng Wang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Meng Wang.

Ethics declarations

Compliance with Ethical Standards

All authors state the following:

(1) There are no potential conflicts of interest in this paper;

(2) No human or animal studies are involved;

(3) All authors approved the final manuscript and were aware of the submission.

Conflict of Interest

(1) The authors have no relevant financial or non-financial interests to disclose;

(2) The authors have no competing interests to declare that are relevant to the content of this article;

(3) All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript;

(4) The authors have no financial or proprietary interests in any material discussed in this article.

Authors are responsible for correctness of the statements provided in the manuscript. The Editor-in-Chief reserves the right to reject submissions that do not meet the guidelines described in this section.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mao, L., Wang, M., Yang, D. et al. Mutual learning generative adversarial network. Multimed Tools Appl 83, 7479–7503 (2024). https://doi.org/10.1007/s11042-023-15951-4
