Abstract
Precise disentanglement of single-domain features, built on an established internal correlation between the source and target domains, is the key to high-fidelity image-to-image translation. To address the difficulty of disentangling cross-domain features and their weak correlation, this paper designs a feature regroup and redistribution (RR) module that performs hierarchical feature processing and feature interaction in a mutual space for controllable image-to-image translation. In the feature regroup unit, frequency pyramids with different frequency intervals are designed to extract content features such as multi-level spatial structure and global color semantics. The pyramid output is then mapped into a mutual pool for cross-domain feature-difference comparison and similarity learning, enabling accurate feature analysis. In the redistribution unit, the mutual-pool output and single-domain features are fused via spatial attention to correct content- and style-feature transmission errors. We further design a mutual learning generative adversarial network based on the RR module, which achieves minimum-error image-to-image translation in real scenes. Experimental results on the BDD100K and Sim10k datasets show substantial improvements in FID, IS, KID_mean, and KID_stddev.
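The redistribution step described above, fusing the mutual-pool output with single-domain features through spatial attention, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name, tensor shapes, and the mean-pool-plus-sigmoid attention map are all assumptions, and a residual path is added so unattended regions are not zeroed out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_fuse(domain_feat, mutual_feat):
    """Fuse mutual-pool output with a single-domain feature map via
    spatial attention (hypothetical sketch, not the paper's code).

    domain_feat, mutual_feat: arrays of shape (C, H, W).
    Returns a fused feature map of the same shape as domain_feat.
    """
    # Collapse the channels of the mutual features into one spatial map
    # (mean pooling), then squash to [0, 1] to obtain an attention mask.
    attn = sigmoid(mutual_feat.mean(axis=0, keepdims=True))  # (1, H, W)
    # Spatially reweight the domain features; the residual term keeps
    # low-attention regions from being suppressed entirely.
    return domain_feat * attn + domain_feat

C, H, W = 4, 8, 8
rng = np.random.default_rng(0)
fused = spatial_attention_fuse(rng.standard_normal((C, H, W)),
                               rng.standard_normal((C, H, W)))
print(fused.shape)  # (4, 8, 8)
```

Because the mask lies in [0, 1], the fused map scales each spatial location of the domain features by a factor between 1 and 2, so the correction is bounded.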
Data availability
(1) The BDD100K dataset is introduced by Yu et al. in "BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning" and can be downloaded from https://www.bdd100k.com/.
(2) The Sim10k dataset is introduced by Johnson-Roberson et al. in "Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?" and can be downloaded from https://fcav.engin.umich.edu/projects/driving-in-the-matrix.
Funding
This work is sponsored by the National Natural Science Foundation of China (grant no. 61673084) and the Natural Science Foundation of Liaoning Province (grant nos. 20170540192, 20180550866, and 2020-mzlh-24).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study's conception and design. Theoretical proposal and experimental analysis were performed by Lin Mao, Dawei Yang, and Meng Wang. The first draft of the manuscript was written by Lin Mao and Meng Wang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Compliance with Ethical Standards
All authors state the following:
(1) There are no potential conflicts of interest in this paper;
(2) No human or animal studies are involved;
(3) All authors approved the final manuscript and were aware of the submission.
Conflict of Interest
(1) The authors have no relevant financial or non-financial interests to disclose;
(2) The authors have no competing interests to declare that are relevant to the content of this article;
(3) All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript;
(4) The authors have no financial or proprietary interests in any material discussed in this article.
Authors are responsible for correctness of the statements provided in the manuscript. The Editor-in-Chief reserves the right to reject submissions that do not meet the guidelines described in this section.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mao, L., Wang, M., Yang, D. et al. Mutual learning generative adversarial network. Multimed Tools Appl 83, 7479–7503 (2024). https://doi.org/10.1007/s11042-023-15951-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-023-15951-4