Abstract
Deep learning-based methods have recently advanced image generation by exploiting the rich information in immense training data, but they struggle to synthesize new images for rare categories whose samples are too scarce to cover the underlying visual concepts. Few-shot image generation aims to capture generalizable generative features from limited data in order to produce new images for unseen categories. Existing few-shot methods focus on the high-level semantic differences between conditional images and fuse a generative feature based on a semantic metric. However, they ignore the semantic information carried by feature subspaces at different levels throughout the generation process, so visual quality degrades as the diversity of the synthetic images improves. In this work, we propose a novel Adaptive Multi-scale Modulation Generative Adversarial Network (AMMGAN) for few-shot image generation, which leverages a U-Net with skip connections to exploit multi-scale semantic metric information. Specifically, an adaptive self-metric fusion module is introduced at the junction between the encoder and decoder of the generator. It measures the pixel-wise semantic information of the conditional images as a self-metric attention map at each level of the deep features to fuse a general feature of interest, and it adaptively adjusts the mean and variance of the fused feature according to the high-level semantic feature of the decoder. Meanwhile, part of the discriminator network co-learns with the generator to formulate a reference-metric modulation code that embeds channel-wise metric information; this code is integrated into the decoder of the generator through a learnable residual transformation to refine the upsampled fused feature along the channel dimension. Extensive experiments on several benchmark datasets demonstrate the effectiveness of AMMGAN on few-shot image generation and downstream visual classification tasks.
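The mean-and-variance adjustment of the fused feature described above follows the general pattern of adaptive instance normalization: whiten each channel of one feature, then re-scale it with statistics derived from another. The sketch below is a minimal pure-Python illustration of that pattern only, not the paper's actual module; the function name is hypothetical and toy per-channel lists stand in for real feature maps.

```python
# Illustrative sketch of AdaIN-style modulation (not the paper's exact module):
# normalize a fused feature per channel, then re-scale it with the mean and
# standard deviation taken from a high-level semantic feature.
from statistics import mean, pstdev

def adaptive_modulation(fused, semantic, eps=1e-5):
    """fused, semantic: lists of per-channel value lists (C channels x N values)."""
    out = []
    for f_ch, s_ch in zip(fused, semantic):
        mu_f, sigma_f = mean(f_ch), pstdev(f_ch)      # fused-channel statistics
        mu_s, sigma_s = mean(s_ch), pstdev(s_ch)      # target (semantic) statistics
        # whiten the fused channel, then apply the semantic statistics
        out.append([(x - mu_f) / (sigma_f + eps) * sigma_s + mu_s for x in f_ch])
    return out
```

After modulation, each output channel carries the mean and (approximately) the variance of the corresponding semantic channel while preserving the spatial pattern of the fused feature, which is the effect the abstract attributes to the adaptive self-metric fusion module.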
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant No. 61976150, the Guiding Local Scientific and Technological Development Foundation of China under Grant No. YDZJSX2021C005, and the Shanxi Provincial Key Research and Development Programme of China under Grant No. 2022ZDYF128.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, W., Xu, W., Wu, X. et al. AMMGAN: adaptive multi-scale modulation generative adversarial network for few-shot image generation. Appl Intell 53, 20979–20997 (2023). https://doi.org/10.1007/s10489-023-04559-8