AMMGAN: adaptive multi-scale modulation generative adversarial network for few-shot image generation

Published in Applied Intelligence

Abstract

Deep learning-based methods have recently advanced image generation by exploiting the rich information in massive training data, but they struggle to synthesize new images for rare categories whose samples are too scarce to cover the underlying visual concepts. Few-shot image generation addresses this problem by capturing generalizable generative features from limited data to produce new images for unseen categories. Existing few-shot methods focus on the high-level semantic differences between conditional images and fuse a generative feature based on this semantic metric. However, they ignore the semantic information underlying the feature subspaces at different levels throughout the generation process, so visual quality degrades as the diversity of the synthetic images improves. In this work, we propose a novel Adaptive Multi-scale Modulation Generative Adversarial Network (AMMGAN) for few-shot image generation, which leverages a U-Net with skip connections to exploit multi-scale semantic metric information. Specifically, an adaptive self-metric fusion module is introduced at the junction between the encoder and decoder of the generator; it measures the pixel-wise semantic information of the conditional images as a self-metric attention map at each level of the deep features, fuses a general feature of interest, and adaptively adjusts the mean and variance of the fused feature based on the high-level semantic feature of the decoder. Meanwhile, part of the discriminator network co-learns with the generator to formulate a reference-metric modulation code that embeds channel-wise metric information; this code is integrated into the decoder of the generator through a learnable residual transformation to refine the upsampled fused feature along the channel dimension. Extensive experiments on several benchmark datasets demonstrate the effectiveness of AMMGAN on few-shot image generation and downstream visual classification tasks.
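
The two modulation steps in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: `adaptive_modulation` assumes an AdaIN-style adjustment (renormalizing the fused feature to the per-channel mean and variance of a high-level semantic feature), and `channel_residual_modulation` assumes the channel-wise code refines a feature through a residual rescaling. The function names, tensor shapes, and the fixed strength `gamma` (standing in for the learnable residual transformation) are all illustrative assumptions.

```python
import numpy as np

def adaptive_modulation(fused, semantic, eps=1e-5):
    """Renormalize `fused` (C, H, W) so each channel takes on the mean
    and standard deviation of the corresponding `semantic` channel
    (an AdaIN-style mean/variance adjustment)."""
    mu_f = fused.mean(axis=(1, 2), keepdims=True)
    sd_f = fused.std(axis=(1, 2), keepdims=True)
    mu_s = semantic.mean(axis=(1, 2), keepdims=True)
    sd_s = semantic.std(axis=(1, 2), keepdims=True)
    return sd_s * (fused - mu_f) / (sd_f + eps) + mu_s

def channel_residual_modulation(feature, code, gamma=0.1):
    """Refine `feature` (C, H, W) along the channel dimension with a
    length-C modulation `code`, applied through a residual connection;
    the fixed `gamma` stands in for a learnable transformation."""
    return feature + gamma * code[:, None, None] * feature

# Toy check: after modulation, the fused feature carries the
# per-channel statistics of the semantic feature.
rng = np.random.default_rng(0)
fused = rng.normal(0.0, 1.0, size=(4, 8, 8))
semantic = rng.normal(2.0, 3.0, size=(4, 8, 8))
out = adaptive_modulation(fused, semantic)
refined = channel_residual_modulation(out, rng.normal(size=4))
```

Applied at every decoder level, this kind of statistic transfer lets coarse semantic features steer fine-grained ones without discarding the spatial detail carried by the skip connections.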

[Figures 1–11 and Algorithm 1 appear in the full article.]


Notes

  1. Available at: http://www.robots.ox.ac.uk/vgg_face2/

  2. Available at: http://www.robots.ox.ac.uk/flowers/

  3. Available at: http://www.image-net.org/


Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant No. 61976150, the Guiding Local Scientific and Technological Development Foundation of China under Grant No. YDZJSX2021C005, and the Shanxi Provincial Key Research and Development Programme of China under Grant No. 2022ZDYF128.

Author information

Corresponding authors

Correspondence to Wenkuan Li or Haifang Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Li, W., Xu, W., Wu, X. et al. AMMGAN: adaptive multi-scale modulation generative adversarial network for few-shot image generation. Appl Intell 53, 20979–20997 (2023). https://doi.org/10.1007/s10489-023-04559-8

