Abstract
Deep learning-based methods have recently advanced image generation by exploiting the rich information in immense training data, but they struggle to synthesize new images for rare categories whose samples are too scarce to cover the underlying visual concepts. Few-shot image generation aims to capture generalizable generative features from limited data in order to produce new images for unseen categories. Existing few-shot methods focus on the high-level semantic differences between conditional images and fuse a generative feature based on a semantic metric. However, they ignore the semantic information carried by feature subspaces at different levels throughout the generation process, so visual quality degrades as the diversity of the synthetic images improves. In this work, we propose a novel Adaptive Multi-scale Modulation Generative Adversarial Network (AMMGAN) for few-shot image generation, which leverages a U-Net with skip connections to exploit multi-scale semantic metric information. Specifically, an adaptive self-metric fusion module is introduced at the junction between the encoder and decoder of the generator. It measures the pixel-wise semantic information of the conditional images as a self-metric attention map at each level of the deep features to fuse a general feature of interest, and it adaptively adjusts the mean and variance of the fused feature according to the high-level semantic feature of the decoder. Meanwhile, part of the discriminator network co-learns with the generator to formulate a reference-metric modulation code that embeds channel-wise metric information; this code is integrated into the decoder of the generator through a learnable residual transformation to refine the upsampled fused feature along the channel dimension. Extensive experiments on several benchmark datasets demonstrate the effectiveness of AMMGAN on few-shot image generation and downstream visual classification tasks.
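The mean-and-variance adjustment of the fused feature described above follows the general pattern of adaptive instance normalization: whiten each channel of one feature, then re-scale it with statistics derived from another. The sketch below is a minimal pure-Python illustration of that pattern only, not the paper's actual module; the function name is hypothetical and toy per-channel lists stand in for real feature maps.

```python
# Illustrative sketch of AdaIN-style modulation (not the paper's exact module):
# normalize a fused feature per channel, then re-scale it with the mean and
# standard deviation taken from a high-level semantic feature.
from statistics import mean, pstdev

def adaptive_modulation(fused, semantic, eps=1e-5):
    """fused, semantic: lists of per-channel value lists (C channels x N values)."""
    out = []
    for f_ch, s_ch in zip(fused, semantic):
        mu_f, sigma_f = mean(f_ch), pstdev(f_ch)      # fused-channel statistics
        mu_s, sigma_s = mean(s_ch), pstdev(s_ch)      # target (semantic) statistics
        # whiten the fused channel, then apply the semantic statistics
        out.append([(x - mu_f) / (sigma_f + eps) * sigma_s + mu_s for x in f_ch])
    return out
```

After modulation, each output channel carries the mean and (approximately) the variance of the corresponding semantic channel while preserving the spatial pattern of the fused feature, which is the effect the abstract attributes to the adaptive self-metric fusion module.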
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant No. 61976150, the Guiding Local Scientific and Technological Development Foundation of China under Grant No. YDZJSX2021C005, and the Shanxi Provincial Key Research and Development Programme of China under Grant No. 2022ZDYF128.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, W., Xu, W., Wu, X. et al. AMMGAN: adaptive multi-scale modulation generative adversarial network for few-shot image generation. Appl Intell 53, 20979–20997 (2023). https://doi.org/10.1007/s10489-023-04559-8