
Multi-level adaptive few-shot learning network combined with vision transformer

  • Original Research
  • Published:
Journal of Ambient Intelligence and Humanized Computing

Abstract

Most few-shot learning methods focus on improving a single high-level feature extractor; however, extensive practice has shown that low-level features also carry abundant visual information and play an important role in learning feature extractors. In this paper, we propose a multi-level adaptive vision transformer few-shot learning network (MLVT-FSL). First, we propose a two-branch feature extraction network that employs a multi-level feature extractor together with a vision transformer to extract multi-level features and capture global relationships; this brings a 1.16% improvement over the baseline model in the 5-way 5-shot setting on the MiniImageNet dataset. Then we use a feature adjustment module (FAM) to adaptively adjust the features into task-specific, more discriminative embeddings, which yields a further 0.65% improvement under the same 5-way 5-shot setting on MiniImageNet. To further evaluate MLVT-FSL, we conduct extensive experiments on several standard few-shot classification benchmarks: MiniImageNet, TieredImageNet, and the fine-grained dataset CUB-200. In particular, MLVT-FSL achieves 82.46% and 84.97% top-1 classification accuracy in the 5-way 5-shot setting on MiniImageNet and TieredImageNet, respectively, and obtains 87.04% 5-way 5-shot accuracy on CUB-200. These results verify the effectiveness of the proposed model.
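The abstract's architecture can be summarized in a short sketch. The following is a minimal, hypothetical PyTorch rendering of the two-branch embedding (multi-level CNN features fused with a small vision transformer) followed by a task-conditioned feature adjustment module. The layer sizes, fusion by pooling and concatenation, and the gating form of the FAM are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumptions, not the authors' code): two-branch embedding with
# multi-level CNN features plus a small ViT, followed by a task-conditioned FAM.
import torch
import torch.nn as nn


class MultiLevelExtractor(nn.Module):
    """CNN branch that keeps low-, mid- and high-level feature maps."""
    def __init__(self):
        super().__init__()
        channels = [(3, 64), (64, 160), (160, 320), (320, 640)]
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                          nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))
            for c_in, c_out in channels
        ])
        self.out_dim = sum(c_out for _, c_out in channels)  # 1184

    def forward(self, x):
        pooled = []
        for stage in self.stages:
            x = stage(x)
            pooled.append(x.mean(dim=(2, 3)))   # keep every level, not only the last
        return torch.cat(pooled, dim=1)          # (B, 1184)


class FeatureAdjustmentModule(nn.Module):
    """FAM sketch: task-conditioned gating of the fused embedding."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, emb, task_ctx):
        # task_ctx: e.g. the mean support-set embedding of the current episode
        return emb * self.gate(task_ctx)


class MLVTEmbedding(nn.Module):
    """Two branches: multi-level CNN features plus a ViT for global relations."""
    def __init__(self, vit_dim=384, vit_layers=4, patch=16):
        super().__init__()
        self.cnn = MultiLevelExtractor()
        self.to_patches = nn.Conv2d(3, vit_dim, kernel_size=patch, stride=patch)
        self.vit = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=vit_dim, nhead=6, batch_first=True),
            num_layers=vit_layers)
        self.fam = FeatureAdjustmentModule(self.cnn.out_dim + vit_dim)

    def forward(self, x, task_ctx=None):
        cnn_feat = self.cnn(x)                                  # local multi-level cues
        tokens = self.to_patches(x).flatten(2).transpose(1, 2)  # (B, N, vit_dim)
        vit_feat = self.vit(tokens).mean(dim=1)                 # global relationship
        emb = torch.cat([cnn_feat, vit_feat], dim=1)
        return self.fam(emb, task_ctx) if task_ctx is not None else emb
```

In a 5-way 5-shot episode, one plausible use of this sketch is to embed the 25 support images, average them to form task_ctx, re-embed support and query images with the FAM applied, and classify each query by its nearest class prototype (a prototypical-network-style readout assumed here for concreteness).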





Acknowledgements

This work was supported by the Natural Science Foundation of Liaoning Province (No. 2020-MS-080) and the National Key Research and Development Program of China (No. 2017YFF0108800).

Author information


Corresponding author

Correspondence to Hegui Zhu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhu, H., Cai, X., Dou, J. et al. Multi-level adaptive few-shot learning network combined with vision transformer. J Ambient Intell Human Comput 14, 12477–12491 (2023). https://doi.org/10.1007/s12652-022-04327-5

