Abstract
Most few-shot learning methods focus on improving a single high-level feature extractor; in practice, however, low-level features also contain abundant visual information and play an important role in learning discriminative embeddings. In this paper, we propose a multi-level adaptive vision transformer few-shot learning network (MLVT-FSL). First, we design a two-branch feature extraction network that employs a multi-level feature extractor together with a vision transformer to extract multi-level features and capture global relationships; under the 5-way 5-shot setting on MiniImageNet, this brings a 1.16% improvement over the baseline. We then use a feature adjustment module (FAM) to adaptively adjust the features, yielding task-specific and more discriminative embeddings and a further 0.65% improvement under the same 5-way 5-shot setting on MiniImageNet. To evaluate MLVT-FSL further, we conduct extensive experiments on several standard few-shot classification benchmarks: MiniImageNet, TieredImageNet, and the fine-grained dataset CUB-200. MLVT-FSL achieves 82.46% and 84.97% top-1 classification accuracy under the 5-way 5-shot setting on MiniImageNet and TieredImageNet, respectively, and 87.04% 5-way 5-shot accuracy on CUB-200. These results verify the effectiveness of the proposed model.
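To make the two-branch idea concrete, the following is a minimal PyTorch sketch of a network that pools multi-level CNN features and feeds them to a small transformer encoder, followed by a task-conditioned adjustment step. All module names (TwoBranchEmbedding, FeatureAdjustment), layer counts, and dimensions are illustrative assumptions, not the authors' implementation of MLVT-FSL or its FAM.

```python
# Hypothetical sketch of a two-branch, multi-level embedding with a
# task-conditioned feature adjustment step. Not the paper's actual code.
import torch
import torch.nn as nn

class TwoBranchEmbedding(nn.Module):
    """CNN branch yields low/mid/high-level feature maps; a transformer
    branch models global relations among the pooled level features."""
    def __init__(self, dim=64):
        super().__init__()
        # Three conv stages; each stage's output serves as one feature level.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, dim, 3, stride=2, padding=1),
                          nn.BatchNorm2d(dim), nn.ReLU())
            for c_in in (3, dim, dim)
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Treat the pooled level features as a short token sequence.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        tokens = []
        for stage in self.stages:
            x = stage(x)
            tokens.append(self.pool(x).flatten(1))   # (B, dim) per level
        seq = torch.stack(tokens, dim=1)              # (B, levels, dim)
        fused = self.encoder(seq)                     # global relations across levels
        return fused.mean(dim=1)                      # final embedding (B, dim)

class FeatureAdjustment(nn.Module):
    """Stand-in for a FAM-style module: rescales embeddings with a gate
    computed from the mean support embedding of the current episode."""
    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, embeddings, support):
        task_ctx = support.mean(dim=0, keepdim=True)  # summarize the episode
        return embeddings * self.gate(task_ctx)       # task-specific rescaling
```

In an episodic evaluation loop, one would embed both support and query images with TwoBranchEmbedding, adjust them with FeatureAdjustment conditioned on the support set, and classify queries by distance to class prototypes, in the style of prototypical networks.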
Acknowledgements
This work was supported by the Natural Science Foundation of Liaoning Province (No. 2020-MS-080) and the National Key Research and Development Program of China (No. 2017YFF0108800).
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Cite this article
Zhu, H., Cai, X., Dou, J. et al. Multi-level adaptive few-shot learning network combined with vision transformer. J Ambient Intell Human Comput 14, 12477–12491 (2023). https://doi.org/10.1007/s12652-022-04327-5