Self-attention network for few-shot learning based on nearest-neighbor algorithm

Wang, Guangpeng; Wang, Yongxiong

doi:10.1007/s00138-023-01375-5

Self-attention network for few-shot learning based on nearest-neighbor algorithm

Original Paper
Published: 10 February 2023

Volume 34, article number 28, (2023)
Cite this article

Machine Vision and Applications Aims and scope Submit manuscript

673 Accesses
1 Citation
1 Altmetric
Explore all metrics

A Correction to this article was published on 14 March 2023

This article has been updated

Abstract

Few-shot learning is a challenging task because it focuses on classifying new object categories given only limited labeled samples and often results in poor generalization. Most of existing methods based on metric learning are not very simple for the networks. Moreover, these methods cannot solve low-data problem and are unable to extract discriminative features. This paper presents a simple, effective and general framework for few-shot image classification. Specifically, we first exploit data augmentation technique to alleviate overfitting problem. We propose a self-attention position attention module (PAM) which is utilized to extract discriminative features for constructing a few-shot representation model. Furthermore, we design a novel nearest-neighbor learner with feature transformation to obtain the appealing accuracy in few-shot learning (FSL). Our network is comprised of backbone and attention module and trained from scratch in an end-to-end manner. The backbone module is to extract multi-level features. The self-attention PAM is used to discover non-local information and allow long-range dependency. Excellent performance on benchmark demonstrates that our work provides a unified and effective approach for few-shot image classification.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

ImageNet Large Scale Visual Recognition Challenge

Article 11 April 2015

Learning to Prompt for Vision-Language Models

Article 31 July 2022

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

Article 15 September 2023

Change history

14 March 2023
A Correction to this paper has been published: https://doi.org/10.1007/s00138-023-01387-1

References

Gidaris, S., Bursuc, A., Komodakis, N., Pérez, P., Cord, M.: Boosting few-shot visual learning with self-supervision. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8059–8068 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)
Käding, C., Rodner, E., Freytag, A., Denzler, J.: Fine-tuning deep neural networks in continuous learning scenarios. In: Asian Conference on Computer Vision, pp. 588–605. Springer (2016)
Wang, Y., Hu, C., Wang, G., Lin, X.: Contrastive representation for few-shot vehicle footprint recognition. In: 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–6. IEEE (2021)
Yoon, S.W., Seo, J., Moon, J.: Tapnet: neural network augmented with task-adaptive projection for few-shot learning. In: International Conference on Machine Learning, pp. 7115–7123. PMLR (2019)
Oreshkin, B.N., Rodriguez, P., Lacoste, A.: Tadam: task dependent adaptive metric for improved few-shot learning. arXiv preprint arXiv:1805.10123 (2018)
Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: 5th International Conference on Learning Representations, ICLR 2017—Conference Track Proceedings, pp. 1–11 (2017)
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T., Shillingford, B., De Freitas, N.: Learning to learn by gradient descent by gradient descent. Adv. Neural Inf. Process. Syst. 29, 3981–3989 (2016)
Google Scholar
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, pp. 1126–1135. PMLR (2017)
Li, Z., Zhou, F., Chen, F., Li, H.: Meta-sgd: learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835 (2017)
Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 30 (2017)
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., Hospedales, T.M.: Learning to compare: relation network for few-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208 (2018)
Li, A., Huang, W., Lan, X., Feng, J., Li, Z., Wang, L.: Boosting few-shot learning with adaptive margin loss. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12576–12584 (2020)
Qiao, S., Liu, C., Shen, W., Yuille, A.L.: Few-shot image recognition by predicting parameters from activations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7229–7238 (2018)
Hou, R., Chang, H., Ma, B., Shan, S., Chen, X.: Cross attention network for few-shot classification. Adv. Neural Inf. Process. Syst. 32 (2019)
Sun, Q., Liu, Y., Chua, T.-S., Schiele, B.: Meta-transfer learning for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 403–412 (2019)
Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 29, 3630–3638 (2016)
Google Scholar
Wang, Y., Chao, W.-L., Weinberger, K.Q., van der Maaten, L.: Simpleshot: revisiting nearest-neighbor classification for few-shot learning. arXiv preprint arXiv:1911.04623 (2019)
Hui, B., Zhu, P., Hu, Q., Wang, Q.: Self-attention relation network for few-shot learning. In: 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 198–203. IEEE (2019)
Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: International Conference on Machine Learning, pp. 7354–7363. PMLR (2019)
Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6023–6032 (2019)
Thrun, S., Pratt, L.: Learning to learn: introduction and overview. In: Thrun, S., Pratt, L. (eds.) Learning to Learn, pp. 3–17. Springer, Boston (1998)
Chapter MATH Google Scholar
Rusu, A.A., Rao, D., Sygnowski, J., Vinyals, O., Pascanu, R., Osindero, S., Hadsell, R.: Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960 (2018)
Nichol, A., Achiam, J., Schulman, J.: On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999 (2018)
Cai, Q., Pan, Y., Yao, T., Yan, C., Mei, T.: Memory matching networks for one-shot image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4080–4088 (2018)
Munkhdalai, T., Yu, H.: Meta networks. In: International Conference on Machine Learning, pp. 2554–2563. PMLR (2017)
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. Adv. Neural Inf. Process. Syst. 16, 321–328 (2004)
Google Scholar
Chen, W.-Y., Liu, Y.-C., Kira, Z., Wang, Y.-C.F., Huang, J.-B.: A closer look at few-shot classification. arXiv preprint arXiv:1904.04232 (2019)
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017)
Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., Bengio, Y.: Manifold mixup: better representations by interpolating hidden states. In: International Conference on Machine Learning, pp. 6438–6447. PMLR (2019)
DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Park, J., Woo, S., Lee, J.-Y., Kweon, I.S.: Bam: bottleneck attention module. arXiv preprint arXiv:1807.06514 (2018)
Zhou, Q., Zhang, X., Zhang, Y.-D.: Ensemble learning with attention-based multiple instance pooling for classification of SPT. IEEE Trans. Circuits Syst. II Express Br. 69(3), 1927–1931 (2021)
Google Scholar
Wang, S.-H., Fernandes, S., Zhu, Z., Zhang, Y.-D.: AVNC: attention-based VGG-style network for COVID-19 diagnosis by CBAM. IEEE Sens. J. 22(18), 17431–17438 (2021)
Article Google Scholar
Zhang, X., Qiang, Y., Sung, F., Yang, Y., Hospedales, T.: Relationnet2: deep comparison network for few-shot learning. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)
Mishra, N., Rohaninejad, M., Chen, X., Abbeel, P.: A simple neural attentive meta-learner. arXiv preprint arXiv:1707.03141 (2017)
Lee, K., Maji, S., Ravichandran, A., Soatto, S.: Meta-learning with differentiable convex optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10657–10665 (2019)
Gidaris, S., Komodakis, N.: Generating classification weights with gnn denoising autoencoders for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21–30 (2019)
Ren, M., Triantafillou, E., Ravi, S., Snell, J., Swersky, K., Tenenbaum, J.B., Larochelle, H., Zemel, R.S.: Meta-learning for semi-supervised few-shot classification. arXiv preprint arXiv:1803.00676 (2018)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds-200-2011 dataset (2011)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Article MathSciNet Google Scholar
Li, W., Wang, L., Xu, J., Huo, J., Gao, Y., Luo, J.: Revisiting local descriptor based image-to-class measure for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7260–7268 (2019)
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
Ye, H.-J., Hu, H., Zhan, D.-C., Sha, F.: Few-shot learning via embedding adaptation with set-to-set functions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8808–8817 (2020)
Tseng, H.-Y., Lee, H.-Y., Huang, J.-B., Yang, M.-H.: Cross-domain few-shot classification via learned feature-wise transformation. In: International Conference on Learning Representations (2020)
Liu, B., Cao, Y., Lin, Y., Li, Q., Zhang, Z., Long, M., Hu, H.: Negative margin matters: understanding margin in few-shot classification. In: European Conference on Computer Vision, pp. 438–455 (2020). Springer
Liu, C., Fu, Y., Xu, C., Yang, S., Li, J., Wang, C., Zhang, L.: Learning a few-shot embedding model with contrastive learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 8635–8643 (2021)
Ziko, I., Dolz, J., Granger, E., Ayed, I.B.: Laplacian regularized few-shot learning. In: International Conference on Machine Learning, pp. 11660–11670. PMLR (2020)

Download references

Funding

This work was supported by the National Science Foundation of Shanghai (No.22ZR1443700).

Author information

Authors and Affiliations

School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
Guangpeng Wang & Yongxiong Wang

Authors

Guangpeng Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yongxiong Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yongxiong Wang.

Ethics declarations

Conflict of interest

The authors declares that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: Funding information updated.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Wang, G., Wang, Y. Self-attention network for few-shot learning based on nearest-neighbor algorithm. Machine Vision and Applications 34, 28 (2023). https://doi.org/10.1007/s00138-023-01375-5

Download citation

Received: 11 January 2022
Revised: 19 December 2022
Accepted: 13 January 2023
Published: 10 February 2023
DOI: https://doi.org/10.1007/s00138-023-01375-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Self-attention network for few-shot learning based on nearest-neighbor algorithm

Abstract

Access this article

Similar content being viewed by others

ImageNet Large Scale Visual Recognition Challenge

Learning to Prompt for Vision-Language Models

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

Change history

14 March 2023

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Self-attention network for few-shot learning based on nearest-neighbor algorithm

Abstract

Access this article

Similar content being viewed by others

ImageNet Large Scale Visual Recognition Challenge

Learning to Prompt for Vision-Language Models

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

Change history

14 March 2023

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation