
Light transformer learning embedding for few-shot classification with task-based enhancement

Published in Applied Intelligence

Abstract

Progress in computer vision depends on large volumes of labelled data, and replicating these successes in real-world tasks with little labelled data remains a challenge. Fortunately, few-shot learning methods have made many promising attempts in this low-data regime. In this paper, we propose a light transformer-based few-shot classification network under the framework of prototypical nets (PN) that has two distinctive hallmarks. First, we combine local and global features to form the sample embedding: the local features are extracted by a CNN encoder, while the global features are simultaneously obtained by a light transformer-based global structure with saliency detection (LT-GSE). Second, for each task, we use the class approximate degree as prior knowledge to exchange information among query samples at the category level, which yields a more reasonable distribution in the low-dimensional embedding space. Experimental results show that the proposed model achieves 82.28% and 86.56% accuracy on the 5-way 5-shot classification tasks of miniImageNet and tieredImageNet respectively, the best performance among all compared models. Moreover, few-shot experiments on the Stanford Dogs and CUB-200 datasets further verify the superiority and robustness of the proposed model.
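To make the embedding recipe in the abstract concrete, below is a minimal PyTorch sketch of a prototypical-nets pipeline in which a CNN branch supplies local features and a single light transformer encoder layer stands in for the global branch. All names (HybridEmbedding, prototypical_logits), dimensions, and the patch size are illustrative assumptions, not the authors' implementation; the saliency-detection component of LT-GSE and the task-based query interaction are omitted.

```python
import torch
import torch.nn as nn

class HybridEmbedding(nn.Module):
    """Hypothetical sketch: concatenate CNN local features with
    transformer global features into one sample embedding."""
    def __init__(self, local_dim=64, global_dim=64, num_heads=4):
        super().__init__()
        # Local branch: a small CNN encoder (placeholder architecture).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, local_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Global branch: patchify, then one light transformer layer
        # (a stand-in for the paper's LT-GSE module).
        self.patchify = nn.Conv2d(3, global_dim, kernel_size=8, stride=8)
        self.transformer = nn.TransformerEncoderLayer(
            d_model=global_dim, nhead=num_heads, batch_first=True)

    def forward(self, x):                                  # x: (B, 3, H, W)
        local = self.cnn(x).flatten(1)                     # (B, local_dim)
        tokens = self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, global_dim)
        global_feat = self.transformer(tokens).mean(dim=1)    # (B, global_dim)
        return torch.cat([local, global_feat], dim=1)      # (B, local+global)

def prototypical_logits(support, support_labels, query, n_way):
    """Standard prototypical-nets step: prototypes are per-class mean
    support embeddings; queries are scored by negative squared
    Euclidean distance to each prototype."""
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_way)])          # (n_way, D)
    return -torch.cdist(query, protos).pow(2)              # (Q, n_way)

# Example 5-way 5-shot episode with random images (84x84, as in miniImageNet).
emb = HybridEmbedding()
support = emb(torch.randn(25, 3, 84, 84))
labels = torch.arange(5).repeat_interleave(5)
query = emb(torch.randn(15, 3, 84, 84))
logits = prototypical_logits(support, labels, query, n_way=5)  # (15, 5)
```

In this sketch the two branches run on the same input in parallel, matching the abstract's point that local and global features are obtained simultaneously and then fused into the sample embedding.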




Acknowledgements

This study was funded by the Natural Science Foundation of Liaoning Province (No. 2020-MS-080) and the National Key R&D Program of China (No. 2017YFF0108800).

Author information


Corresponding author

Correspondence to Hegui Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhu, H., Zhao, R., Gao, Z. et al. Light transformer learning embedding for few-shot classification with task-based enhancement. Appl Intell 53, 7970–7987 (2023). https://doi.org/10.1007/s10489-022-03951-0

