
Light transformer learning embedding for few-shot classification with task-based enhancement

Published in Applied Intelligence

Abstract

Progress in computer vision depends on large volumes of labelled data, and replicating these successes in real-world tasks with little labelled data remains a challenge. Fortunately, few-shot learning methods have made many promising attempts in this low-data regime. In this paper, we propose a light transformer-based few-shot classification network under the framework of prototypical nets (PN) that has two distinctive hallmarks. First, we combine local and global features to form the sample embedding: the local features are extracted by a CNN encoder, while the global features are simultaneously obtained by a light transformer-based global structure with saliency detection (LT-GSE). Second, for each task, we use the class approximate degree as prior knowledge to exchange information among query samples at the category level, which yields a more reasonable distribution in the low-dimensional embedding space. Experimental results show that the proposed model achieves 82.28% and 86.56% accuracy on the 5-way 5-shot classification tasks of miniImageNet and tieredImageNet respectively, the best performance among all compared models. Moreover, few-shot experiments on the Stanford Dogs and CUB-200 datasets further verify the superiority and robustness of the proposed model.
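To make the embedding recipe in the abstract concrete, below is a minimal PyTorch sketch of a prototypical-nets pipeline in which a CNN branch supplies local features and a single light transformer encoder layer stands in for the global branch. All names (HybridEmbedding, prototypical_logits), dimensions, and the patch size are illustrative assumptions, not the authors' implementation; the saliency-detection component of LT-GSE and the task-based query interaction are omitted.

```python
import torch
import torch.nn as nn

class HybridEmbedding(nn.Module):
    """Hypothetical sketch: concatenate CNN local features with
    transformer global features into one sample embedding."""
    def __init__(self, local_dim=64, global_dim=64, num_heads=4):
        super().__init__()
        # Local branch: a small CNN encoder (placeholder architecture).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, local_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Global branch: patchify, then one light transformer layer
        # (a stand-in for the paper's LT-GSE module).
        self.patchify = nn.Conv2d(3, global_dim, kernel_size=8, stride=8)
        self.transformer = nn.TransformerEncoderLayer(
            d_model=global_dim, nhead=num_heads, batch_first=True)

    def forward(self, x):                                  # x: (B, 3, H, W)
        local = self.cnn(x).flatten(1)                     # (B, local_dim)
        tokens = self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, global_dim)
        global_feat = self.transformer(tokens).mean(dim=1)    # (B, global_dim)
        return torch.cat([local, global_feat], dim=1)      # (B, local+global)

def prototypical_logits(support, support_labels, query, n_way):
    """Standard prototypical-nets step: prototypes are per-class mean
    support embeddings; queries are scored by negative squared
    Euclidean distance to each prototype."""
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_way)])          # (n_way, D)
    return -torch.cdist(query, protos).pow(2)              # (Q, n_way)

# Example 5-way 5-shot episode with random images (84x84, as in miniImageNet).
emb = HybridEmbedding()
support = emb(torch.randn(25, 3, 84, 84))
labels = torch.arange(5).repeat_interleave(5)
query = emb(torch.randn(15, 3, 84, 84))
logits = prototypical_logits(support, labels, query, n_way=5)  # (15, 5)
```

In this sketch the two branches run on the same input in parallel, matching the abstract's point that local and global features are obtained simultaneously and then fused into the sample embedding.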




Acknowledgements

This study was funded by the Natural Science Foundation of Liaoning Province (No. 2020-MS-080) and the National Key R&D Program of China (No. 2017YFF0108800).

Author information


Corresponding author

Correspondence to Hegui Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhu, H., Zhao, R., Gao, Z. et al. Light transformer learning embedding for few-shot classification with task-based enhancement. Appl Intell 53, 7970–7987 (2023). https://doi.org/10.1007/s10489-022-03951-0

