Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer

Zhang, Hongguang; Torr, Philip H. S.; Koniusz, Piotr

doi:10.1007/978-3-031-26348-4_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13845)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Current few-shot learning models capture visual object relations in the so-called meta-learning setting under a fixed-resolution input. However, such models generalize poorly under scale and location mismatch between objects, as only a few samples from the target classes are provided. The lack of a mechanism to match scale and location between pairs of compared images therefore leads to performance degradation. Moreover, the importance of image content varies across coarse-to-fine scales depending on the object and its class label, e.g., generic objects and scenes rely on their global appearance, while fine-grained objects rely more on their localized visual patterns. In this paper, we study the impact of scale and location mismatch in the few-shot learning scenario and propose a novel Spatially-aware Matching (SM) scheme that performs matching across multiple scales and locations and learns image relations by giving the highest weights to the best-matching pairs. SM is trained to activate the most related locations and scales between support and query data. We apply and evaluate SM on various few-shot learning models and backbones. Furthermore, we leverage an auxiliary self-supervisory discriminator to train/predict the spatial- and scale-level index of the feature vectors we use. Finally, we develop a novel transformer-based pipeline to exploit self- and cross-attention in a spatially-aware matching process. Our proposed design is orthogonal to the choice of backbone and/or comparator.
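
The idea of weighting the best-matching region pairs most heavily can be illustrated with a minimal sketch. This is not the authors' implementation: cosine similarity as the comparator, a softmax with a hand-picked temperature as the weighting, and the names `cosine`, `softmax` and `matched_relation` are all assumptions made for illustration. Each image is assumed to be already described by a list of feature vectors, one per (scale, location) region.

```python
import math

def cosine(u, v, eps=1e-12):
    # Cosine similarity between two feature vectors (assumed comparator).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def softmax(scores, temperature):
    # Numerically stable softmax; a lower temperature sharpens the weights.
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matched_relation(support_regions, query_regions, temperature=0.1):
    # Compare every support region with every query region, then
    # aggregate the similarities so that the best-matching pairs dominate.
    sims = [cosine(s, q) for s in support_regions for q in query_regions]
    weights = softmax(sims, temperature)
    score = sum(w * s for w, s in zip(weights, sims))
    return score, weights

# Toy example: two regions per image, 2-D features (hypothetical values).
support = [[1.0, 0.0], [0.0, 1.0]]
query = [[0.0, 1.0], [1.0, 1.0]]
score, weights = matched_relation(support, query)
print(round(score, 3), [round(w, 3) for w in weights])
```

In this toy example the second support region and the first query region are identical, so that pair receives the largest weight and the aggregated score stays close to its similarity. In the paper the weighting is learned jointly with the few-shot model rather than fixed by a temperature, which this sketch does not attempt to reproduce.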

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62106282) and the Young Elite Scientists Sponsorship Program by CAST (No. 2021-JCJQ-QT-038). Code: https://github.com/HongguangZhang/smfsl-master.

Author information

Authors and Affiliations

Systems Engineering Institute, AMS, Shanghai, China
Hongguang Zhang
Data61/CSIRO, Sydney, Australia
Piotr Koniusz
Australian National University, Canberra, Australia
Piotr Koniusz
Oxford University, Oxford, UK
Philip H. S. Torr

Corresponding author

Correspondence to Hongguang Zhang.

Editor information

Editors and Affiliations

University of Wollongong, Wollongong, NSW, Australia
Lei Wang
University of Bonn, Bonn, Germany
Juergen Gall
University of Adelaide, Adelaide, SA, Australia
Tat-Jun Chin
National Institute of Informatics, Tokyo, Japan
Imari Sato
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa

About this paper

Cite this paper

Zhang, H., Torr, P.H.S., Koniusz, P. (2023). Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13845. Springer, Cham. https://doi.org/10.1007/978-3-031-26348-4_1

DOI: https://doi.org/10.1007/978-3-031-26348-4_1
Published: 09 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26347-7
Online ISBN: 978-3-031-26348-4
eBook Packages: Computer Science, Computer Science (R0)
