
Improved local-feature-based few-shot learning with Sinkhorn metrics

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Local-feature-based Few-Shot Learning (FSL) has attracted considerable attention and made great progress recently. Given an image, the model extracts a group of local features through a Fully Convolutional Network (FCN), each of which contains information from the corresponding receptive field of the image. The challenge is how to exploit the local-feature-level similarities to generate the image-level similarity. Toward this goal, many existing works have proposed different heuristic rules or settings. In this paper, we first follow existing works and systematically propose several modified methods for local feature matching, induced by a novel and improved heterogeneous matching mechanism. However, these heuristic methods are not optimal for highlighting the most informative local feature pairs that represent the image-level similarity, and also cannot generalize well to different tasks. Therefore, we propose a new idea called Sinkhorn Metrics (SM). We cast local-feature-based FSL as a Regularized Optimal Transport (ROT) problem. The cost matrix is formed by the similarities of local feature pairs. The marginals, which indicate the importance of each local feature, are obtained by a new attentive cross-comparison module. The optimal transport plan is used as weights to aggregate all the local-feature-level similarities into the image-level similarity. We exploit the Sinkhorn algorithm to solve the ROT problem, which is efficient for end-to-end training. We conduct a hybrid experiment on SM with some heuristic baselines to demonstrate its compatibility. Extensive ablation studies are performed to fully evaluate important hyper-parameters and settings. Our method achieves state-of-the-art results on multiple datasets in both the single-domain and cross-domain FSL scenarios. (The code for evaluation, trained models, and datasets used in this study are available at https://github.com/Wangduo428/few-shot-learning-SM.)
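As an illustration of the aggregation mechanism the abstract describes (not the paper's exact implementation), the sketch below solves an entropy-regularized optimal transport problem with plain Sinkhorn iterations and uses the resulting plan to weight local-feature similarities into one image-level score. The regularization strength `eps`, iteration count `n_iter`, and the choice of cost as `1 - similarity` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sinkhorn_plan(cost, r, c, eps=0.1, n_iter=100):
    """Solve the entropy-regularized OT problem via Sinkhorn iterations.

    cost : (m, n) cost matrix between local feature pairs
    r, c : marginals (per-local-feature importance weights), each summing to 1
    eps  : entropy regularization strength (assumed value, for illustration)
    """
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(n_iter):              # alternating scaling updates
        u = r / (K @ (c / (K.T @ u)))
    v = c / (K.T @ u)
    # Transport plan P = diag(u) K diag(v); its marginals approach (r, c)
    return u[:, None] * K * v[None, :]

def image_similarity(sim, r, c, eps=0.1):
    """Aggregate local-feature-level similarities into an image-level score,
    using the optimal transport plan as the weighting."""
    plan = sinkhorn_plan(1.0 - sim, r, c, eps)   # higher similarity = lower cost
    return float((plan * sim).sum())
```

In the paper the marginals come from the attentive cross-comparison module; here any nonnegative weights summing to one (e.g. uniform) can stand in for them.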



Author information

Correspondence to Tao Zhang.


Cite this article

Wang, D., Ma, Q., Zheng, Q. et al. Improved local-feature-based few-shot learning with Sinkhorn metrics. Int. J. Mach. Learn. & Cyber. 13, 1099–1114 (2022). https://doi.org/10.1007/s13042-021-01437-y

