Abstract
Local-feature-based Few-Shot Learning (FSL) has attracted considerable attention and achieved great progress recently. Given an image, the model extracts a group of local features through a Fully Convolutional Network (FCN), each of which contains information from the corresponding receptive field of the image. The challenging problem is how to exploit the local-feature-level similarities to generate the image-level similarity. Towards this goal, many existing works have proposed different heuristic rules or settings. In this paper, we first follow existing works and systematically propose several modified methods for local feature matching, induced by a novel and improved heterogeneous matching mechanism. However, these heuristic methods are not optimal for highlighting the most informative local feature pairs to represent the image-level similarity, and also cannot generalize well to different tasks. Therefore, we propose a new idea called Sinkhorn Metrics (SM). We formulate local-feature-based FSL as a Regularized Optimal Transport (ROT) problem: the cost matrix is formed by the similarities of local feature pairs, and the marginals, which indicate the importance of each local feature, are obtained by a new attentive cross-comparison module. The optimal transportation plan is used as weights to aggregate all the local-feature-level similarities into the image-level similarity. We exploit the Sinkhorn algorithm to solve the ROT problem, which is efficient for end-to-end training. We conduct a hybrid experiment combining SM with several heuristic baselines to demonstrate its compatibility, and perform extensive ablation studies to fully evaluate important hyper-parameters and settings. Our method achieves a series of state-of-the-art results on multiple datasets in both the single-domain and cross-domain FSL scenarios. (The code for evaluation, trained models, and datasets used in this study are available at https://github.com/Wangduo428/few-shot-learning-SM.)
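The abstract's core mechanism (a cost matrix over local feature pairs, marginals weighting each feature, and Sinkhorn iterations producing a transport plan that aggregates pairwise similarities) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`sinkhorn_plan`, `image_similarity`), the regularization strength `eps`, the iteration count, and the choice of cost = 1 − similarity are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(cost, mu, nu, eps=0.1, n_iters=100):
    """Solve an entropy-regularized optimal transport problem
    via Sinkhorn iterations (illustrative sketch).

    cost: (m, n) cost matrix over local feature pairs
    mu:   (m,) marginal weights of the first image's features
    nu:   (n,) marginal weights of the second image's features
    Returns the (m, n) transport plan whose row/column sums
    approximate mu and nu.
    """
    K = np.exp(-cost / eps)  # Gibbs kernel from the cost matrix
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)   # scale columns to match nu
        u = mu / (K @ v)     # scale rows to match mu
    return u[:, None] * K * v[None, :]

def image_similarity(sim, mu, nu, eps=0.1):
    """Aggregate local-feature similarities into an image-level
    similarity, weighting each pair by the transport plan.
    Assumes cost = 1 - similarity (an illustrative choice)."""
    plan = sinkhorn_plan(1.0 - sim, mu, nu, eps)
    return float((plan * sim).sum())
```

In the paper's setting the marginals `mu` and `nu` would come from the attentive cross-comparison module rather than being fixed; with uniform marginals the sketch reduces to plain regularized OT between the two sets of local features.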
Wang, D., Ma, Q., Zheng, Q. et al. Improved local-feature-based few-shot learning with Sinkhorn metrics. Int. J. Mach. Learn. & Cyber. 13, 1099–1114 (2022). https://doi.org/10.1007/s13042-021-01437-y