Abstract
Deep learning-based object detection has made tremendous progress in the field of intelligent vision systems. However, one of its major drawbacks is its demand for large amounts of training data. Few-shot object detection (FSOD) aims to identify novel objects from only a few training samples. Existing techniques do not fully explore the potential mapping relationships between support and query features because they are limited to global matching contrast. In this work, we propose a multi-projection filtering network (MPF-Net) that exploits feature relevance and aggregates information across multiple scales to obtain a strong global representation. Furthermore, we are the first to propose a feature contrast filtering paradigm for the classification and regression subtasks, which makes full use of fine-grained features for contrastive matching. This multi-visual contrast approach enables our model to handle a variety of difficult detection challenges, such as scale discrepancies, occlusions, and feature confusion. MPF-Net accurately perceives features at various scales and adaptively extracts category information. Extensive experiments on the PASCAL VOC and MS COCO datasets demonstrate that our detectors significantly improve upon baseline detectors, especially in extremely low-shot settings (the average accuracy improvement is up to 3.5% in 1-shot scenarios and 2.5% in 2-shot scenarios). Overall, we propose a novel strategy for constructing the few-shot feature space and achieve remarkable results.
Data availability and access
The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (No. 62162008, 62006046, 32125033 and 31960548), the Innovation and Entrepreneurship Project for Overseas Educated Talents in Guizhou Province (2022)-04, the Guizhou Provincial Basic Research Program (ZK[2022]-108), and the Program of Introducing Talents of Discipline to Universities of China (111 Program, D20023).
Author information
Contributions
All authors contributed to the conceptualization and design of the study. Material preparation, data collection, and analysis were carried out by Han Chen, Kailin Xie, Qi Wang, Xue Wu, and Liang Lei. The corresponding author is Qi Wang. The first draft of the manuscript was written by Han Chen and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and informed consent for data used
Informed consent.
Competing interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, H., Wang, Q., Xie, K. et al. MPF-Net: multi-projection filtering network for few-shot object detection. Appl Intell 54, 7777–7792 (2024). https://doi.org/10.1007/s10489-024-05556-1
DOI: https://doi.org/10.1007/s10489-024-05556-1