MPF-Net: multi-projection filtering network for few-shot object detection

Applied Intelligence

Abstract

Deep learning-based object detection has made tremendous progress in the field of intelligent vision systems. However, one of its major drawbacks is its high demand for large amounts of experimental data. Few-shot object detection (FSOD) aims to identify novel objects with only a few training samples. Existing techniques do not fully explore the potential mapping relationships between support and query features because they are limited to global matching contrast. In this work, we propose a multi-projection filtering network (MPF-Net) that exploits feature relevance and aggregates information across multiple scales to obtain a strong global representation. Furthermore, we pioneer a feature contrast filtering paradigm for the classification and regression subtasks in order to fully utilize fine-grained features for contrastive matching. The multi-visual contrast approach enables our model to handle a variety of difficult detection challenges, such as scale discrepancies, occlusions, and feature confusion. MPF-Net accurately perceives features at various scales and adaptively extracts category information. Extensive experiments on the PASCAL VOC and MS COCO datasets demonstrate that our detectors significantly improve upon baseline detectors, especially in extremely low-shot settings (the average accuracy improvement is up to 3.5% in 1-shot scenarios and 2.5% in 2-shot scenarios). Overall, we propose a novel strategy for constructing the few-shot feature space and achieve remarkable results.

Data availability and access

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 62162008, 62006046, 32125033 and 31960548), the Innovation and Entrepreneurship Project for Overseas Educated Talents in Guizhou Province ((2022)-04), the Guizhou Provincial Basic Research Program (ZK[2022]-108), and the Program of Introducing Talents of Discipline to Universities of China (111 Program, D20023).

Author information

Contributions

All authors contributed to the conceptualization and design of the study. Material preparation, data collection, and analysis were carried out by Han Chen, Kailin Xie, Qi Wang, Xue Wu, and Liang Lei. The corresponding author is Qi Wang. The first draft of the manuscript was written by Han Chen and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Qi Wang or Liang Lei.

Ethics declarations

Ethics approval and informed consent for data used

Informed consent.

Competing interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, H., Wang, Q., Xie, K. et al. MPF-Net: multi-projection filtering network for few-shot object detection. Appl Intell 54, 7777–7792 (2024). https://doi.org/10.1007/s10489-024-05556-1

  • DOI: https://doi.org/10.1007/s10489-024-05556-1
