Feature enhancement modules applied to a feature pyramid network for object detection

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

A feature pyramid network (FPN) improves the ability of an object detection model to detect multiscale targets. However, the simple upsampling used in an FPN is not conducive to the propagation of deep semantic information, and redundant background information is not conducive to object detection. In this paper, we propose two plug-and-play modules for preexisting FPN-based architectures: a channel filtering module (CFM) and a spatial filtering module (SFM). The CFM learns the correlations between channels to improve the feature maps obtained via upsampling. The SFM introduces global information to improve the detection performance of the network. With the CFM and SFM, we improve the average precision (AP) of Faster R-CNN with an FPN by 0.9% to 1.3% on COCO, and we boost the AP of YOLOv5s with PANet by 2.8%.
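
The two ideas named in the abstract can be illustrated with a rough sketch. The abstract does not give the internal design of the CFM or SFM, so the code below is not the authors' implementation: it assumes a squeeze-and-excitation style channel gate applied to the upsampled map, and a simple global-context spatial mask applied after the lateral merge. All function names, the reduction ratio `r`, and the weights `w1`, `w2`, and `v` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def upsample_nearest_2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def channel_gate(x, w1, w2):
    """Assumed stand-in for a channel filtering step (SE-style gate).

    x: (C, H, W); w1: (C // r, C); w2: (C, C // r).
    """
    squeezed = x.mean(axis=(1, 2))            # (C,) per-channel average
    hidden = np.maximum(w1 @ squeezed, 0.0)   # bottleneck with ReLU
    weights = sigmoid(w2 @ hidden)            # (C,) channel weights in (0, 1)
    return x * weights[:, None, None]


def spatial_gate(x, v):
    """Assumed stand-in for a spatial filtering step using global context.

    x: (C, H, W); v: (C,) hypothetical projection vector.
    """
    context = x.mean(axis=(1, 2))                     # (C,) global descriptor
    scores = np.einsum("chw,c->hw", x, context * v)   # (H, W) per-location score
    mask = sigmoid(scores)                            # spatial weights in (0, 1)
    return x * mask[None, :, :]


def top_down_merge(deep, lateral, w1, w2, v):
    """One FPN top-down step: upsample, gate channels, add lateral, gate space."""
    up = channel_gate(upsample_nearest_2x(deep), w1, w2)
    merged = up + lateral
    return spatial_gate(merged, v)


# Usage with random placeholder data (not trained weights).
rng = np.random.default_rng(0)
C, r = 8, 2
deep = rng.standard_normal((C, 4, 4))      # coarser, deeper pyramid level
lateral = rng.standard_normal((C, 8, 8))   # finer level from the backbone
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
v = rng.standard_normal(C) * 0.1
out = top_down_merge(deep, lateral, w1, w2, v)
print(out.shape)  # (8, 8, 8)
```

In a real detector these gates would be learned layers inside the network; the sketch only shows where such modules could sit in the top-down pathway of an FPN.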

Code or data availability

The code and data are available.

Funding

This work was supported in part by a Special Project of the Central Government for Local Science and Technology Development of Hubei Province (No. 2019ZYYD020), the Science and Technology Research Program of the Hubei Provincial Department of Education (No. T201805), and two projects of the Hubei University of Technology Ph.D. Research Startup Fund (No. BSQD2020015 and No. BSQD2020014).

Author information

Corresponding author

Correspondence to Kun Lin.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest with any individual or organization.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, M., Lin, K., Huo, W. et al. Feature enhancement modules applied to a feature pyramid network for object detection. Pattern Anal Applic 26, 617–629 (2023). https://doi.org/10.1007/s10044-023-01152-0

  • DOI: https://doi.org/10.1007/s10044-023-01152-0
