Abstract
A feature pyramid network (FPN) improves the ability of an object detection model to detect targets at multiple scales. However, the simple upsampling used in an FPN hinders the propagation of deep semantic information, and redundant background information degrades detection. In this paper, we propose two plug-and-play modules for existing FPN-based architectures: a channel filtering module (CFM) and a spatial filtering module (SFM). The CFM learns the correlations between channels to refine the feature maps obtained via upsampling. The SFM introduces global information to improve the detection performance of the network. With the CFM and SFM, we improve the average precision (AP) of Faster R-CNN with an FPN by 0.9% to 1.3% on COCO, and we boost the AP of YOLOv5s with PANet by 2.8%.
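The abstract describes the CFM only as learning correlations between channels of the upsampled feature map. To make that idea concrete, the sketch below assumes a squeeze-and-excitation-style channel gate placed inside one top-down FPN step. This is an illustrative stand-in, not the authors' CFM: the helper names (`upsample_nearest_2x`, `channel_gate`, `topdown_merge`) and the weight matrices `w1` and `w2` are hypothetical, and plain Python lists are used so the example has no dependencies.

```python
import math
from typing import List

# A feature map is a list of C channels, each an H x W grid of floats.
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Plain FPN-style nearest-neighbour upsampling by a factor of 2."""
    out = []
    for ch in fmap:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))  # repeat each row
        out.append(rows)
    return out


def channel_gate(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """SE-style channel reweighting (hypothetical stand-in for the CFM).

    w1 has shape (r, C) and w2 has shape (C, r); in a real network both
    would be learned, here they are fixed illustrative values.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid mixes information
    # across channels, i.e. models inter-channel correlations.
    hidden = [max(0.0, sum(w * d for w, d in zip(wrow, desc))) for wrow in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(wrow, hidden))))
             for wrow in w2]
    # Rescale every channel by its gate in (0, 1).
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]


def topdown_merge(coarse: FeatureMap, lateral: FeatureMap,
                  w1: Matrix, w2: Matrix) -> FeatureMap:
    """One top-down step: upsample, filter channels, then add the lateral map."""
    up = channel_gate(upsample_nearest_2x(coarse), w1, w2)
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(up, lateral)]


# Usage: a 2-channel 1x1 coarse map merged into a 2-channel 2x2 lateral map.
coarse = [[[1.0]], [[4.0]]]
lateral = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
w1 = [[0.5, 0.5]]      # r = 1 bottleneck unit (hypothetical weights)
w2 = [[1.0], [-1.0]]   # one output per channel (hypothetical weights)
merged = topdown_merge(coarse, lateral, w1, w2)
```

With these illustrative weights the first channel is mostly kept and the second is strongly suppressed, showing how a channel gate can filter the upsampled map before fusion rather than passing it through unchanged. The bottleneck (`r` smaller than `C`) keeps the gate cheap relative to the convolutions around it.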
Code or data availability
The code and data are available.
Funding
This work was supported in part by a Special Project of the Central Government for Local Science and Technology Development of Hubei Province (No. 2019ZYYD020), the Science and Technology Research Program of the Hubei Provincial Department of Education (No. T201805), and two projects of the Hubei University of Technology Ph.D. Research Startup Fund (Nos. BSQD2020015 and BSQD2020014).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest with any individual or organization.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, M., Lin, K., Huo, W. et al. Feature enhancement modules applied to a feature pyramid network for object detection. Pattern Anal Applic 26, 617–629 (2023). https://doi.org/10.1007/s10044-023-01152-0
DOI: https://doi.org/10.1007/s10044-023-01152-0