Abstract
PANet is widely used in various object detection tasks due to its powerful feature expression ability. However, PANet’s performance in complex scenarios is subpar, with frequent object omission or misidentification. We find that the reason for this phenomenon is that the receptive field of PANet can’t cover sufficient feature information, to deal with drastic changes of source object size. In order to solve this problem, this paper adopts dilated convolution technology and applies it to each parallel branch directly following the PANet network. This method can effectively represent the feature information of objects at different scales by integrating the information from small and large receptive fields into a new feature output. We also introduce residual structure to circumvent the network degradation caused by excessive convolutions. By combining the above methods, we build a new module named PANetW (PANet with Wider Receptive Fields). Taking YOLOX-S as the baseline, we comprehensively evaluated the proposed module PANetW on two datasets, VOC2007 and MSCOCO2017. The test results show that our PANetW achieves a high level of mean average precision (AP). On the VOC2007 dataset, the AP of our PANetW improves by 4.9% to 43.0%; on the MS COCO2017 dataset, the AP of PANetW is as high as 44.3%, far exceeding the current mainstream modules. The experimental results fully demonstrate the effectiveness of our module.








Similar content being viewed by others
Availability of data and materials
All data generated or analysed during this study are included in this published article
Code Availability
Code is available at https://github.com/ChenRan2000/PANetW.
References
Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: Optimal speed and accuracy of object detection. arXiv:2004.10934https://doi.org/10.48550/arxiv.2004.10934
Carion N, Massa F, Synnaeve G, et al (2020) End-to-end object detection with transformers. In: Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part I. Springer-Verlag, p 213-229. https://doi.org/10.1007/978-3-030-58452-8_13
Chen L, Papandreou G, Kokkinos I et al (2018) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848. https://doi.org/10.1109/TPAMI.2017.2699184
Chen Q, Wang Y, Yang T, et al (2021) You only look one-level feature. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 13034–13043. https://doi.org/10.1109/CVPR46437.2021.01284
Everingham M, Van Gool L, Williams C et al (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88:303–338. https://doi.org/10.1007/s11263-009-0275-4
Gao Z (2023) Yoloca: Center aware yolo for dense object detection. In: Journal of Physics: Conference Series, IOP Publishing, p 012019. https://doi.org/10.1088/1742-6596/2425/1/012019
Ge Z, Liu S, Wang F, et al (2021) Yolox: Exceeding yolo series in 2021. arXiv:2107.08430https://doi.org/10.48550/arXiv.2107.08430
Ghiasi G, Lin T, Le Q (2019) Nas-fpn: Learning scalable feature pyramid architecture for object detection. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 7029–7038. https://doi.org/10.1109/CVPR.2019.00720
Girshick R (2015) Fast r-cnn. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp 1440–1448. https://doi.org/10.1109/ICCV.2015.169
Girshick R, Donahue J, Darrell T, et al (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp 580–587. https://doi.org/10.1109/CVPR.2014.81
He K, Zhang X, Ren S et al (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824
He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Jocher G (2020) YOLOv5 by Ultralytics, 7.0. https://doi.org/10.5281/zenodo.3908559https://github.com/ultralytics/yolov5
Jocher G, Chaurasia A, Qiu J (2023) YOLO by Ultralytics, 8.0.0. https://github.com/ultralytics/ultralytics
Karan A (2022) Has the future started? the current growth of artificial intelligence, machine learning, and deep learning. Iraqi J Comput Sci Math 3:115–123. https://doi.org/10.52866/IJCSM.2022.01.01.013
Li C, Li L, Jiang H, et al (2022) Yolov6: A single-stage object detection framework for industrial applications. arXiv:2209.02976https://doi.org/10.48550/arXiv.2209.02976
Li Y, Chen Y, Wang N, et al (2019) Scale-aware trident networks for object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp 6053–6062. https://doi.org/10.1109/ICCV.2019.00615
Lin T, Maire M, Belongie S, et al (2014) Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, Springer, pp 740–755 https://doi.org/10.1007/978-3-319-10602-1_48
Lin T, Dollár P, Girshick R, et al (2017) Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 936–944. https://doi.org/10.1109/CVPR.2017.106
Liu S, Huang D, et al (2018a) Receptive field block net for accurate and fast object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 385–400, https://doi.org/10.1007/978-3-030-01252-6_24
Liu S, Qi L, Qin H, et al (2018b) Path aggregation network for instance segmentation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 8759–8768 https://doi.org/10.1109/CVPR.2018.00913
Liu W, Anguelov D, Erhan D, et al (2016) Ssd: Single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, Springer, pp 21–37. https://doi.org/10.1007/978-3-319-46448-0_2
Redmon J, Farhadi A (2017) Yolo9000: Better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6517–6525. https://doi.org/10.1109/CVPR.2017.690
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv:1804.02767https://doi.org/10.48550/arXiv.1804.02767
Redmon J, Divvala S, Girshick R, et al (2016) You only look once: Unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788. https://doi.org/10.1109/CVPR.2016.91
Ren S, He K, Girshick R et al (2017) Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556https://doi.org/10.48550/arXiv.1409.1556
Srivastava R, Greff K, Schmidhuber J (2015) Highway networks. arXiv:1505.00387https://doi.org/10.48550/arXiv.1505.00387
Tabata A, Zimmer A, dos Santos Coelho L et al (2023) Analyzing carla ’s performance for 2d object detection and monocular depth estimation based on deep learning approaches. Expert Syst Appl 227:120200. https://doi.org/10.1016/j.eswa.2023.120200
Tan M, Pang R, Le Q (2020) Efficientdet: Scalable and efficient object detection. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 10778–10787. https://doi.org/10.1109/CVPR42600.2020.01079
Wang C, Bochkovskiy A, Liao H (2023a) Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: 2023 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721
Wang N, Gao Y, Chen H et al (2021) Nas-fcos: efficient search for object detection architectures. Int J Comput Vis 129:3299–3312. https://doi.org/10.1007/s11263-021-01523-2
Wang X, Chen S, Wei G et al (2023) Tenet: Accurate light-field salient object detection with a transformer embedding network. Image Vis Comput 129:104595. https://doi.org/10.1016/j.imavis.2022.104595
Xu S, Wang X, Lv W, et al (2022) Pp-yoloe: An evolved version of yolo. arXiv:2203.16250
Yang K, Li J, Dai S et al (2023) Multiscale features integration based multiple-in-single-out network for object detection. Image Vis Comput 135:104714. https://doi.org/10.1016/j.imavis.2023.104714
Zhang D, Zhang H, Tang J, et al (2020) Feature pyramid transformer. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII 16, Springer, pp 323–339.https://doi.org/10.1007/978-3-030-58604-1_20
Zhao G, Ge W, Yu Y (2021) Graphfpn: Graph feature pyramid network for object detection. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp 2743–2752. https://doi.org/10.1109/ICCV48922.2021.00276
Zhou Y (2024) A yolo-nl object detector for real-time detection. Expert Syst Appl 238:122256. https://doi.org/10.1016/j.eswa.2023.122256
Zoph B, Vasudevan V, Shlens J, et al (2018) Learning transferable architectures for scalable image recognition. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8697–8710. https://doi.org/10.1109/CVPR.2018.00907
Acknowledgements
The authors acknowledge funding from the Research and Application of Multi-core Extreme Learning Machine Based on Low Quality Samples, Research Project of Education Department, Hunan Province, China(20A511), Central South University of Forestry and Technology Degree and Postgraduate Education Teaching Reform Project(2022JG006), Hunan Provincial Natural Science Foundation of China(2023JJ40272) and Research Foundation of Education Bureau of Hunan Province, China(22B0938)
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A Abbreviations
Appendix A Abbreviations
We list the definitions for each abbreviation in Table 9.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, R., Xin, D., Wang, C. et al. PANetW: PANet with wider receptive fields for object detection. Multimed Tools Appl 83, 66517–66538 (2024). https://doi.org/10.1007/s11042-024-18219-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-024-18219-7