PANetW: PANet with wider receptive fields for object detection

Chen, Ran; Xin, Dongjun; Wang, Chuanli; Wang, Peng; Tan, Junwen; Kang, Wenjie

doi:10.1007/s11042-024-18219-7

Published: 24 January 2024

Volume 83, pages 66517–66538, (2024)
Multimedia Tools and Applications

Abstract

PANet is widely used in object detection because of its strong feature representation ability. However, its performance degrades in complex scenes, where objects are frequently missed or misidentified. We find that the cause is PANet’s receptive field, which cannot cover enough feature information to cope with drastic changes in object size. To address this, we apply dilated convolution to each parallel branch directly following the PANet network. By integrating information from small and large receptive fields into a new feature output, this method can effectively represent objects at different scales. We also introduce a residual structure to avoid the network degradation caused by excessive convolutions. Combining these methods, we build a new module named PANetW (PANet with Wider Receptive Fields). Taking YOLOX-S as the baseline, we evaluate PANetW on two datasets, VOC2007 and MS COCO2017. The test results show that PANetW achieves high average precision (AP): on VOC2007, AP improves by 4.9% to 43.0%; on MS COCO2017, AP reaches 44.3%, exceeding current mainstream modules. These results demonstrate the effectiveness of our module.
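The idea of fusing small and large receptive fields through parallel dilated branches plus a residual connection can be illustrated with a minimal 1D sketch in plain Python. This is not the authors' implementation: the dilation rates (1 and 2), the shared all-ones kernel, and fusion by element-wise summation are assumptions chosen only for illustration, and all function names are hypothetical. The one fact relied on is the standard relation for a dilated kernel, k_eff = k + (k - 1)(d - 1).

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float], weights: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel)."""
    half = (effective_kernel_size(len(weights), dilation) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):     # out-of-range taps read as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def parallel_dilated_block(signal: Sequence[float], weights: Sequence[float],
                           dilations: Sequence[int]) -> List[float]:
    """Sum parallel dilated branches, then add the input back (residual).

    Summation as the fusion step is an assumption made for this sketch.
    """
    branches = [dilated_conv1d(signal, weights, d) for d in dilations]
    fused = [sum(values) for values in zip(*branches)]
    return [x + f for x, f in zip(signal, fused)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    kernel = [1, 1, 1]
    for d in (1, 2, 3):
        print(f"dilation={d}: effective size {effective_kernel_size(3, d)}")
    print(parallel_dilated_block(impulse, kernel, dilations=(1, 2)))
```

Feeding an impulse through the block shows how the response spreads: the dilation-1 branch touches only the immediate neighbours, the dilation-2 branch reaches two positions away, and the residual term keeps the original signal in the output.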

Availability of data and materials

All data generated or analysed during this study are included in this published article.

Code Availability

Code is available at https://github.com/ChenRan2000/PANetW.

Acknowledgements

The authors acknowledge funding from the Research and Application of Multi-core Extreme Learning Machine Based on Low Quality Samples, Research Project of Education Department, Hunan Province, China (20A511); the Central South University of Forestry and Technology Degree and Postgraduate Education Teaching Reform Project (2022JG006); the Hunan Provincial Natural Science Foundation of China (2023JJ40272); and the Research Foundation of Education Bureau of Hunan Province, China (22B0938).

Author information

Authors and Affiliations

Central South University of Forestry and Technology, 498 Shaoshan South Road, Changsha, 410004, Hunan, China
Ran Chen, Dongjun Xin, Chuanli Wang, Peng Wang & Junwen Tan
Hunan Provincial Key Laboratory of Network Investigational Technology, Hunan Police Academy, Changsha, China
Wenjie Kang
College of Systems Engineering, National University of Defense Technology, Changsha, China
Wenjie Kang

Corresponding author

Correspondence to Dongjun Xin.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Abbreviations

We list the definitions for each abbreviation in Table 9.

Table 9 Abbreviation table

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, R., Xin, D., Wang, C. et al. PANetW: PANet with wider receptive fields for object detection. Multimed Tools Appl 83, 66517–66538 (2024). https://doi.org/10.1007/s11042-024-18219-7

Received: 12 September 2023
Revised: 25 December 2023
Accepted: 08 January 2024
Published: 24 January 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s11042-024-18219-7
