PANetW: PANet with wider receptive fields for object detection

Multimedia Tools and Applications

Abstract

PANet is widely used in object detection tasks because of its strong feature representation ability. However, PANet performs poorly in complex scenes, where it frequently misses or misidentifies objects. We attribute this to PANet's receptive field, which cannot cover enough feature information to cope with drastic changes in object size. To solve this problem, this paper applies dilated convolution to each parallel branch directly following the PANet network. By integrating information from small and large receptive fields into a new feature output, this method can effectively represent the features of objects at different scales. We also introduce a residual structure to avoid the network degradation caused by excessive convolutions. Combining these methods, we build a new module named PANetW (PANet with Wider Receptive Fields). Taking YOLOX-S as the baseline, we evaluate the proposed PANetW module on two datasets, VOC2007 and MS COCO2017. The test results show that PANetW achieves a high mean average precision (AP). On the VOC2007 dataset, the AP of PANetW improves by 4.9% to 43.0%; on the MS COCO2017 dataset, the AP of PANetW reaches 44.3%, exceeding current mainstream modules. The experimental results demonstrate the effectiveness of our module.
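To make the idea of fusing small and large receptive fields concrete, the following is a minimal sketch in plain Python, not the authors' implementation (which is in the repository listed under Code Availability). It uses a 1-D signal instead of 2-D feature maps, and the function names, the dilation rates (1, 2, 3), and the fusion by averaging are all assumptions chosen for illustration; the abstract does not specify them. The only relation taken as fact is the standard span of a dilated kernel, k + (k - 1)(d - 1).

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation).

    Assumes an odd kernel length so the kernel is centred on each position.
    """
    half = (effective_kernel_size(len(kernel), dilation) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):     # positions outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def wide_receptive_field_block(signal, kernel, dilations=(1, 2, 3)):
    """Hypothetical block: parallel dilated branches, averaged, plus a residual.

    The dilation rates and the averaging are illustrative assumptions only.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    fused = [sum(values) / len(branches) for values in zip(*branches)]
    # Residual connection: add the input back to the fused branch output.
    return [x + f for x, f in zip(signal, fused)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for d in (1, 2, 3):
        print(d, effective_kernel_size(3, d), dilated_conv1d(impulse, [1, 1, 1], d))
    print(wide_receptive_field_block(impulse, [1, 1, 1]))
```

Feeding an impulse through each branch shows how the same three-tap kernel reaches progressively further as the dilation rate grows, while the residual term keeps the original signal in the output regardless of what the branches learn.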

Availability of data and materials

All data generated or analysed during this study are included in this published article.

Code Availability

Code is available at https://github.com/ChenRan2000/PANetW.

Acknowledgements

The authors acknowledge funding from the Research and Application of Multi-core Extreme Learning Machine Based on Low Quality Samples, Research Project of Education Department, Hunan Province, China (20A511); the Central South University of Forestry and Technology Degree and Postgraduate Education Teaching Reform Project (2022JG006); the Hunan Provincial Natural Science Foundation of China (2023JJ40272); and the Research Foundation of Education Bureau of Hunan Province, China (22B0938).

Author information

Corresponding author

Correspondence to Dongjun Xin.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A   Abbreviations

We list the definitions for each abbreviation in Table 9.

Table 9 Abbreviation table

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, R., Xin, D., Wang, C. et al. PANetW: PANet with wider receptive fields for object detection. Multimed Tools Appl 83, 66517–66538 (2024). https://doi.org/10.1007/s11042-024-18219-7

  • DOI: https://doi.org/10.1007/s11042-024-18219-7
