Improved YOLOX for pedestrian detection in crowded scenes

Gao, Fei; Cai, Changxin; Jia, Ruohui; Hu, Xinzhong

doi:10.1007/s11554-023-01287-7

Original Research Paper
Published: 28 February 2023

Journal of Real-Time Image Processing, Volume 20, article number 24 (2023)

Fei Gao^1,2, Changxin Cai^1,2, Ruohui Jia^1,2 & Xinzhong Hu^1,2

Abstract

Object detection has advanced rapidly in recent years, but detecting pedestrians in crowded scenes remains challenging, especially for one-stage detectors, for which few targeted improvements exist. In this paper, we propose YOLO-CPD, a crowded pedestrian detection method that works better than other one-stage models in crowded environments. The method primarily strengthens the one-stage detector's ability to detect multiple overlapping objects within a single region. Its core idea is to use the difference between boxes to adjust the IoU value used in Non-Maximum Suppression (NMS) and to improve the Intersection over Union regression loss (IoU Loss), together with an Optimised Score Module (OPSC). Compared with the baseline, YOLO-CPD increases Average Precision (AP) by 5.04%, increases Recall by 2.17% and reduces the log-average Miss Rate (\(MR^{-2}\)) by 5.12% on the CrowdHuman dataset. YOLO-CPD also achieves good results on the WiderPerson dataset, which indicates that the proposed method generalises well.
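To make concrete why overlapping pedestrians are hard for NMS, the following minimal sketch implements plain IoU and standard greedy NMS with a fixed threshold. It is not the YOLO-CPD procedure: the abstract does not specify how the box difference adjusts the IoU value, so no adjustment is attempted here. The function names, the box layout `(x1, y1, x2, y2)`, and the example boxes and scores are all illustrative assumptions.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Standard greedy NMS: visit boxes by descending score and keep a box
    only if its IoU with every already-kept box is at most the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: boxes 0 and 1 are two pedestrians standing close
# together (IoU = 140/260, about 0.54); box 2 is far from both.
boxes = [(0.0, 0.0, 10.0, 20.0), (3.0, 0.0, 13.0, 20.0), (50.0, 0.0, 60.0, 20.0)]
scores = [0.9, 0.8, 0.7]

kept_strict = greedy_nms(boxes, scores, iou_threshold=0.5)  # second pedestrian is suppressed
kept_loose = greedy_nms(boxes, scores, iou_threshold=0.6)   # both pedestrians survive
```

With a fixed threshold of 0.5 the second, genuinely distinct pedestrian is discarded, while raising the threshold keeps it at the cost of admitting more duplicate boxes elsewhere. This trade-off is the kind of failure that a per-pair adjustment of the IoU value aims to avoid.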



Acknowledgements

This work was supported by the National Natural Science Foundation of China under the projects "Performance Analysis and Optimal Design of Networked Intelligent Systems under Multiple Communication Constraints" (62173049) and "Collaboration and Optimization of Hybrid Multi-Intelligent Systems Based on Learning Algorithms" (61772086).

Author information

Authors and Affiliations

Electronic & Information School, Yangtze University, No. 1, South Ring Road, Jingzhou, 434023, Hubei, China
Fei Gao, Changxin Cai, Ruohui Jia & Xinzhong Hu
National Demonstration Center for Experimental Electrical & Electronic Education, Yangtze University, No. 1, South Ring Road, Jingzhou, 434023, Hubei, China
Fei Gao, Changxin Cai, Ruohui Jia & Xinzhong Hu

Corresponding author

Correspondence to Changxin Cai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, F., Cai, C., Jia, R. et al. Improved YOLOX for pedestrian detection in crowded scenes. J Real-Time Image Proc 20, 24 (2023). https://doi.org/10.1007/s11554-023-01287-7

Received: 25 September 2022
Accepted: 01 February 2023
Published: 28 February 2023
DOI: https://doi.org/10.1007/s11554-023-01287-7
