SP-YOLO: an end-to-end lightweight network for real-time human pose estimation

Zhang, Yuting; Wang, Zongyan; Li, Menglong; Gao, Pei

doi:10.1007/s11760-023-02812-8

SP-YOLO: an end-to-end lightweight network for real-time human pose estimation

Original Paper
Published: 18 October 2023

Volume 18, pages 863–876, (2024)
Cite this article

Signal, Image and Video Processing Aims and scope Submit manuscript

Yuting Zhang¹,
Zongyan Wang¹,
Menglong Li¹ &
…
Pei Gao^1,2

514 Accesses
Explore all metrics

Abstract

The traditional multi-person human pose estimation method has several problems including low real-time detection effect, low recognition efficiency, and a large number of calculation parameters. In this paper, we propose a lightweight network SP-YOLO based on the YOLO-Pose algorithm for real-time human pose estimation. Firstly, a lightweight ghost spatial pyramid pooling-fast (GSPPF) module is proposed by using GhostNet instead of convolution in the CSPDarkNet backbone network to make the pooling structure do layer-by-layer maxpooling operation and improve the speed of the backbone network. The ultra-lightweight attention mechanism called the shuffle attention (SA) module, is added between layers. Secondly, the neck structure is enhanced, and a more efficient slim path aggregation network (Slim-PAN) module is proposed to optimize the input relationship in the pyramid-enhanced feature structure. The loss function is refined to assign different weights to human pose detection and keypoint detection. Finally, the SP-YOLO method is evaluated through dataset evaluation and real-time detection experiments. A comparison with several other methods on the COCO2017-Keypoints dataset demonstrates that SP-YOLO improves the average precision (AP) by 1.5% compared to the original YOLO-Pose while reducing the number of parameters by 7.9 M. In the real-time detection experiment, the proposed method proves to be effective for end-to-end real-time detection of human poses.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Improving Human Pose Estimation Based on Stacked Hourglass Network

Article 21 March 2023

Human pose estimation model based on DiracNets and integral pose regression

Article 14 March 2023

Combining detailed appearance and multi-scale representation: a structure-context complementary network for human pose estimation

Article 19 July 2022

Data availability

The data come from the common dataset.

References

Li, Y., Jia, S., Li, Q.: BalanceHRNet: an effective network for bottom-up human pose estimation. Neural Netw. 161, 297–305 (2023). https://doi.org/10.1016/j.neunet.2023.01.036
Article Google Scholar
Miki, D., Abe, S., Chen, S., et al.: Robust human pose estimation from distorted wide-angle images through iterative search of transformation parameters. SIViP 14, 693–700 (2020). https://doi.org/10.1007/s11760-019-01602-5
Article Google Scholar
Lu, H., Shao, X., Xiao, Y.: Pose estimation with segmentation consistency. IEEE Trans. Image Process. 22(10), 4040–4048 (2013). https://doi.org/10.1109/TIP.2013.2268975
Article MathSciNet Google Scholar
Wan, T., Luo, Y., Zhang, Z., et al.: TSNet: tree structure network for human pose estimation. SIViP 16, 551–558 (2022). https://doi.org/10.1007/s11760-021-01999-y
Article Google Scholar
Ke, X., Liu, T., Li, Z.: Human attribute recognition method based on pose estimation and multiple-feature fusion. SIViP 14, 1441–1449 (2020). https://doi.org/10.1007/s11760-020-01690-8
Article Google Scholar
Dayarathna, T., Muthukumarana, T., Rathnayaka, Y., Denman, S., de Silva, C., Pemasiri, A., Ahmedt-Aristizabal, D.: Privacy-preserving in-bed pose monitoring: a fusion and reconstruction study. Expert Syst. Appl. 213, 119139 (2023). https://doi.org/10.1016/j.eswa.2022.119139
Article Google Scholar
Gomes, M.E.N., Macêdo, D., Zanchettin, C., De-Mattos-Neto, P.S.G., Oliveira, A.: Multi-human fall detection and localization in videos. Comput. Vision Image Understand. 220, 103442 (2022). https://doi.org/10.1016/j.cviu.2022.103442
Article Google Scholar
Cao, D., Liu, W., Xing, W., et al.: Human pose estimation based on feature enhancement and multi-scale feature fusion. SIViP 17, 643–650 (2023). https://doi.org/10.1007/s11760-022-02271-7
Article Google Scholar
Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In European conference on computer vision (ECCV), (2018). https://doi.org/10.48550/arXiv.1804.06208
Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. In: The IEEE conference on computer vision and pattern recognition (CVPR), (2018). https://doi.org/10.48550/arXiv.1711.07319
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: The IEEE conference on computer vision and pattern recognition (CVPR), (2019). https://doi.org/10.48550/arXiv.1902.09212
Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., Zhang, L.: HigherHRNet: scale-aware representation learning for bottom-up human pose estimation. In: The IEEE conference on computer vision and pattern recognition (CVPR), (2020). https://doi.org/10.48550/arXiv.1908.10357
Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P.V., Schiele, B:. DeepCut: joint subset partition and labeling for multi person pose estimation. In: The IEEE conference on computer vision and pattern recognition (CVPR), (2016), pp. 4929–4937. https://doi.org/10.1109/CVPR.2016.533
Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., Schieke, B.: DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: The European conference on computer vision (ECCV), (2016). https://doi.org/10.1007/978-3-319-46466-4_3
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2021). https://doi.org/10.1109/TPAMI.2019.2929257
Article Google Scholar
Yang, S., Feng, Z., Wang, Z., Li, Y., Zhang, S., Quan, Z., Xia, S., Yang, W.: Detecting and grouping keypoints for multi-person pose estimation using instance-aware attention. Pattern Recogn. 136, 109232 (2023). https://doi.org/10.1016/j.patcog.2022.109232
Article Google Scholar
Xu, Y., Piao, Z., Zhang, Z., Liu, W., Gao, S.: SUNNet: a novel framework for simultaneous human parsing and pose estimation. Neurocomputing 444, 349–355 (2021). https://doi.org/10.1016/j.neucom.2020.01.123
Article Google Scholar
Lamas, A., Tabik, S., Montes, A.C., Pérez-Hernández, F., García, J., Olmos, R., Herrera, F.: Human pose estimation for mitigating false negatives in weapon detection in video-surveillance. Neurocomputing 489, 488–503 (2022). https://doi.org/10.1016/j.neucom.2021.12.059
Article Google Scholar
Hou, Q., Zhou, D., Feng, J.: Coordinate attention for efficient mobile network design. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), Nashville, TN, USA, pp. 13708–13717, (2021). https://doi.org/10.1109/CVPR46437.2021.01350
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: IEEE conference on computer vision and pattern recognition (CVPR), (2018). https://doi.org/10.48550/arXiv.1709.01507
Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: convolutional block attention module. In: The European conference on computer vision (ECCV), (2018). https://doi.org/10.1007/978-3-030-01234-2_1
Zhang, Q., Yang, Y.: SA-Net: shuffle attention for deep convolutional neural networks. arXiv preprint, (2021). https://doi.org/10.48550/arXiv.2102.00240
Jiang, C., Huang, K., Zhang, S., Wang, X., Xiao, J., Goulermas, Y.: Aggregated pyramid gating network for human pose estimation without pre-training. Pattern Recogn. 138, 109429 (2023). https://doi.org/10.1016/j.patcog.2023.109429
Article Google Scholar
Zhong, F., Li, M., Zhang, K., Hu, J., Liu, L.: DSPNet: a low computational-cost network for human pose estimation. Neurocomputing 423, 327–335 (2021). https://doi.org/10.1016/j.neucom.2020.11.003
Article Google Scholar
Maji, D., Nagori, S., Mathew, M., Poddar, D.: YOLO-Pose: enhancing YOLO for multi-person pose estimation using object keypoint similarity loss. arXiv preprint, (2022). https://doi.org/10.48550/arXiv.2204.06806
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: GhostNet: more features from cheap operations. In: IEEE conference on computer vision and pattern recognition (CVPR), (2020). https://doi.org/10.48550/arXiv.1911.11907
Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: The European conference on computer vision (ECCV), vol. 8693, pp. 740-755, (2014)
Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: The European conference on computer vision (ECCV), (2016). https://doi.org/10.48550/arXiv.1603.06937
Geng, Z., Sun, K., Xiao, B., Zhang, Z., Wang, J.: Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression. In: The IEEE conference on computer vision and pattern recognition (CVPR), (2021). https://doi.org/10.48550/arXiv.2104.02300
Osokin, D.: Real-time 2D multi-person pose estimation on CPU: lightweight OpenPose. arXiv preprint, 2018. https://doi.org/10.48550/arXiv.1811.12004

Download references

Funding

The authors wish to express their gratitude to the National Ministry of Science and Technology Innovation Method Special(2020IM020700).

Author information

Authors and Affiliations

School of Mechanical Engineering, North University of China, Taiyuan, 030051, Shanxi, China
Yuting Zhang, Zongyan Wang, Menglong Li & Pei Gao
Department of Mechanical and Electrical Engineering, Shanxi Polytechnic College, Taiyuan, 030006, Shanxi, China
Pei Gao

Authors

Yuting Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Zongyan Wang
View author publications
You can also search for this author in PubMed Google Scholar
Menglong Li
View author publications
You can also search for this author in PubMed Google Scholar
Pei Gao
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

The first draft of the manuscript was written by YZ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zongyan Wang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Zhang, Y., Wang, Z., Li, M. et al. SP-YOLO: an end-to-end lightweight network for real-time human pose estimation. SIViP 18, 863–876 (2024). https://doi.org/10.1007/s11760-023-02812-8

Download citation

Received: 17 August 2023
Revised: 20 September 2023
Accepted: 25 September 2023
Published: 18 October 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11760-023-02812-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

SP-YOLO: an end-to-end lightweight network for real-time human pose estimation

Abstract

Access this article

Similar content being viewed by others

Improving Human Pose Estimation Based on Stacked Hourglass Network

Human pose estimation model based on DiracNets and integral pose regression

Combining detailed appearance and multi-scale representation: a structure-context complementary network for human pose estimation

Data availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

SP-YOLO: an end-to-end lightweight network for real-time human pose estimation

Abstract

Access this article

Similar content being viewed by others

Improving Human Pose Estimation Based on Stacked Hourglass Network

Human pose estimation model based on DiracNets and integral pose regression

Combining detailed appearance and multi-scale representation: a structure-context complementary network for human pose estimation

Data availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation