DOI: 10.1145/3616901.3616957
research-article

Multiscale Models for Real-Time Human Pose Estimation

Published: 05 March 2024

Abstract

Lightweight OpenPose builds its backbone from depthwise separable convolutions to improve the model's computational efficiency on the CPU, but it has two shortcomings: (1) its feature-processing ability is weak, and (2) it does not recognize multi-scale features well. To address these problems, we propose a multi-scale model, adjusting the network structure of lightweight OpenPose with reference to that of Res2Net. The modified model both recognizes multi-scale features and has strong feature-processing capability. On the COCO2017 validation set, our model achieves an AP of 0.412, which is 0.037 higher than lightweight OpenPose. In future work, we will investigate how the model's ability to extract multi-scale features influences its AP.
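
The two ideas named in the abstract can be illustrated with a little arithmetic. The sketch below is not the paper's architecture: the function names, the 128-channel layer, and the choice of four scales are assumptions made for illustration only. It counts the weights of a standard convolution against a depthwise separable one, and lists the receptive field of each feature split in a generic Res2Net-style block, where every split after the first is convolved together with the previous split's output.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def res2net_receptive_fields(scales=4, k=3):
    """Receptive field of each output split in a generic Res2Net-style block.

    Split 1 is passed through unchanged (receptive field 1). Each later
    split is convolved after adding the previous split's output, so its
    receptive field grows by (k - 1) per step, measured at the block input.
    """
    fields = [1]
    for _ in range(scales - 1):
        fields.append(fields[-1] + (k - 1))
    return fields


if __name__ == "__main__":
    # Hypothetical layer size, chosen only to make the comparison concrete.
    c_in, c_out = 128, 128
    print("standard conv weights:      ", conv_params(c_in, c_out))
    print("depthwise separable weights:", depthwise_separable_params(c_in, c_out))
    print("receptive field per split:  ", res2net_receptive_fields(scales=4))
```

Under these assumptions, a single block outputs features whose receptive fields span several sizes at once, which is the sense in which a Res2Net-style structure is multi-scale, while the depthwise separable layers keep the weight count low.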

Information

Published In

FAIML '23: Proceedings of the 2023 International Conference on Frontiers of Artificial Intelligence and Machine Learning
April 2023
296 pages
ISBN: 9798400707544
DOI: 10.1145/3616901
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

FAIML 2023
