DOI:10.1145/3569966.3569989
research-article

ConvPose: An efficient human pose estimation method based on ConvNeXt

Published: 20 December 2022

Abstract

Human pose estimation methods have developed rapidly in recent years, and many high-precision models have emerged. However, the computational cost of these methods is often very high, especially for transformer-based models. In this work, we propose ConvPose, an efficient human pose estimation model built on a convolutional neural network architecture. ConvPose uses an efficient single-branch structure that takes the ConvNeXt Block as its baseline and incorporates the Coordinate Attention module. This composition not only provides stronger feature extraction, but also efficiently captures the global dependencies between human keypoints and the surrounding scene. The effective combination of large convolution kernels and the attention module lets our model focus on fine-grained features in complex scenes. In addition, the parameter count and GFLOPs of our model are lower than those of current high-performance models, which makes it easier to deploy on low-end devices. Experiments show that our model achieves 73.6 AP on the MS-COCO dataset with only 6.3M parameters, a very competitive result.
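
To make the described composition concrete, the sketch below (not the authors' released code) pairs a ConvNeXt-style block, with its 7x7 depthwise convolution, with a Coordinate Attention module that pools along the height and width axes separately. Module names, channel sizes, and the reduction ratio are illustrative assumptions; only the overall structure follows the abstract.

# Hedged sketch, assuming a PyTorch implementation; not the ConvPose code.
import torch
import torch.nn as nn


class ConvNeXtBlock(nn.Module):
    """Depthwise 7x7 conv -> LayerNorm -> pointwise expand -> GELU -> pointwise project, with a residual."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)                  # applied over channels (channels-last)
        self.pwconv1 = nn.Linear(dim, expansion * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                      # (N, C, H, W) -> (N, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                      # back to (N, C, H, W)
        return residual + x


class CoordinateAttention(nn.Module):
    """Pools along H and W separately so the attention map keeps positional information in both directions."""

    def __init__(self, dim: int, reduction: int = 32):
        super().__init__()
        hidden = max(8, dim // reduction)
        self.conv1 = nn.Conv2d(dim, hidden, kernel_size=1)
        self.bn = nn.BatchNorm2d(hidden)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(hidden, dim, kernel_size=1)
        self.conv_w = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                            # (N, C, H, 1): pool over width
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)        # (N, C, W, 1): pool over height
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                        # (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))    # (N, C, 1, W)
        return x * a_h * a_w                                         # reweight features with both attention maps


if __name__ == "__main__":
    stage = nn.Sequential(ConvNeXtBlock(dim=64), CoordinateAttention(dim=64))
    features = torch.randn(1, 64, 64, 48)              # an assumed 4x-downsampled pose feature map
    print(stage(features).shape)                       # torch.Size([1, 64, 64, 48])

The channel width (64), expansion ratio, and reduction ratio above are placeholders; the paper's actual stage configuration would determine the real values.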


Cited By

  • (2024) V-LTCS: Backbone exploration for Multimodal Misogynous Meme detection. Natural Language Processing Journal, 100109. DOI: 10.1016/j.nlp.2024.100109. Online publication date: Oct-2024.
  • (2023) ConvCCpose: Learning Coordinate Classification Tokens for Human Pose Estimation Based on ConvNeXt. 2023 7th Asian Conference on Artificial Intelligence Technology (ACAIT), 391-396. DOI: 10.1109/ACAIT60137.2023.10528558. Online publication date: 10-Nov-2023.


Information

Published In

CSSE '22: Proceedings of the 5th International Conference on Computer Science and Software Engineering
October 2022
753 pages
ISBN:9781450397780
DOI:10.1145/3569966
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 December 2022


Author Tags

  1. ConvNeXt
  2. Coordinate Attention
  3. Human pose estimation
  4. convolutional neural network

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Science and Technology Major Project of Guangxi Zhuang Autonomous Region Government

Conference

CSSE 2022

Acceptance Rates

Overall Acceptance Rate 33 of 74 submissions, 45%


Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 1

Reflects downloads up to 05 Mar 2025

