research-article

Human Pose Estimation based on Attention Multi-resolution Network

Authors:
Congcong Zhang (Beijing Union University, Beijing, China)
Ning He (Beijing Union University, Beijing, China)
Qixiang Sun (Beijing Union University, Beijing, China)
Xiaojie Yin (Beijing Union University, Beijing, China)
Ke Lu (University of Chinese Academy of Sciences, Beijing, China)

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval, August 2021, Pages 682–687, https://doi.org/10.1145/3460426.3463668

Published: 01 September 2021


ABSTRACT

Recently, multi-resolution neural networks, which combine features at different resolutions, have achieved good results in human pose estimation. In this paper, we propose an attention-based multi-resolution network that adds an attention mechanism to the High-Resolution Network (HRNet) to enhance its feature representation. The attention mechanism improves the ability of the networks at different resolutions to extract key features from images, so that the output contains more effective multi-resolution representation information and the positions of human joints can be estimated more accurately. We evaluate the network on the MPII and COCO datasets. On MPII it obtains an average accuracy of 90.3% under the PCKh@0.5 evaluation standard, and on COCO it achieves an AP of 76.5. The experimental results show that the proposed network improves the accuracy of keypoint estimation in human pose estimation.
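For readers unfamiliar with the PCKh evaluation standard named in the abstract, the following is a minimal sketch of how such a score can be computed. A predicted joint counts as correct when its distance to the ground-truth joint is at most a fraction `alpha` of the person's head size, with `alpha = 0.5` in this case. The function name `pckh`, its arguments, and the sample coordinates are hypothetical and chosen only for illustration. This is not the authors' evaluation code, and the official MPII protocol additionally derives the head size from the annotated head rectangle and reports per-joint scores.

```python
import math


def pckh(pred, gt, head_sizes, alpha=0.5):
    """Fraction of annotated joints predicted within alpha * head size.

    pred, gt:    one list per person of (x, y) joint coordinates.
    head_sizes:  one head size (in pixels) per person; assumed to be given.
    Ground-truth joints set to None are treated as unannotated and skipped.
    """
    correct = 0
    total = 0
    for person_pred, person_gt, head in zip(pred, gt, head_sizes):
        for p, g in zip(person_pred, person_gt):
            if g is None:
                continue  # joint not annotated for this person
            total += 1
            error = math.hypot(p[0] - g[0], p[1] - g[1])
            if error <= alpha * head:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical example: one person, three joints, head size of 20 pixels,
# so the distance threshold is 0.5 * 20 = 10 pixels.
example_pred = [[(10.0, 10.0), (50.0, 52.0), (90.0, 120.0)]]
example_gt = [[(12.0, 11.0), (50.0, 50.0), (90.0, 100.0)]]
example_heads = [20.0]
example_score = pckh(example_pred, example_gt, example_heads)
```

In this example the first two joints fall within the 10-pixel threshold and the third is 20 pixels off, so two of the three joints count as correct.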


Published in
ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
August 2021
715 pages
ISBN:9781450384636
DOI:10.1145/3460426
General Chairs: Wen-Huang Cheng (National Yang Ming Chiao Tung University, Taiwan), Mohan Kankanhalli (National University of Singapore, Singapore), Meng Wang (Hefei University of Technology, China)
Program Chairs: Wei-Ta Chu (National Cheng Kung University, Taiwan), Jiaying Liu (Peking University, China), Marcel Worring (University of Amsterdam, Netherlands)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attention mechanism
feature fusion
human pose estimation
multi-resolution networks

Acceptance Rates
Overall Acceptance Rate: 254 of 830 submissions, 31%