ABSTRACT
Recently, multi-resolution neural networks, which combine features at different resolutions, have achieved good results in human pose estimation. In this paper, we propose an attention-based multi-resolution network that adds an attention mechanism to the High-Resolution Network (HRNet) to enhance its feature representation. The attention mechanism improves the ability of the branches at each resolution to extract key features from the image. As a result, the output carries more effective multi-resolution representations, and the positions of human joints can be estimated more accurately. We conduct experiments on the MPII and COCO datasets. On the MPII validation set, our network achieves an average accuracy of 90.3% under the PCKh@0.5 metric, and on the COCO dataset it achieves an AP of 76.5. These results show that our model effectively improves keypoint estimation accuracy in human pose estimation.
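The PCKh@0.5 metric quoted for MPII counts a predicted joint as correct when its distance to the ground-truth location is at most half of the person's head segment length. The sketch below is a minimal plain-Python illustration under stated assumptions: the function name `pckh`, the per-person list layout, and the use of `None` for unannotated joints are hypothetical choices, and the head size is assumed to be precomputed. The official MPII evaluation also reports per-joint scores and may restrict which joints enter the mean, so this is not the benchmark's reference implementation.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def pckh(
    preds: Sequence[Sequence[Point]],
    gts: Sequence[Sequence[Optional[Point]]],
    head_sizes: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical helper: percentage of annotated joints whose prediction
    lies within alpha * head_size of the ground truth (alpha=0.5 -> PCKh@0.5).

    preds[i][j]   predicted (x, y) of joint j for person i
    gts[i][j]     ground-truth (x, y), or None if the joint is not annotated
    head_sizes[i] head segment length used to normalize person i (assumed given)
    """
    correct = 0
    total = 0
    for person_pred, person_gt, head in zip(preds, gts, head_sizes):
        for p, g in zip(person_pred, person_gt):
            if g is None:  # unannotated joints are excluded from the average
                continue
            total += 1
            dist = math.hypot(p[0] - g[0], p[1] - g[1])  # Euclidean pixel error
            if dist <= alpha * head:  # within the normalized threshold
                correct += 1
    if total == 0:
        raise ValueError("no annotated joints to evaluate")
    return 100.0 * correct / total


# One person with head size 20, so the threshold is 10 pixels.
# Errors are 0, 10 and 30 pixels; the fourth joint is unannotated.
score = pckh(
    [[(0, 0), (10, 0), (30, 0), (5, 5)]],
    [[(0, 0), (0, 0), (0, 0), None]],
    [20.0],
)
print(round(score, 1))  # 66.7
```

Normalizing by head size makes the threshold scale with the person's apparent size in the image, so a fixed pixel error is judged more strictly for small, distant people than for large, close ones.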