Hope: heatmap and offset for pose estimation

Xiao, Jing; Li, Haichao; Qu, Guangzhuo; Fujita, Hamido; Cao, Yang; Zhu, Jia; Huang, Changqin

doi:10.1007/s12652-021-03124-w

Original Research
Published: 19 March 2021

Journal of Ambient Intelligence and Humanized Computing, Volume 13, pages 2937–2949 (2022)

Jing Xiao¹,
Haichao Li¹,
Guangzhuo Qu¹,
Hamido Fujita²,³,⁴ (ORCID: orcid.org/0000-0001-5256-210X),
Yang Cao¹,
Jia Zhu¹ &
…
Changqin Huang⁵

Abstract

Human pose estimation with deep neural networks has advanced significantly in recent years. However, the precision lost when predicted coordinates are mapped back to the original image has been neglected. In this paper, we propose a simple but effective method using Heatmap and Offset for Pose Estimation (HOPE). HOPE follows a general top-down approach: a detector first generates human detection boxes, and the keypoints are then located in each cropped box image. To alleviate the precision loss of the mapping process, HOPE embeds the coordinate offset into the structure of the neural network, allowing the network to learn the slight offset introduced by the mapping in an end-to-end manner, which improves pose estimation accuracy. Experimental results on the multi-person pose estimation dataset MSCOCO, the single-person pose estimation dataset MPII, and the CrowdPose dataset indicate that our method achieves state-of-the-art performance in terms of accuracy and computational complexity.
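The precision loss described in the abstract can be illustrated with a small sketch. This is not the HOPE network itself: the names `STRIDE`, `encode`, `decode` and `crop_to_image`, the stride of 4, and the example box are all assumptions chosen for illustration. The offset here is computed directly from a known keypoint to show the quantity a network would have to learn; in HOPE it is predicted by the model.

```python
import math

STRIDE = 4  # assumed ratio between crop resolution and heatmap resolution


def encode(x, y, stride=STRIDE):
    """Split a keypoint (crop pixels) into a heatmap cell and a sub-cell offset."""
    col, row = int(x // stride), int(y // stride)
    # The offset is the remainder discarded by quantising to the heatmap grid.
    return (col, row), (x - col * stride, y - row * stride)


def decode(cell, offset=(0.0, 0.0), stride=STRIDE):
    """Recover crop-pixel coordinates from a heatmap cell plus an optional offset."""
    col, row = cell
    dx, dy = offset
    return col * stride + dx, row * stride + dy


def crop_to_image(point, box_origin, scale):
    """Map a point in the resized crop back to original-image coordinates."""
    x, y = point
    x0, y0 = box_origin
    return x0 + x * scale, y0 + y * scale


# Hypothetical keypoint inside a crop that was downscaled by a factor of 2.
truth = (101.0, 58.5)
cell, offset = encode(*truth)

coarse = decode(cell)            # heatmap peak only
refined = decode(cell, offset)   # heatmap peak plus offset

box_origin, scale = (40.0, 20.0), 2.0
coarse_img = crop_to_image(coarse, box_origin, scale)
refined_img = crop_to_image(refined, box_origin, scale)
truth_img = crop_to_image(truth, box_origin, scale)

print("error without offset:", math.dist(coarse_img, truth_img))
print("error with offset:   ", math.dist(refined_img, truth_img))
```

The heatmap-only estimate snaps to the grid, and the error grows further when the crop is scaled back to the original image, whereas adding the offset restores the sub-cell position. How accurately a learned offset approaches this ideal depends on the model and training, which the abstract does not detail.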

Acknowledgements

We would like to thank the anonymous reviewers for helping to improve the quality of this paper. This work was partially supported by the National Natural Science Foundation of China (project No. 61702126), the Natural Science Foundation of Guangdong Province (project No. 2018A030313318) and the Key-Area Research and Development Program of Guangdong Province (project No. 2019B111101001).

Author information

Authors and Affiliations

School of Computer Science, South China Normal University, Guangzhou, 510631, China
Jing Xiao, Haichao Li, Guangzhuo Qu, Yang Cao & Jia Zhu
Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam
Hamido Fujita
Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Hamido Fujita
Faculty of Software and Information Science, Iwate Prefectural University, Iwate, Japan
Hamido Fujita
College of Teacher Education, Zhejiang Normal University, Jinhua, 321004, Zhejiang, China
Changqin Huang

Corresponding author

Correspondence to Hamido Fujita.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xiao, J., Li, H., Qu, G. et al. Hope: heatmap and offset for pose estimation. J Ambient Intell Human Comput 13, 2937–2949 (2022). https://doi.org/10.1007/s12652-021-03124-w

Received: 08 November 2020
Accepted: 05 March 2021
Published: 19 March 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s12652-021-03124-w
