Multipath affinage stacked—hourglass networks for human pose estimation

Hua, Guoguang; Li, Lihong; Liu, Shiguang

doi:10.1007/s11704-019-8266-2

Research Article
Published: 03 January 2020

Volume 14, article number 144701 (2020)

Frontiers of Computer Science

Guoguang Hua¹,
Lihong Li¹ &
Shiguang Liu²

Abstract

Recently, the stacked hourglass network has shown outstanding performance in human pose estimation. However, the repeated bottom-up and top-down strided convolution operations in deep convolutional neural networks significantly reduce the initial image resolution. To address this problem, we propose incorporating an affinage module and a residual attention module into the stacked hourglass network for human pose estimation. This paper introduces a novel network architecture that replaces the up-sampling operation of the stacked hourglass network in order to obtain high-resolution features. We refer to this architecture as an affinage module, which is critical to improving the performance of the stacked hourglass network. Additionally, we propose a novel residual attention module to increase the supervision of the up-sampling process. The effectiveness of the introduced modules is evaluated on standard benchmarks. Experimental results demonstrate that our method achieves more accurate and more robust human pose estimation in images with complex backgrounds.
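
The resolution loss mentioned in the abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' implementation: the function name `downsampled_sizes`, the 256-pixel input, and the four reduction steps are illustrative assumptions, and the sketch assumes each strided operation is padded so that the output size is the input size divided by the stride, rounded up.

```python
def downsampled_sizes(size: int, steps: int, stride: int = 2) -> list[int]:
    """Return the spatial size before and after each strided reduction.

    Assumes padding such that one reduction maps size -> ceil(size / stride).
    """
    sizes = [size]
    for _ in range(steps):
        size = -(-size // stride)  # integer ceiling division
        sizes.append(size)
    return sizes


# Hypothetical example: a 256-pixel-wide input passed through four stride-2 steps.
print(downsampled_sizes(256, 4))
```

Under these assumptions, each step halves the width and height, so the feature map holds a quarter of the spatial positions it had before the step. This is the detail that an up-sampling path later has to recover.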

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61672375 and 61170118).

Author information

Authors and Affiliations

School of Information and Electrical Engineering, Hebei University of Engineering, Handan, 056038, China
Guoguang Hua & Lihong Li
School of Computer Science and Technology, Division of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
Shiguang Liu

Corresponding author

Correspondence to Shiguang Liu.

Additional information

Guoguang Hua graduated from the School of Information and Electrical Engineering, Hebei University of Engineering, China. His research interest is computer vision.

Lihong Li is a professor at the School of Information and Electrical Engineering, Hebei University of Engineering, China. She graduated from Hebei University of Technology, China. Her research interests include image/video editing and computer vision.

Shiguang Liu is a professor at the School of Computer Science and Technology, Tianjin University, China. He graduated from Zhejiang University and received a PhD from the State Key Lab of CAD & CG. His research interests include modelling and simulation, realistic image synthesis, image/video editing, computer animation, and virtual reality.
