Understanding holistic human pose using class-specific convolutional neural network

Multimedia Tools and Applications

Abstract

This paper presents a method to capture human pose from individual real-world RGB images using deep learning. Existing deep learning approaches to human pose estimation are formulated as detection or regression problems and operate in a part-based manner. As a new perspective, we introduce a classification scheme for this problem, which reasons about the pose holistically. To the best of our knowledge, this is the first work on the holistic human pose classification task, which owes its feasibility to the power of convolutional neural networks in feature learning. After training a convolutional neural network to classify the input image into one of the KeyPoses, the final pose is computed as a linear combination of several KeyPoses. In this holistic classification approach, the vast, high-degree-of-freedom human pose space is divided into a finite number of subspaces, and the convolutional neural network shows promising results in learning the features of each subspace. Empirical results (PCP and PCK rates) demonstrate that the proposed scheme can understand human pose (i.e., predict a valid, true and coarse pose) in real-world unconstrained images with challenges such as severe occlusion, high articulation, low quality and cluttered background. Furthermore, the proposed method removes the need to define a complex model (such as an appearance model or pairwise joint relations). We have also verified a potential application of the proposed method in semantic image retrieval based on human pose.
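
The abstract does not say how the weights of the linear combination are chosen. As an illustration only, the sketch below assumes that the classifier outputs one score per KeyPose and that the weights are the `top_k` highest scores, renormalised to sum to 1. The function name `combine_keyposes`, the `top_k` parameter and the 2D joint layout are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# A pose is a list of 2D joint positions, one (x, y) pair per joint.
Pose = List[Tuple[float, float]]


def combine_keyposes(
    keyposes: Sequence[Pose],
    class_scores: Sequence[float],
    top_k: int = 3,
) -> Pose:
    """Blend the top_k highest-scoring KeyPoses into a single pose.

    Assumption (not from the paper): the blending weights are the
    top_k class scores, renormalised so that they sum to 1.
    """
    if len(keyposes) != len(class_scores):
        raise ValueError("need exactly one score per KeyPose")
    if not 1 <= top_k <= len(keyposes):
        raise ValueError("top_k must be between 1 and the number of KeyPoses")

    # Indices of the top_k classes, highest score first.
    ranked = sorted(
        range(len(class_scores)),
        key=lambda i: class_scores[i],
        reverse=True,
    )[:top_k]

    total = sum(class_scores[i] for i in ranked)
    if total <= 0:
        raise ValueError("selected scores must have a positive sum")
    weights = [class_scores[i] / total for i in ranked]

    # Weighted sum of the selected KeyPoses, joint by joint.
    blended: Pose = []
    for j in range(len(keyposes[0])):
        x = sum(w * keyposes[i][j][0] for w, i in zip(weights, ranked))
        y = sum(w * keyposes[i][j][1] for w, i in zip(weights, ranked))
        blended.append((x, y))
    return blended
```

With `top_k=1` the function simply returns the highest-scoring KeyPose, which corresponds to pure classification; larger values interpolate between neighbouring KeyPoses.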

Author information

Corresponding author

Correspondence to Hossein Ebrahimnezhad.

About this article

Cite this article

Shamsafar, F., Ebrahimnezhad, H. Understanding holistic human pose using class-specific convolutional neural network. Multimed Tools Appl 77, 23193–23225 (2018). https://doi.org/10.1007/s11042-018-5617-1

  • DOI: https://doi.org/10.1007/s11042-018-5617-1
