Uniting holistic and part-based attitudes for accurate and robust deep human pose estimation

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Deep learning has been used in many intelligent systems, including computer vision applications. Human pose estimation is a popular computer vision task that has benefited from modern feature learning strategies. Recent advances favor part-based approaches, since estimating pose from parts can produce more accurate results than treating the human shape holistically as a single unbreakable but deformable object. However, in real-world scenarios, problems such as occlusion and cluttered backgrounds cause difficulties for part-based methods. In this paper, we propose to unite the part-based and holistic attitudes toward pose prediction to obtain more accurate and more robust estimates. The part-based and holistic schemes are modeled with convolutional neural networks as regression and classification tasks, respectively, and are combined in three frameworks: multitasking, series, and parallel. Each of these settings has its own advantages, and experimental results on the LSP test set demonstrate that observing subjects both part-wise and holistically is essential for more accurate and more robust human pose estimation in challenging scenarios.
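The abstract describes the part-based branch as a regression task and the holistic branch as a classification task, combined in multitasking, series, or parallel settings. The following plain-Python sketch illustrates only the parallel idea and is not the authors' implementation. It assumes, hypothetically, that the holistic classifier outputs scores over K pose classes, each represented by a template pose; that the part-based regressor outputs joint coordinates directly; and that the two estimates are merged by a per-joint convex combination with an illustrative weight `alpha`. The names `softmax`, `holistic_pose`, and `fuse_parallel` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates of one joint


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the holistic pose-class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def holistic_pose(logits: Sequence[float],
                  templates: Sequence[Sequence[Point]]) -> List[Point]:
    """Hypothetical holistic estimate: probability-weighted mean of the class templates."""
    probs = softmax(logits)
    n_joints = len(templates[0])
    pose = []
    for j in range(n_joints):
        x = sum(p * t[j][0] for p, t in zip(probs, templates))
        y = sum(p * t[j][1] for p, t in zip(probs, templates))
        pose.append((x, y))
    return pose


def fuse_parallel(part_pose: Sequence[Point],
                  whole_pose: Sequence[Point],
                  alpha: float = 0.5) -> List[Point]:
    """Hypothetical parallel fusion: per-joint convex combination of both estimates.

    alpha = 1 keeps only the part-based regression; alpha = 0 keeps only the
    holistic (classification-derived) pose.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(alpha * px + (1 - alpha) * hx, alpha * py + (1 - alpha) * hy)
            for (px, py), (hx, hy) in zip(part_pose, whole_pose)]


if __name__ == "__main__":
    templates = [[(0.0, 0.0), (10.0, 0.0)],    # pose class 0 (two joints)
                 [(0.0, 10.0), (10.0, 10.0)]]  # pose class 1
    whole = holistic_pose([0.0, 0.0], templates)  # equal class scores
    part = [(2.0, 4.0), (8.0, 6.0)]               # regressed joint positions
    print(fuse_parallel(part, whole))  # [(1.0, 4.5), (9.0, 5.5)]
```

Intermediate values of `alpha` let the holistic estimate temper a single badly regressed joint, for example one affected by occlusion. How the paper actually weights or learns the combination is not stated in the abstract.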

Author information

Correspondence to Hossein Ebrahimnezhad.

Ethics declarations

Conflict of interest

Faranak Shamsafar declares that she has no conflict of interest. Hossein Ebrahimnezhad declares that he has no conflict of interest.

Funding

This research received no specific grant from any funding agency in the public, commercial, or non-profit sectors.

About this article

Cite this article

Shamsafar, F., Ebrahimnezhad, H. Uniting holistic and part-based attitudes for accurate and robust deep human pose estimation. J Ambient Intell Human Comput 12, 2339–2353 (2021). https://doi.org/10.1007/s12652-020-02347-7

  • DOI: https://doi.org/10.1007/s12652-020-02347-7