Multiple human 3D pose estimation from multiview images

Ershadi-Nasab, Sara; Noury, Erfan; Kasaei, Shohreh; Sanaei, Esmaeil

doi:10.1007/s11042-017-5133-8

Published: 04 September 2017

Volume 77, pages 15573–15601 (2018)

Multimedia Tools and Applications

Sara Ershadi-Nasab¹,
Erfan Noury²,
Shohreh Kasaei ORCID: orcid.org/0000-0002-3831-0878² &
Esmaeil Sanaei¹

1624 Accesses
64 Citations
3 Altmetric

Abstract

Multiple human 3D pose estimation is a challenging task, mainly because of large variations in the scale and pose of humans, fast motions, multiple persons in the scene, and an arbitrary number of visible body parts due to occlusion or truncation. Some of these ambiguities can be resolved by using multiview images, because more evidence of body parts is available across multiple views. In this work, a novel method for multiple human 3D pose estimation using evidence from multiview images is proposed. The proposed method utilizes a fully connected pairwise conditional random field that contains two types of pairwise terms. The first pairwise term encodes the spatial dependencies among human body joints based on an articulated human body configuration. The second pairwise term is based on the output of a 2D deep part detector. Approximate inference is then performed using the loopy belief propagation algorithm. The proposed method is evaluated on the Campus, Shelf, Utrecht Multi-Person Motion (UMPM) benchmark, Human3.6M, KTH Football II, and MPII Cooking datasets. Experimental results indicate that the proposed method achieves substantial improvements over the existing state-of-the-art methods in terms of the probability of correct pose and the mean per joint position error performance measures.
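The inference step named in the abstract can be illustrated with a minimal sketch of sum-product loopy belief propagation on a small fully connected pairwise model. This is not the authors' implementation: the abstract does not specify the state space or the exact form of the potentials, so the function name `loopy_bp`, the toy unary scores (standing in for detector confidences), and the toy pairwise table (standing in for a compatibility term between two joints) are all assumptions made for illustration.

```python
import math
from itertools import permutations


def loopy_bp(unary, pairwise, n_iters=50, tol=1e-9):
    """Sum-product loopy BP on a fully connected pairwise model (toy sketch).

    unary[i][s]      -- positive score of node i taking state s
    pairwise[(i, j)] -- table psi[s_i][s_j], stored once per pair with i < j
    Returns approximate marginals beliefs[i][s], each summing to 1.
    """
    n = len(unary)
    k = [len(u) for u in unary]

    def psi(i, j, si, sj):
        # Look up the pairwise score regardless of the direction of the edge.
        if i < j:
            return pairwise[(i, j)][si][sj]
        return pairwise[(j, i)][sj][si]

    # msg[(i, j)][s_j]: message from node i to node j, initialised uniform.
    msg = {(i, j): [1.0 / k[j]] * k[j] for i, j in permutations(range(n), 2)}

    for _ in range(n_iters):
        new_msg = {}
        for (i, j) in msg:
            out = []
            for sj in range(k[j]):
                total = 0.0
                for si in range(k[i]):
                    # Product of messages into i from every neighbour except j.
                    incoming = math.prod(
                        msg[(h, i)][si] for h in range(n) if h != i and h != j
                    )
                    total += unary[i][si] * psi(i, j, si, sj) * incoming
                out.append(total)
            z = sum(out)
            new_msg[(i, j)] = [v / z for v in out]  # normalise for stability
        delta = max(
            (abs(a - b) for key in msg for a, b in zip(msg[key], new_msg[key])),
            default=0.0,
        )
        msg = new_msg
        if delta < tol:
            break

    beliefs = []
    for i in range(n):
        b = [
            unary[i][s] * math.prod(msg[(h, i)][s] for h in range(n) if h != i)
            for s in range(k[i])
        ]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs


# Hypothetical toy problem: 3 joints, 2 candidate states per joint.
toy_unary = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
agree = [[2.0, 1.0], [1.0, 2.0]]  # favours neighbouring joints picking the same state
toy_pairwise = {(0, 1): agree, (0, 2): agree, (1, 2): agree}
toy_beliefs = loopy_bp(toy_unary, toy_pairwise)
```

In this toy setup the confident first node pulls the two undecided nodes toward its preferred state through the pairwise tables. On a graph with cycles the resulting beliefs are approximations, which is why the abstract describes the inference as approximate.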

Notes

http://ipl.ce.sharif.edu/3D_pose.html

Author information

Authors and Affiliations

Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran
Sara Ershadi-Nasab & Esmaeil Sanaei
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Erfan Noury & Shohreh Kasaei

Corresponding author

Correspondence to Shohreh Kasaei.

About this article

Cite this article

Ershadi-Nasab, S., Noury, E., Kasaei, S. et al. Multiple human 3D pose estimation from multiview images. Multimed Tools Appl 77, 15573–15601 (2018). https://doi.org/10.1007/s11042-017-5133-8

Received: 19 January 2017
Revised: 09 July 2017
Accepted: 20 August 2017
Published: 04 September 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11042-017-5133-8
