Multi-person Absolute 3D Human Pose Estimation with Weak Depth Supervision

Véges, Márton; Lőrincz, András

doi:10.1007/978-3-030-61609-0_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12396)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

In 3D human pose estimation, one of the biggest problems is the lack of large, diverse datasets. This is especially true for multi-person 3D pose estimation, where, to our knowledge, only machine-generated annotations are available for training. To mitigate this issue, we introduce a network that can be trained with additional RGB-D images in a weakly supervised fashion. Because cheap sensors exist, videos with depth maps are widely available, and our method can exploit a large, unannotated dataset. Our algorithm is a monocular, multi-person, absolute pose estimator. We evaluate the algorithm on several benchmarks, showing a consistent improvement in error rates. Our model also achieves state-of-the-art results on the MuPoTS-3D dataset by a considerable margin. Our code will be publicly available (https://github.com/vegesm/wdspose).

Acknowledgment

MV received support from the European Union, co-financed by the European Social Fund (EFOP-3.6.3-16-2017-00002). AL was supported by the National Research, Development and Innovation Fund of Hungary via the Thematic Excellence Programme funding scheme under Project no. ED_18-1-2019-0030, titled "Application-specific highly reliable IT solutions".

Author information

Authors and Affiliations

Eotvos Lorand University, Budapest, Hungary
Márton Véges & András Lőrincz

Corresponding author

Correspondence to Márton Véges.

Editor information

Editors and Affiliations

Department of Applied Informatics, Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Paolo Masulli
Department of Informatics, University of Hamburg, Hamburg, Germany
Stefan Wermter

About this paper

Cite this paper

Véges, M., Lőrincz, A. (2020). Multi-person Absolute 3D Human Pose Estimation with Weak Depth Supervision. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12396. Springer, Cham. https://doi.org/10.1007/978-3-030-61609-0_21

DOI: https://doi.org/10.1007/978-3-030-61609-0_21
Published: 14 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61608-3
Online ISBN: 978-3-030-61609-0
eBook Packages: Computer Science, Computer Science (R0)
