MHPro: Multi-hypothesis Probabilistic Modeling for Human Mesh Recovery

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)


Abstract

Recovering 3D human meshes from monocular images is an inherently ambiguous and challenging task due to depth ambiguity, joint occlusion, and truncation. However, most recent works avoid modeling this uncertainty and typically produce a single reconstruction for a given input. In contrast, this paper embraces the ambiguity of the reconstruction and treats the task as an inverse problem for which multiple feasible solutions exist. Our method, MHPro, first constructs a probability distribution and draws a set of feasible recovery results (i.e., multiple hypotheses) from a monocular image. Intra-hypothesis refinement is then performed to enhance the features of each hypothesis independently. Finally, the multi-hypothesis features are aggregated through inter-hypothesis communication to recover the final 3D human mesh. The effectiveness of our method is validated on two benchmark datasets, Human3.6M and 3DPW, where experimental results show that it achieves state-of-the-art performance and recovers more accurate human meshes. Our results confirm the importance of intra-hypothesis refinement and inter-hypothesis communication in probabilistic modeling and show strong performance across a variety of settings. Our source code will be available at http://cic.tju.edu.cn/faculty/likun/projects/MHPro.
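The abstract describes a three-stage pipeline: sample multiple feasible hypotheses from a learned distribution, refine each hypothesis independently, then let the hypotheses exchange information before regressing the final mesh. The PyTorch sketch below illustrates one way such a flow could be wired together; the diagonal-Gaussian sampling, the specific layers (an MLP for refinement, multi-head attention for communication), and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MHProSketch(nn.Module):
    """Illustrative multi-hypothesis pipeline; not the released MHPro code."""
    def __init__(self, feat_dim=512, num_hypotheses=5, smpl_params=85):
        super().__init__()
        self.num_hypotheses = num_hypotheses
        # Probabilistic head: predict a distribution over plausible recoveries
        # (a diagonal Gaussian over the image feature -- an assumption made
        # only for this sketch).
        self.mu = nn.Linear(feat_dim, feat_dim)
        self.logvar = nn.Linear(feat_dim, feat_dim)
        # Intra-hypothesis refinement: an MLP applied to each hypothesis
        # separately, so features are enhanced independently.
        self.intra = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.GELU(), nn.Linear(feat_dim, feat_dim))
        # Inter-hypothesis communication: hypotheses attend to one another
        # before aggregation.
        self.inter = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=8,
                                           batch_first=True)
        # Regress SMPL-style pose/shape/camera parameters from the fused feature.
        self.head = nn.Linear(feat_dim, smpl_params)

    def forward(self, img_feat):                                  # (B, feat_dim)
        mu, logvar = self.mu(img_feat), self.logvar(img_feat)
        std = torch.exp(0.5 * logvar)
        # Draw K feasible hypotheses from the predicted distribution.
        eps = torch.randn(img_feat.size(0), self.num_hypotheses,
                          img_feat.size(1), device=img_feat.device)
        hyps = mu.unsqueeze(1) + eps * std.unsqueeze(1)           # (B, K, D)
        hyps = hyps + self.intra(hyps)                            # intra-hypothesis refinement
        fused, _ = self.inter(hyps, hyps, hyps)                   # inter-hypothesis communication
        return self.head(fused.mean(dim=1))                       # (B, smpl_params)

# Usage with a dummy global image feature (e.g. from a CNN backbone).
params = MHProSketch()(torch.randn(2, 512))
print(params.shape)  # torch.Size([2, 85])
```

In practice, the regressed parameters would drive a parametric body model (such as SMPL) to produce the recovered mesh vertices.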



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (62171317 and 62122058).

Author information

Corresponding author

Correspondence to Kun Li.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xuan, H., Zhang, J., Li, K. (2022). MHPro: Multi-hypothesis Probabilistic Modeling for Human Mesh Recovery. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_18

  • DOI: https://doi.org/10.1007/978-3-031-20497-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20496-8

  • Online ISBN: 978-3-031-20497-5

  • eBook Packages: Computer Science (R0)
