Abstract
Recovering 3D human meshes from monocular images is an inherently ambiguous and challenging task due to depth ambiguity, joint occlusion, and truncation. Nevertheless, most recent works avoid modeling this uncertainty and typically produce a single reconstruction for a given input. In contrast, this paper embraces the ambiguity of mesh reconstruction and treats the task as an inverse problem admitting multiple feasible solutions. Our method, MHPro, first constructs a probability distribution from the monocular image and draws from it a set of feasible recovery results (i.e., multiple hypotheses). Intra-hypothesis refinement is then performed to enhance the features of each hypothesis independently. Finally, the multi-hypothesis features are aggregated through inter-hypothesis communication to recover the final 3D human mesh. The effectiveness of our method is validated on two benchmark datasets, Human3.6M and 3DPW, where experimental results show that it achieves state-of-the-art performance and recovers more accurate human meshes. Our results confirm the importance of intra-hypothesis refinement and inter-hypothesis communication in probabilistic modeling and demonstrate strong performance across a variety of settings. Our source code will be available at http://cic.tju.edu.cn/faculty/likun/projects/MHPro.
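The three-stage pipeline the abstract describes (sample hypotheses from a distribution, refine each independently, then let hypotheses communicate before aggregation) can be sketched in a toy form. This is a minimal illustration, not the paper's implementation: the Gaussian sampler, the residual refinement, and the similarity-based mixing weights below are all stand-in assumptions; a real model would learn these components and decode the pooled feature into SMPL parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: K hypotheses, each a D-dim pose feature.
K, D = 8, 16

# 1) Probabilistic modeling: sample K feasible pose features from a
#    Gaussian (fixed mean/std here; learned in a real model).
mu, sigma = np.zeros(D), np.ones(D)
hypotheses = mu + sigma * rng.standard_normal((K, D))

def intra_refine(h):
    """Stand-in for intra-hypothesis refinement: each hypothesis is
    enhanced independently (here, a toy residual transform)."""
    return h + 0.1 * np.tanh(h)

refined = np.stack([intra_refine(h) for h in hypotheses])

# 2) Inter-hypothesis communication: mix hypotheses with
#    attention-style weights from pairwise similarity.
scores = refined @ refined.T / np.sqrt(D)            # (K, K) similarities
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)        # row-wise softmax
communicated = weights @ refined                     # (K, D) mixed features

# 3) Aggregation: pool the communicated hypotheses into a single
#    feature that a decoder would map to the final 3D mesh.
final_feature = communicated.mean(axis=0)
print(final_feature.shape)  # (16,)
```

The key design point the abstract argues for is that refinement happens per hypothesis *before* any mixing, so each feasible solution is strengthened on its own terms and only then reconciled with the others.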
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (62171317 and 62122058).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xuan, H., Zhang, J., Li, K. (2022). MHPro: Multi-hypothesis Probabilistic Modeling for Human Mesh Recovery. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science(), vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_18
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5