research-article

Visibility-guided Human Body Reconstruction from Uncalibrated Multi-view Cameras

Authors:

Weiyao Lin

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

Pages 589 - 598

https://doi.org/10.1145/3652583.3658110

Published: 07 June 2024

Abstract

We present a novel method for 3D human body reconstruction from multi-view images captured by calibration-free cameras, based on multi-view fusion with explicit visibility modelling. Existing multi-view methods usually establish geometric constraints using accurate camera intrinsic and extrinsic parameters. Although these methods perform remarkably well, multi-view camera calibration often requires complex operations and additional maintenance to keep camera positions and angles fixed, which restricts its applicability in real-world scenarios. In contrast, we leverage vertex-wise visibility prediction as a calibration cue to guide multi-view human body aggregation, which eliminates the need for camera calibration. Specifically, we estimate the UV position map and the vertex-wise visibility map of the human body in each camera view, which allows us to align and aggregate multi-view information in a hierarchical manner. To further improve the alignment between the human body and vertex-wise visual features, we propose an Occlusion-aware UV-pixel Refinement (OUVR) module, which takes the result of the preceding coarse alignment as input. The visible vertices are disentangled from the UV map and reprojected onto the image to describe the misalignment between the current body estimate and the image features. The UV map representation is adopted throughout the refinement process to avoid the potential error propagation introduced by parametric representations. The effectiveness of our approach is validated on 3D human body reconstruction: it surpasses current leading multi-view fusion methods and shows performance comparable to methods that require accurate multi-view camera calibration.
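The abstract describes using per-vertex visibility in place of camera calibration when aggregating views. As a minimal sketch, assuming each view already yields per-vertex 3D body positions in its own camera frame (for instance, read out of that view's UV position map) together with a visibility score per vertex, the snippet below aligns every view to a reference view with a co-visibility-weighted rigid fit (weighted Kabsch/Procrustes) and then averages the aligned vertices with visibility weights. The function names `weighted_kabsch` and `fuse_views`, the array shapes, and the rigid-alignment step are hypothetical illustrations; this is not the paper's hierarchical aggregation or its OUVR module.

```python
import numpy as np


def weighted_kabsch(src, dst, w):
    """Return a rigid transform (R, t) minimising sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) corresponding points; w: (N,) non-negative weights.
    """
    w = w / w.sum()
    mu_src = w @ src                      # weighted centroids, shape (3,)
    mu_dst = w @ dst
    src_c = src - mu_src
    dst_c = dst - mu_dst
    H = (w[:, None] * src_c).T @ dst_c    # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_dst - R @ mu_src
    return R, t


def fuse_views(verts, vis, ref=0, eps=1e-8):
    """Align per-view vertex estimates to a reference view and fuse them (hypothetical helper).

    verts: (V, N, 3) vertex positions, each view in its own camera frame.
    vis:   (V, N) predicted per-vertex visibility in [0, 1].
    Returns (fused (N, 3), aligned (V, N, 3)), both in the reference view's frame.
    """
    num_views = verts.shape[0]
    aligned = np.empty_like(verts)
    aligned[ref] = verts[ref]
    for v in range(num_views):
        if v == ref:
            continue
        # Vertices visible in both views dominate the rigid fit; eps keeps weights positive
        w = vis[v] * vis[ref] + eps
        R, t = weighted_kabsch(verts[v], verts[ref], w)
        aligned[v] = verts[v] @ R.T + t
    # Visibility-weighted average; a vertex hidden in every view falls back to a plain mean
    wts = vis + eps
    fused = (wts[:, :, None] * aligned).sum(axis=0) / wts.sum(axis=0)[:, None]
    return fused, aligned
```

Treating the reference view's camera frame as the world frame is a choice made for this sketch only. A purely rigid fit cannot absorb scale differences between per-view estimates; a similarity fit (Umeyama) would be the natural swap if that assumption does not hold.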


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction
      2. Image and video acquisition



Published In


ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

May 2024

1379 pages

ISBN:9798400706196

DOI:10.1145/3652583

General Chairs:
Cathal Gurrin, Dublin City University, Ireland
Rachada Kongkachandra, Thammasat University, Thailand
Klaus Schoeffmann, Klagenfurt University, Austria

Program Chairs:
Duc-Tien Dang-Nguyen, University of Bergen, Norway
Luca Rossetto, University of Zurich, Switzerland
Shin'ichi Satoh, National Institute of Informatics, Japan
Liting Zhou, Dublin City University, Ireland

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

National Natural Science Foundation of China

Conference

ICMR '24: International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket, Thailand

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%


Article Metrics

Total Citations: 0
Total Downloads: 106
Downloads (last 12 months): 106
Downloads (last 6 weeks): 18

Reflects downloads up to 03 Mar 2025
