research-article

3D-Label-Free Human Mesh Recovery Using Multi-view Consistency

Authors:
- Zeyong Wu, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China (ORCID: 0009-0008-6829-5796)
- Yihua Tan, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China (ORCID: 0000-0003-0963-5339)
- Yiwen Zeng, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China (ORCID: 0009-0007-5805-211X)
- Chuanzhi Xu, School of Physical Education, Huazhong University of Science and Technology, China (ORCID: 0009-0009-0613-3705)

ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing, May 2023, Article No.: 76, Pages 1–8, https://doi.org/10.1145/3604078.3604154

Published: 26 October 2023

ABSTRACT

3D mesh reconstruction of the human body is important for human action analysis. Most methods require large amounts of training data with human-annotated 3D labels. This makes training costly, and existing datasets with human-annotated 3D labels are generally too small to cover complex actions in complex environments. Some recent work has therefore turned to 3D pseudo-label methods. To obtain sufficient constraints, most pseudo-label-based human mesh recovery algorithms rely heavily on 3D pseudo-labels produced by existing unsupervised 3D human pose estimation algorithms. Unfortunately, unsupervised approaches cannot guarantee accurate 3D pose estimates, and inaccurate 3D pseudo-labels degrade model training. To address this problem, we propose an end-to-end 3D-label-free training framework that uses multi-view consistency, rather than human-annotated 3D labels or 3D pseudo-labels, to provide sufficient constraints. Multi-view consistency exploits the fact that attributes of the human body remain consistent across multi-view images to provide self-supervised constraints. We evaluate our method on two benchmark datasets (Human3.6M and MPI-INF-3DHP), where it achieves competitive results.
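
The abstract does not spell out the loss terms, so the following is only an illustrative sketch of what a multi-view consistency constraint could look like, not the authors' formulation. It assumes that a network predicts, for each view, body shape parameters and root-centred 3D joints in camera coordinates, and that camera-to-world rotations are known. The function names, array shapes, and the choice of a variance-style penalty are all hypothetical.

```python
import numpy as np


def rotation_z(angle):
    """Rotation matrix about the z-axis (used here only to build toy cameras)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def multiview_consistency_loss(betas, joints_cam, rotations):
    """Hypothetical consistency penalty over V synchronized views.

    betas      : (V, B) shape parameters predicted independently per view
    joints_cam : (V, J, 3) root-centred 3D joints in each camera's frame
    rotations  : (V, 3, 3) camera-to-world rotations (assumed known)
    """
    # The same person has one body shape, so per-view shape
    # predictions should agree: penalize deviation from the view mean.
    shape_term = np.mean((betas - betas.mean(axis=0, keepdims=True)) ** 2)

    # Rotate every view's joints into a shared world frame: x_world = R_v @ x_cam.
    joints_world = np.einsum("vij,vkj->vki", rotations, joints_cam)

    # The same pose seen from different cameras should coincide in that frame.
    pose_term = np.mean(
        (joints_world - joints_world.mean(axis=0, keepdims=True)) ** 2
    )
    return shape_term + pose_term
```

Under these assumptions the penalty is zero when all views describe the same body and pose, and it grows as per-view predictions disagree, which is how such a term can supervise training without any 3D labels.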

Index Terms

Computing methodologies → Artificial intelligence → Computer vision

Published in

ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing
May 2023
711 pages
ISBN:9798400708237
DOI:10.1145/3604078

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
3D-label-free
human mesh recovery
multi-view consistency
Qualifiers
- research-article
- Research
- Refereed limited