3D-Label-Free Human Mesh Recovery Using Multi-view Consistency

Published: 26 October 2023

ABSTRACT

3D mesh reconstruction of the human body is important for human action analysis. Most methods require large amounts of training data with human-annotated 3D labels. Collecting such data is costly, and existing human-annotated 3D datasets are generally too small to cover complex actions in complex environments. Some recent work has therefore turned to 3D pseudo-labels. To obtain sufficient constraints, most pseudo-label-based human mesh recovery algorithms rely heavily on 3D pseudo-labels produced by existing unsupervised 3D human pose estimation algorithms. Unfortunately, unsupervised approaches cannot guarantee accurate 3D pose estimates, and inaccurate 3D pseudo-labels harm model training. To solve this problem, we propose an end-to-end 3D-label-free training framework that uses multi-view consistency to provide sufficient constraints, without any human-annotated 3D labels or 3D pseudo-labels. Multi-view consistency exploits the fact that attributes of the human body remain consistent across images of the same subject taken from multiple views, which yields self-supervised constraints. We evaluate our method on two benchmark datasets (Human3.6M and MPI-INF-3DHP), where it achieves competitive results.
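The abstract does not give the form of the consistency constraint. As a rough illustration only, and not the authors' formulation, the sketch below assumes that a network predicts, from each camera view, a vector of body attributes that should not depend on the viewpoint, and penalizes disagreement between every pair of views. The function name `multiview_consistency_loss` and the flat-vector representation are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def multiview_consistency_loss(per_view_params: Sequence[Sequence[float]]) -> float:
    """Mean squared disagreement, averaged over every pair of views.

    Each inner sequence is a hypothetical view-invariant body descriptor
    predicted from one camera view of the same person at the same instant.
    This is an assumed stand-in, not the loss used in the paper.
    """
    if len(per_view_params) < 2:
        raise ValueError("need at least two views")
    dim = len(per_view_params[0])
    if dim == 0 or any(len(p) != dim for p in per_view_params):
        raise ValueError("all views must predict non-empty vectors of equal length")

    pairs = list(combinations(per_view_params, 2))
    total = 0.0
    for a, b in pairs:
        # Mean squared difference between the two views' predictions.
        total += sum((x - y) ** 2 for x, y in zip(a, b)) / dim
    return total / len(pairs)
```

A loss of this kind is zero when all views agree and grows as the predictions diverge, so it can supply a training signal without any 3D labels. Whether the paper compares shape, pose, or rendered quantities across views is not stated in the abstract.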

    Published in

      ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing
      May 2023
      711 pages
      ISBN: 9798400708237
      DOI: 10.1145/3604078

      Copyright © 2023 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited