SimHMR: A Simple Query-based Framework for Parameterized Human Mesh Reconstruction

Authors:

Zhiguo Cao

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 6918 - 6927

https://doi.org/10.1145/3581783.3611814

Published: 27 October 2023

Abstract

Human Mesh Reconstruction (HMR) aims to recover 3D human pose and shape from a single image. Existing parameterized HMR approaches follow the "representation-to-reasoning" paradigm to predict human body and pose parameters. This paradigm typically involves intermediate representations and a complex pipeline, which can introduce side effects that hinder performance. In contrast, query-based non-parameterized methods directly output 3D joints and mesh vertices, but they rely on an excessive number of queries for prediction, which leads to low efficiency and robustness. In this work, we propose a simple query-based framework, dubbed SimHMR, for parameterized human mesh reconstruction. The framework streamlines prediction by using a few parameterized queries, which removes the need for a hand-crafted intermediate representation and reasoning pipeline. Unlike query-based non-parameterized HMR, which uses excessive coordinate queries, SimHMR requires only a few semantic queries that physically correspond to pose, shape, and camera. The use of semantic queries significantly improves efficiency and robustness in extreme scenarios, e.g., occlusions. Without bells and whistles, SimHMR achieves state-of-the-art performance on the 3DPW and Human3.6M benchmarks, and surpasses existing methods on the challenging 3DPW-OCC benchmark. Code is available at https://github.com/inso-13/SimHMR
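The idea of a few semantic queries for pose, shape, and camera can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes a single cross-attention step per query with no learned projections, random weights, SMPL-style output sizes (72 pose, 10 shape, 3 camera values), and hypothetical names such as `cross_attend`, `decode`, and `OUTPUT_DIMS`. It only shows how three queries could each attend to image tokens and be mapped to one parameter group.

```python
import math
import random

# Assumed output sizes (SMPL-style); the paper's actual configuration may differ.
OUTPUT_DIMS = {"pose": 72, "shape": 10, "camera": 3}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, tokens):
    """Single-head scaled dot-product attention of one query over image tokens."""
    dim = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def decode(tokens, queries, heads):
    """Map each semantic query to its parameter group via attention and a linear head."""
    params = {}
    for name, query in queries.items():
        attended = cross_attend(query, tokens)
        updated = [q + a for q, a in zip(query, attended)]  # residual update
        params[name] = matvec(heads[name], updated)
    return params


rng = random.Random(0)
DIM = 16          # hypothetical feature width
NUM_TOKENS = 49   # hypothetical 7x7 grid of image features

tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
queries = {name: [rng.gauss(0, 1) for _ in range(DIM)] for name in OUTPUT_DIMS}
heads = {
    name: [[rng.gauss(0, 0.02) for _ in range(DIM)] for _ in range(size)]
    for name, size in OUTPUT_DIMS.items()
}

params = decode(tokens, queries, heads)
print({name: len(values) for name, values in params.items()})
```

In this sketch the number of queries stays at three regardless of mesh resolution, whereas a coordinate-query design would need one query per predicted joint or vertex.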

Cited By

Shen W, Yin W, Wang H, Wei C, Cai Z, Yang L, Lin G, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). HMR-Adapter: A Lightweight Adapter with Dual-Path Cross Augmentation for Expressive Human Mesh Recovery. Proceedings of the 32nd ACM International Conference on Multimedia, 6093-6102. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681641
Tang T, Liu H, You Y, Wang T, Li W, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos. Proceedings of the 32nd ACM International Conference on Multimedia, 1514-1523. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3680881

Index Terms

SimHMR: A Simple Query-based Framework for Parameterized Human Mesh Reconstruction
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction

Information & Contributors

Information

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada
