DOI: 10.1145/3581783.3611814
research-article

SimHMR: A Simple Query-based Framework for Parameterized Human Mesh Reconstruction

Published: 27 October 2023

Abstract

Human Mesh Reconstruction (HMR) aims to recover 3D human poses and shapes from a single image. Existing parameterized HMR approaches follow the "representation-to-reasoning" paradigm to predict human body and pose parameters. This paradigm typically involves an intermediate representation and a complex pipeline, which can introduce side effects that hinder performance. In contrast, query-based non-parameterized methods directly output 3D joints and mesh vertices, but they rely on an excessive number of queries for prediction, leading to low efficiency and robustness. In this work, we propose a simple query-based framework, dubbed SimHMR, for parameterized human mesh reconstruction. This framework streamlines the prediction process by using a few parameterized queries, which removes the need for a hand-crafted intermediate representation and reasoning pipeline. Unlike query-based non-parameterized HMR, which uses excessive coordinate queries, SimHMR requires only a few semantic queries, which physically correspond to pose, shape, and camera. The use of semantic queries significantly improves efficiency and robustness in extreme scenarios, e.g., occlusions. Without bells and whistles, SimHMR achieves state-of-the-art performance on the 3DPW and Human3.6M benchmarks, and surpasses existing methods on the challenging 3DPW-OCC. Code is available at https://github.com/inso-13/SimHMR
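The idea of a few semantic queries can be illustrated with a minimal sketch. It is not the released SimHMR code: it assumes three query vectors (pose, shape, camera) that each cross-attend over a set of image feature tokens and are then mapped by a linear head to a parameter vector. The output sizes (72 pose values, 10 shape coefficients, 3 camera values) follow common SMPL-style conventions and are assumptions, as are all names, the feature width, and the random stand-in weights.

```python
import math
import random

random.seed(0)

DIM = 8           # feature width (hypothetical)
NUM_TOKENS = 16   # number of image feature tokens (hypothetical)
# Output sizes follow common SMPL-style conventions; assumed, not from the paper.
OUT_DIMS = {"pose": 72, "shape": 10, "camera": 3}


def rand_matrix(rows, cols, scale=0.1):
    """Random matrix standing in for learned weights or backbone features."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, tokens):
    """Single-head scaled dot-product attention of one query over all tokens."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * t for q, t in zip(query, tok)) for tok in tokens]
    weights = softmax(scores)
    attended = [
        sum(w * tok[d] for w, tok in zip(weights, tokens))
        for d in range(len(query))
    ]
    return attended, weights


def linear(x, weight):
    """Apply a weight matrix of shape (out_dim, in_dim) to vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Stand-ins for backbone features, learned queries, and prediction heads.
image_tokens = rand_matrix(NUM_TOKENS, DIM, scale=1.0)
queries = {name: rand_matrix(1, DIM, scale=1.0)[0] for name in OUT_DIMS}
heads = {name: rand_matrix(out_dim, DIM) for name, out_dim in OUT_DIMS.items()}

predictions = {}
attention = {}
for name, query in queries.items():
    attended, weights = cross_attend(query, image_tokens)
    attention[name] = weights
    predictions[name] = linear(attended, heads[name])

for name, value in predictions.items():
    print(name, len(value))
```

Only three queries pass through attention in this sketch, regardless of how many mesh vertices the body model has; a coordinate-query design would instead need one query per predicted joint or vertex.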

Cited By

  • (2024) HMR-Adapter: A Lightweight Adapter with Dual-Path Cross Augmentation for Expressive Human Mesh Recovery. Proceedings of the 32nd ACM International Conference on Multimedia, 6093-6102. DOI: 10.1145/3664647.3681641. Online publication date: 28-Oct-2024
  • (2024) ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos. Proceedings of the 32nd ACM International Conference on Multimedia, 1514-1523. DOI: 10.1145/3664647.3680881. Online publication date: 28-Oct-2024

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. human mesh reconstruction
    2. semantic query
    3. vision transformers

    Qualifiers

    • Research-article

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada
