Animatable 3D Gaussians for modeling dynamic humans

  • Research Article
Frontiers of Computer Science

Abstract

We present an animatable 3D Gaussian representation for synthesizing high-fidelity human videos under novel views and poses in real time. Given multi-view videos of a human subject, we learn a collection of 3D Gaussians in the canonical space of the rest pose. Each Gaussian is associated with a few basic properties (i.e., position, opacity, scale, rotation, and spherical harmonics coefficients) representing the average human appearance across all video frames, as well as a latent code and a set of blend weights for dynamic appearance correction and pose transformation. The latent code is fed to a multi-layer perceptron (MLP) together with a target pose, and the MLP output corrects the Gaussians in the canonical space so that they capture appearance changes under that pose. The corrected Gaussians are then transformed to the target pose using linear blend skinning (LBS) with their blend weights. High-fidelity human images under novel views and poses can be rendered in real time through Gaussian splatting. Compared to state-of-the-art NeRF-based methods, our animatable Gaussian representation produces more compelling results with well-captured details and achieves superior rendering performance.
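The two steps described in the abstract (pose-dependent correction in canonical space, then linear blend skinning) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `correct_gaussians` and `skin_gaussians`, the two-layer MLP with random weights, the choice to correct only positions, and the SVD projection used to obtain a valid rotation from the blended transform are all hypothetical choices made here for clarity. Opacity, scale, spherical harmonics, and the splatting renderer are omitted.

```python
import numpy as np


def correct_gaussians(positions, latent_codes, pose, w1, b1, w2, b2):
    """Pose-dependent correction in canonical space (hypothetical two-layer MLP).

    Only position offsets are predicted here; which properties the paper
    corrects is not stated in the abstract, so this is an assumption.
    """
    n = positions.shape[0]
    # Each Gaussian sees its own latent code plus the shared target pose.
    features = np.concatenate([latent_codes, np.tile(pose, (n, 1))], axis=1)
    hidden = np.maximum(features @ w1 + b1, 0.0)  # ReLU
    return positions + hidden @ w2 + b2


def skin_gaussians(positions, rotations, blend_weights, bone_transforms):
    """Move canonical-space Gaussians to a target pose with linear blend skinning.

    positions:       (N, 3) Gaussian centers in canonical space
    rotations:       (N, 3, 3) Gaussian orientations as rotation matrices
    blend_weights:   (N, J) per-Gaussian weights over J bones, rows sum to 1
    bone_transforms: (J, 4, 4) rest-to-target transform of each bone
    """
    # Blend the bone transforms per Gaussian: T_i = sum_j w_ij * B_j.
    blended = np.einsum("nj,jab->nab", blend_weights, bone_transforms)
    linear, translation = blended[:, :3, :3], blended[:, :3, 3]

    # Centers follow the full blended affine transform.
    posed_positions = np.einsum("nab,nb->na", linear, positions) + translation

    # A weighted sum of rotations is generally not a rotation, so project the
    # linear part onto the nearest rotation via SVD (an assumption made here).
    u, _, vt = np.linalg.svd(linear)
    sign = np.linalg.det(u @ vt)
    u[:, :, -1] *= sign[:, None]  # flip one axis if needed so det = +1
    posed_rotations = (u @ vt) @ rotations
    return posed_positions, posed_rotations


# Toy data: 4 Gaussians, 2 bones, random (untrained) MLP weights.
rng = np.random.default_rng(0)
n, latent_dim, pose_dim, hidden_dim = 4, 8, 6, 16
positions = rng.normal(size=(n, 3))
rotations = np.tile(np.eye(3), (n, 1, 1))
latent_codes = rng.normal(size=(n, latent_dim))
pose = rng.normal(size=pose_dim)
w1 = rng.normal(scale=0.1, size=(latent_dim + pose_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(scale=0.1, size=(hidden_dim, 3))
b2 = np.zeros(3)

# Bone 0 stays fixed; bone 1 rotates 90 degrees about z and moves up by 1.
bone_transforms = np.tile(np.eye(4), (2, 1, 1))
bone_transforms[1, :3, :3] = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
bone_transforms[1, :3, 3] = [0.0, 0.0, 1.0]
blend_weights = rng.dirichlet(np.ones(2), size=n)

corrected = correct_gaussians(positions, latent_codes, pose, w1, b1, w2, b2)
posed_positions, posed_rotations = skin_gaussians(
    corrected, rotations, blend_weights, bone_transforms
)
print(posed_positions.shape, posed_rotations.shape)
```

In this sketch the correction runs before skinning, matching the order given in the abstract, so the MLP only has to model residual changes in the rest pose while the skeleton handles the large articulated motion.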

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2022YFF0902302) and the National Natural Science Foundation of China (Grant Nos. 62172357 & 62322209).

Author information

Corresponding author

Correspondence to Yanlin Weng.

Ethics declarations

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Yukun Xu is a postgraduate student at the State Key Lab of CAD&CG, Zhejiang University, China. His research interests include computer graphics and digital humans.

Keyang Ye received his bachelor’s degree from the National University of Defense Technology, China, in 2022. He is currently working toward a PhD degree in the Graphics and Parallel Systems Lab of Zhejiang University, China. His research interests include animation and rendering.

Tianjia Shao is a professor in the State Key Laboratory of CAD&CG, Zhejiang University, China. Previously, he was a Lecturer in the School of Computing, University of Leeds, UK. He received his PhD in computer science from the Institute for Advanced Study, Tsinghua University, and his BS from the Department of Automation, Tsinghua University, China. His research focuses on 3D modeling, digital humans, and computer animation.

Yanlin Weng received her bachelor’s and master’s degrees in control science and engineering from Zhejiang University, China, and her PhD degree in computer science from the University of Wisconsin-Milwaukee, USA. She is currently an associate professor with the School of Computer Science and Technology, Zhejiang University. Her research interests include computer graphics and multimedia.

About this article

Cite this article

Xu, Y., Ye, K., Shao, T. et al. Animatable 3D Gaussians for modeling dynamic humans. Front. Comput. Sci. 19, 199704 (2025). https://doi.org/10.1007/s11704-024-40497-5
