Abstract
Rendering animatable avatars from monocular videos has significant applications in interactive entertainment. Previous methods based on Neural Radiance Fields (NeRF) suffer from long training times and tend to overfit to seen poses. To address this, we introduce PID-NeRF, a novel framework with a Pose-Independent Deformation (PID) module. Specifically, the PID module learns a multi-entity shared skinning prior and optimizes instance-level non-rigid offsets in UV-H space, which is independent of human motion. This pose-independence enables our model to unify the backward and forward human skeleton deformations in the same network parameters, increasing the generalizability of our skinning prior. Additionally, a bounded segment modeling (BSM) strategy with a window function is used to smooth overlapping regions of bounding boxes, balancing training speed and rendering quality. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in novel-view and novel-pose synthesis on multiple datasets.
This work is supported by the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2024A1515011741.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Duan, T., Jiang, Z., Ma, Z., Zhang, D. (2025). Animatable Human Rendering from Monocular Video via Pose-Independent Deformation. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15036. Springer, Singapore. https://doi.org/10.1007/978-981-97-8508-7_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8507-0
Online ISBN: 978-981-97-8508-7
eBook Packages: Computer Science; Computer Science (R0)