
Animatable Human Rendering from Monocular Video via Pose-Independent Deformation

  • Conference paper

Pattern Recognition and Computer Vision (PRCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15036)


Abstract

Rendering animatable avatars from monocular videos has significant applications in the broader field of interactive entertainment. Previous methods based on Neural Radiance Fields (NeRF) struggle with long training times and tend to overfit to seen poses. To address this, we introduce PID-NeRF, a novel framework with a Pose-Independent Deformation (PID) module. Specifically, the PID module learns a multi-entity shared skinning prior and optimizes instance-level non-rigid offsets in UV-H space, which is independent of human motion. This pose independence enables our model to unify the backward and forward human skeleton deformations within the same network parameters, increasing the generalizability of our skinning prior. Additionally, a bounded segment modeling (BSM) strategy with a window function is used to smooth overlapping regions of bounding boxes, balancing training speed against rendering quality. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in novel-view and novel-pose synthesis on multiple datasets.

This work is supported by Guangdong Basic and Applied Basic Research Foundation Under Grant No. 2024A1515011741.
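
The two mechanisms named in the abstract can be made concrete with a short sketch. The NumPy code below is our own illustration under stated assumptions, not the paper's implementation: it assumes a standard linear-blend-skinning (LBS) formulation in which one shared set of skinning weights drives both the backward (observed-to-canonical) and forward (canonical-to-observed) warps, and a per-axis cosine taper as the window function for blending overlapping bounding boxes. All function names (`blend_transforms`, `forward_warp`, `backward_warp`, `box_window`) are hypothetical.

```python
import numpy as np

def blend_transforms(weights, bone_transforms):
    """Blend per-joint rigid transforms with skinning weights.

    weights:         (N, J) per-point skinning weights, rows summing to 1
    bone_transforms: (J, 4, 4) rigid transform of each joint for one pose
    returns:         (N, 4, 4) blended transform per point (still affine,
                     since the weighted last rows sum back to [0, 0, 0, 1])
    """
    return np.einsum('nj,jab->nab', weights, bone_transforms)

def forward_warp(x_canonical, weights, bone_transforms):
    """Canonical -> observed: x_o = (sum_j w_j T_j) x_c."""
    T = blend_transforms(weights, bone_transforms)
    x_h = np.concatenate([x_canonical, np.ones((len(x_canonical), 1))], axis=1)
    return np.einsum('nab,nb->na', T, x_h)[:, :3]

def backward_warp(x_observed, weights, bone_transforms):
    """Observed -> canonical: invert the same blended transform, so both
    warp directions are driven by one shared set of skinning weights."""
    T_inv = np.linalg.inv(blend_transforms(weights, bone_transforms))
    x_h = np.concatenate([x_observed, np.ones((len(x_observed), 1))], axis=1)
    return np.einsum('nab,nb->na', T_inv, x_h)[:, :3]

def box_window(x, box_min, box_max):
    """Smooth window over an axis-aligned box: a cosine taper per axis
    that peaks inside the box and falls to zero at its faces, so outputs
    from overlapping boxes can be blended without seams."""
    t = np.clip((x - box_min) / (box_max - box_min), 0.0, 1.0)  # (N, 3)
    w = 0.5 - 0.5 * np.cos(2.0 * np.pi * t)                     # 0 at faces
    return w.prod(axis=1)                                       # (N,)
```

A multi-segment renderer could then combine per-box predictions as a normalized sum, c(x) = Σ_k w_k(x) c_k(x) / Σ_k w_k(x); whether BSM uses exactly this normalization is an assumption on our part.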



Author information


Corresponding author

Correspondence to Dongyu Zhang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1246 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Duan, T., Jiang, Z., Ma, Z., Zhang, D. (2025). Animatable Human Rendering from Monocular Video via Pose-Independent Deformation. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15036. Springer, Singapore. https://doi.org/10.1007/978-981-97-8508-7_17


  • DOI: https://doi.org/10.1007/978-981-97-8508-7_17


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8507-0

  • Online ISBN: 978-981-97-8508-7

  • eBook Packages: Computer Science, Computer Science (R0)
