
Novel View Synthesis of Dynamic Human with Sparse Cameras

  • Conference paper
  • In: Artificial Intelligence (CICAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13069)


Abstract

This paper proposes a new method for synthesizing novel views of a human in motion. For image-based rendering, the central challenge is synthesizing a novel-view image from sparse input images: as the number of cameras decreases, missing regions appear in the synthesized image. To address this challenge, we use the Skinned Multi-Person Linear (SMPL) model to represent the surface and posture of the moving human body and to correlate images of the human across different poses. If a pixel that is missing at the novel view is visible at other times, spatio-temporal sequence information can be used to complete it. We therefore select images from different frames to synthesize the novel view. We then align these images with a deformable convolutional network and aggregate them over time with a ConvLSTM. The result is a more realistic free-viewpoint image of the human. This method allows the camera view to be moved freely in time and space to synthesize free-viewpoint video.
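The abstract names two learned components: per-frame alignment via deformable convolution and temporal fusion via a ConvLSTM. As a rough illustration only, not the authors' implementation, the following PyTorch sketch shows how such a pipeline can be wired together; all module names, channel counts, and the offset-prediction head are assumptions for the example.

```python
# Hypothetical sketch of the two aggregation stages described in the abstract:
# (1) align a candidate frame's features to the reference (novel) view with a
# deformable convolution, (2) fuse aligned frames over time with a ConvLSTM.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FrameAligner(nn.Module):
    """Warps a candidate frame's features toward the reference view (assumed design)."""
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Offsets are predicted from the concatenated reference/candidate features.
        self.offset_head = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, ref_feat, cand_feat):
        offsets = self.offset_head(torch.cat([ref_feat, cand_feat], dim=1))
        return self.deform(cand_feat, offsets)

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell (Shi et al., 2015) for temporal aggregation."""
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # One convolution produces all four gates at once.
        self.gates = nn.Conv2d(2 * channels, 4 * channels, kernel_size,
                               padding=kernel_size // 2)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# Fuse a short sequence of aligned frame features into one target-view feature map.
aligner, lstm = FrameAligner(), ConvLSTMCell()
ref = torch.randn(1, 64, 128, 128)           # features rendered at the novel view
h = torch.zeros_like(ref)
c = torch.zeros_like(ref)
for cand in (torch.randn(1, 64, 128, 128) for _ in range(5)):  # candidate frames
    h, c = lstm(aligner(ref, cand), h, c)
# h now holds the temporally aggregated features a decoder could turn into the image.
```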



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61931012), the Open Program of the National Key Laboratory of Science and Technology on Space Intelligent Control (KGJZDSYS-2018-02), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX20-0822, SJCX20-0255).


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Lv, X., Wang, Y., Xu, F., Nie, J., Xu, F., Gao, H. (2021). Novel View Synthesis of Dynamic Human with Sparse Cameras. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol 13069. Springer, Cham. https://doi.org/10.1007/978-3-030-93046-2_37

  • DOI: https://doi.org/10.1007/978-3-030-93046-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93045-5

  • Online ISBN: 978-3-030-93046-2

  • eBook Packages: Computer Science, Computer Science (R0)
