Novel View Synthesis of Dynamic Human with Sparse Cameras

Lv, Xun; Wang, Yuan; Xu, Feiyi; Nie, Jianhui; Xu, Feng; Gao, Hao

doi:10.1007/978-3-030-93046-2_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13069)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence

Abstract

This paper proposes a new method for synthesizing novel views of a human in motion. The central challenge in image-based rendering is synthesizing a novel-view image from sparse input images: as the number of cameras decreases, regions of the synthesized image go missing. To address this, we use the skinned multi-person linear (SMPL) model to represent the surface and pose of the moving body and to correlate images of the human across different poses. If a pixel that is missing in the novel view is visible at another time, spatio-temporal sequence information can be used to complete it. We therefore select images from different frames to synthesize the novel view, align them with a deformable convolutional network, and aggregate them over time with a ConvLSTM. The result is a more realistic free-view image of the human. The method allows the camera view to be moved freely in time and space to synthesize free-viewpoint video.
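The idea that a pixel missing in the novel view can be completed from a frame in which it is visible can be illustrated with a small sketch. This is not the authors' pipeline. It assumes the frames have already been warped into the novel view and aligned (the paper uses a deformable convolutional network for that step), and it replaces the learned ConvLSTM aggregation with a plain nearest-in-time copy. The function name `fill_from_other_frames` and the use of `None` to mark missing pixels are hypothetical choices made for illustration.

```python
from typing import List, Optional

# A frame is a 2D grid of intensities; None marks a pixel that is missing
# in the image warped to the novel view (an assumption of this sketch).
Frame = List[List[Optional[float]]]


def fill_from_other_frames(frames: List[Frame], target: int) -> Frame:
    """Fill missing pixels of frames[target] from the temporally nearest
    frame in which the same pixel is visible.

    Assumes all frames are already aligned to the same novel view.
    Simplified stand-in for learned temporal aggregation.
    """
    height, width = len(frames[target]), len(frames[target][0])
    # Visit the other frames in order of temporal distance from the target;
    # ties are broken in favour of the earlier frame.
    order = sorted(
        (t for t in range(len(frames)) if t != target),
        key=lambda t: (abs(t - target), t),
    )
    result = [row[:] for row in frames[target]]
    for y in range(height):
        for x in range(width):
            if result[y][x] is not None:
                continue  # pixel already visible in the target frame
            for t in order:
                if frames[t][y][x] is not None:
                    result[y][x] = frames[t][y][x]
                    break  # stop at the nearest frame that sees this pixel
    return result


# Three aligned 2x2 frames; the middle frame has three missing pixels.
frames = [
    [[0.1, None], [0.3, 0.4]],
    [[None, None], [0.5, None]],
    [[0.7, 0.8], [None, None]],
]
filled = fill_from_other_frames(frames, target=1)
print(filled)
```

A pixel that is not visible in any frame stays `None`. Handling such pixels and blending several observations of the same pixel are the kinds of cases where a learned aggregation, as described in the abstract, would be expected to do better than this copy rule.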

Acknowledgements

The authors gratefully acknowledge support from the National Natural Science Foundation of China (No. 61931012), the Open Program of the National Key Laboratory of Science and Technology on Space Intelligent Control (KGJZDSYS-2018-02), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX20-0822, SJCX20-0255).

Author information

Authors and Affiliations

Nanjing University of Posts and Telecommunications, Nanjing, China
Xun Lv, Yuan Wang, Feiyi Xu, Jianhui Nie & Hao Gao
Tsinghua University, Beijing, China
Feng Xu
Hangzhou Zhouxi Institute of Brain and Intelligence, Hangzhou, China
Feng Xu & Hao Gao

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Duke University, Durham, NC, USA
Yiran Chen
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
University of British Columbia, Vancouver, BC, Canada
Jane Wang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Ruiping Wang
Xidian University, Xi’an, China
Weisheng Dong

About this paper

Cite this paper

Lv, X., Wang, Y., Xu, F., Nie, J., Xu, F., Gao, H. (2021). Novel View Synthesis of Dynamic Human with Sparse Cameras. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol 13069. Springer, Cham. https://doi.org/10.1007/978-3-030-93046-2_37

DOI: https://doi.org/10.1007/978-3-030-93046-2_37
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93045-5
Online ISBN: 978-3-030-93046-2
eBook Packages: Computer Science, Computer Science (R0)