Multi-view Consistency View Synthesis

Wu, Xiaodi; Zhang, Zhiqiang; Yu, Wenxin; Chen, Shiyu; Gao, Yufei; Chen, Peng; Gong, Jun

doi:10.1007/978-981-99-8148-9_25

Xiaodi Wu¹⁰,
Zhiqiang Zhang¹⁰,
Wenxin Yu¹⁰,
Shiyu Chen¹⁰,
Yufei Gao¹¹,
Peng Chen¹² &
…
Jun Gong¹³

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1966))

Included in the following conference series:

International Conference on Neural Information Processing

474 Accesses

Abstract

Novel view synthesis (NVS) aims to synthesize photo-realistic images depicting a scene by utilizing existing source images. The core objective is that the synthesized images are supposed to be as close as possible to the scene content. In recent years, various approaches shift the focus towards the visual effect of images in continuous space or time. While current methods for static scenes treat the rendering of images as isolated processes, neglecting the geometric consistency in static scenes. This usually results in incoherent visual experiences like flicker or artifacts in synthesized image sequences. To address this limitation, we propose Multi-View Consistency View Synthesis (MCVS). MCVS leverages long short-term memory (LSTM) and self-attention mechanism to model the spatial correlation between synthesized images, hence forcing them closer to the ground truth. MCVS not only enhances multi-view consistency but also improves the overall quality of the synthesized images. The proposed method is evaluated on the Tanks and Temples dataset, and the FVS dataset. On average, the Learned Perceptual Image Patch Similarity (LPIPS) is better than state-of-the-art approaches by 0.14 to 0.16%, indicating the superiority of our approach.

This Research is Supported by National Key Research and Development Program from Ministry of Science and Technology of the PRC (No. 2021ZD0110600), Sichuan Science and Technology Program (No. 2022ZYD0116), Sichuan Provincial M. C. Integration Office Program, and IEDA Laboratory of SWUST.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 79.99; Price excludes VAT (USA)

Softcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Aliev, K.-A., Sevastopolsky, A., Kolos, M., Ulyanov, D., Lempitsky, V.: Neural point-based graphics. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020, Part XXII. LNCS, vol. 12367, pp. 696–712. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_42
Chapter Google Scholar
Chen, Q., Koltun, V.: Photographic image synthesis with cascaded refinement networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1511–1520 (2017)
Google Scholar
Flynn, J., Neulander, I., Philbin, J., Snavely, N.: Deepstereo: learning to predict new views from the world’s imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5515–5524 (2016)
Google Scholar
Garbin, S.J., Kowalski, M., Johnson, M., Shotton, J., Valentin, J.: FastNeRF: high-fidelity neural rendering at 200FPS. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14346–14355 (2021)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Article Google Scholar
Jena, S., Multon, F., Boukhayma, A.: Neural mesh-based graphics. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds.) ECCV 2022, Part III. LNCS, vol. 13803, pp. 739–757. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-25066-8_45
Chapter Google Scholar
Knapitsch, A., Park, J., Zhou, Q.Y., Koltun, V.: Tanks and temples: benchmarking large-scale scene reconstruction. ACM Trans. Graph. (ToG) 36(4), 1–13 (2017)
Article Google Scholar
Lin, Z., Li, M., Zheng, Z., Cheng, Y., Yuan, C.: Self-attention ConvLSTM for spatiotemporal prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11531–11538 (2020)
Google Scholar
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Article Google Scholar
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Google Scholar
Reiser, C., Peng, S., Liao, Y., Geiger, A.: KiloNeRF: speeding up neural radiance fields with thousands of tiny MLPS. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14335–14345 (2021)
Google Scholar
Riegler, G., Koltun, V.: Free view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020, Part XIX. LNCS, vol. 12364, pp. 623–640. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58529-7_37
Chapter Google Scholar
Riegler, G., Koltun, V.: Stable view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12216–12225 (2021)
Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, Part III. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113 (2016)
Google Scholar
Schönberger, J.L., Zheng, E., Frahm, J.-M., Pollefeys, M.: Pixelwise view selection for unstructured multi-view stereo. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part III. LNCS, vol. 9907, pp. 501–518. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_31
Chapter Google Scholar
Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Solovev, P., Khakhulin, T., Korzhenkov, D.: Self-improving multiplane-to-layer images for novel view synthesis. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 4309–4318 (2023)
Google Scholar
Suhail, M., Esteves, C., Sigal, L., Makadia, A.: Generalizable patch-based neural rendering. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13692, pp. 156–174. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19824-3_10
Chapter Google Scholar
Wang, Y., Long, M., Wang, J., Gao, Z., Yu, P.S.: PredRNN: recurrent neural networks for predictive learning using spatiotemporal LSTMs. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Google Scholar
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Article Google Scholar
Zhang, K., Riegler, G., Snavely, N., Koltun, V.: NeRF++: analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492 (2020)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)
Google Scholar

Download references

Author information

Authors and Affiliations

Southwest University of Science and Technology, Mianyang, China
Xiaodi Wu, Zhiqiang Zhang, Wenxin Yu & Shiyu Chen
Hosei University, Chiyoda City, Japan
Yufei Gao
Chengdu Hongchengyun Technology Co., Ltd., Chengdu, China
Peng Chen
Southwest Automation Research Institute, San Antonio, USA
Jun Gong

Authors

Xiaodi Wu
View author publications
You can also search for this author in PubMed Google Scholar
Zhiqiang Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Wenxin Yu
View author publications
You can also search for this author in PubMed Google Scholar
Shiyu Chen
View author publications
You can also search for this author in PubMed Google Scholar
Yufei Gao
View author publications
You can also search for this author in PubMed Google Scholar
Peng Chen
View author publications
You can also search for this author in PubMed Google Scholar
Jun Gong
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Zhiqiang Zhang or Wenxin Yu .

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wu, X. et al. (2024). Multi-view Consistency View Synthesis. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1966. Springer, Singapore. https://doi.org/10.1007/978-981-99-8148-9_25

Download citation

DOI: https://doi.org/10.1007/978-981-99-8148-9_25
Published: 26 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8147-2
Online ISBN: 978-981-99-8148-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics