Abstract
Recent generalizable NeRF methods synthesize novel-view images without per-scene optimization by constructing radiance fields from 2D image features. However, most existing methods render slowly because they must query the NeRF model at millions of 3D points. In this paper, we propose a photorealistic novel view synthesis method with generalizable and efficient rendering. Specifically, given a set of multi-view images, we use a multi-scale scene geometry predictor that combines multi-view stereo (MVS) and NeRF to infer key points from coarse to fine. In addition, to obtain more accurate key-point positions and features, we design an uncertainty-guided sampling strategy based on depth prediction and the uncertainty of that prediction. With the key points and scene geometry features, we propose a rendering network that synthesizes full-resolution images. This process is fully differentiable, allowing us to train the network with only RGB images. Experiments on various synthetic and real datasets show that our model is more efficient than state-of-the-art baselines while achieving higher rendering quality. With the multi-scale scene geometry predictor and the uncertainty-aware sampling strategy, our approach infers geometry efficiently and significantly improves rendering speed.
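The uncertainty-guided sampling can be pictured as concentrating each ray's samples in a depth window whose width tracks the predicted uncertainty, so fewer queries land in empty space. Below is a minimal PyTorch sketch under that reading; the function name, the +/- k*sigma window, and the stratified jitter are our assumptions for illustration, not the authors' implementation.

import torch

def uncertainty_guided_samples(depth, sigma, num_samples=8, k=3.0):
    # depth: (N,) per-ray depth predicted by the MVS stage (assumption).
    # sigma: (N,) per-ray depth uncertainty, e.g. a predicted std. dev. (assumption).
    # Returns (N, num_samples) monotonically increasing sample depths per ray.
    near = depth - k * sigma  # window shrinks as depth confidence grows
    far = depth + k * sigma
    t = torch.linspace(0.0, 1.0, num_samples, device=depth.device)
    # Stratified jitter within each bin; jitter < bin spacing keeps samples ordered.
    t = t + torch.rand(depth.shape[0], num_samples, device=depth.device) / num_samples
    return near[:, None] + (far - near)[:, None] * t.clamp(0.0, 1.0)

Because the window collapses where the depth prediction is confident, a handful of samples per ray can replace the hundreds a uniform near-to-far NeRF sampler would need, which is consistent with the speed-up the abstract claims.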
Acknowledgements
This work was supported by the Natural Science Foundation of Guangdong Province, China (No. 2022A1515010148).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mo, Z., Wu, W., Yu, W., Zhang, T., Ke, Z., Huang, J. (2023). Fast Generalizable Novel View Synthesis with Uncertainty-Aware Sampling. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14256. Springer, Cham. https://doi.org/10.1007/978-3-031-44213-1_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44212-4
Online ISBN: 978-3-031-44213-1