Research article · DOI: 10.1145/3595916.3626380 · MMAsia '23 Conference Proceedings

NeRF-SDP: Efficient Generalizable Neural Radiance Field with Scene Depth Perception

Published: 01 January 2024

Abstract

In recent years, neural radiance fields have exhibited impressive performance in novel view synthesis. However, exploiting complex network structures to achieve generalizable NeRF usually results in inefficient rendering. Existing methods for accelerating rendering directly employ simpler inference networks or fewer sampling points, leading to unsatisfactory synthesis quality. To address the challenge of balancing rendering speed and quality in generalizable NeRF, we propose a novel framework, NeRF-SDP, which achieves both efficiency and high fidelity by introducing scene depth perception. We incorporate more scene information into the radiance field by using our proposed geometry feature extraction and depth-encoded ray transformer to improve the model’s inference capabilities with sparse points. With the aid of scene depth perception, NeRF-SDP can better understand the scene’s structure, thus better reconstructing the objects’ edges with significantly fewer artifacts. Experimental results demonstrate that NeRF-SDP achieves comparable synthesis quality to state-of-the-art methods while significantly improving rendering efficiency. Furthermore, ablation studies confirm that the depth-encoded ray transformer enhances the model’s robustness to varying numbers of sampling points.
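The core idea in the abstract, that a depth prior lets a radiance field render well from sparse samples, can be illustrated with a toy sketch: concentrate samples along each ray near an estimated surface depth rather than spreading them uniformly, then composite with the standard NeRF volume-rendering quadrature. This is an illustrative assumption, not the paper's actual pipeline; the Gaussian sampling scheme and the per-ray `depth` prior below are hypothetical stand-ins for NeRF-SDP's geometry feature extraction and depth-encoded ray transformer.

```python
import numpy as np

def sample_depth_guided(near, far, depth, n_samples, spread=0.05, seed=None):
    """Place ray samples as a Gaussian around a depth prior (hypothetical
    scheme): with a good prior, few samples still cover the surface."""
    rng = np.random.default_rng(seed)
    t = rng.normal(loc=depth, scale=spread, size=n_samples)
    return np.sort(np.clip(t, near, far))

def volume_render(rgb, sigma, t_vals):
    """Standard NeRF quadrature: C = sum_i T_i * (1 - exp(-sigma_i * d_i)) * c_i,
    where T_i is the accumulated transmittance up to sample i."""
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)   # last interval -> "infinity"
    alpha = 1.0 - np.exp(-sigma * deltas)                # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = trans * alpha                              # compositing weights
    return (weights[:, None] * rgb).sum(axis=0)

# Usage: a ray through a dense red surface at depth 0.5 renders red
# even with only 16 depth-concentrated samples.
t = sample_depth_guided(0.0, 1.0, depth=0.5, n_samples=16, seed=0)
color = volume_render(np.tile([1.0, 0.0, 0.0], (16, 1)), np.full(16, 1e3), t)
```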


Cited By

• (2024) GS2-GNeSF: Geometry-Semantics Synergy for Generalizable Neural Semantic Fields. In Proceedings of the 32nd ACM International Conference on Multimedia, 8884–8892. DOI: 10.1145/3664647.3681156. Online publication date: 28-Oct-2024.


Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
December 2023, 745 pages
ISBN: 9798400702051
DOI: 10.1145/3595916

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. depth
    2. fast rendering
    3. generalization
    4. neural radiance field
    5. novel view synthesis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

Conference

MMAsia '23: ACM Multimedia Asia
December 6–8, 2023, Tainan, Taiwan

Acceptance Rates

Overall acceptance rate: 59 of 204 submissions (29%)
