Research article · DOI: 10.1145/3595916.3626380 · MMAsia '23 Conference Proceedings

NeRF-SDP: Efficient Generalizable Neural Radiance Field with Scene Depth Perception

Published: 01 January 2024

Abstract

In recent years, neural radiance fields have exhibited impressive performance in novel view synthesis. However, exploiting complex network structures to achieve generalizable NeRF usually results in inefficient rendering. Existing methods for accelerating rendering directly employ simpler inference networks or fewer sampling points, leading to unsatisfactory synthesis quality. To address the challenge of balancing rendering speed and quality in generalizable NeRF, we propose a novel framework, NeRF-SDP, which achieves both efficiency and high fidelity by introducing scene depth perception. We incorporate more scene information into the radiance field by using our proposed geometry feature extraction and depth-encoded ray transformer to improve the model’s inference capabilities with sparse points. With the aid of scene depth perception, NeRF-SDP can better understand the scene’s structure, thus better reconstructing the objects’ edges with significantly fewer artifacts. Experimental results demonstrate that NeRF-SDP achieves comparable synthesis quality to state-of-the-art methods while significantly improving rendering efficiency. Furthermore, ablation studies confirm that the depth-encoded ray transformer enhances the model’s robustness to varying numbers of sampling points.
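The core idea in the abstract, that a depth prior lets a radiance field render well from sparse samples, can be illustrated with a toy sketch: concentrate samples along each ray near an estimated surface depth rather than spreading them uniformly, then composite with the standard NeRF volume-rendering quadrature. This is an illustrative assumption, not the paper's actual pipeline; the Gaussian sampling scheme and the per-ray `depth` prior below are hypothetical stand-ins for NeRF-SDP's geometry feature extraction and depth-encoded ray transformer.

```python
import numpy as np

def sample_depth_guided(near, far, depth, n_samples, spread=0.05, seed=None):
    """Place ray samples as a Gaussian around a depth prior (hypothetical
    scheme): with a good prior, few samples still cover the surface."""
    rng = np.random.default_rng(seed)
    t = rng.normal(loc=depth, scale=spread, size=n_samples)
    return np.sort(np.clip(t, near, far))

def volume_render(rgb, sigma, t_vals):
    """Standard NeRF quadrature: C = sum_i T_i * (1 - exp(-sigma_i * d_i)) * c_i,
    where T_i is the accumulated transmittance up to sample i."""
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)   # last interval -> "infinity"
    alpha = 1.0 - np.exp(-sigma * deltas)                # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = trans * alpha                              # compositing weights
    return (weights[:, None] * rgb).sum(axis=0)

# Usage: a ray through a dense red surface at depth 0.5 renders red
# even with only 16 depth-concentrated samples.
t = sample_depth_guided(0.0, 1.0, depth=0.5, n_samples=16, seed=0)
color = volume_render(np.tile([1.0, 0.0, 0.0], (16, 1)), np.full(16, 1e3), t)
```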


Cited By

• (2024) GS2-GNeSF: Geometry-Semantics Synergy for Generalizable Neural Semantic Fields. In Proceedings of the 32nd ACM International Conference on Multimedia, 8884–8892. DOI: 10.1145/3664647.3681156. Online publication date: 28-Oct-2024.


Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
December 2023, 745 pages
ISBN: 9798400702051
DOI: 10.1145/3595916

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. depth
    2. fast rendering
    3. generalization
    4. neural radiance field
    5. novel view synthesis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

Conference

MMAsia '23: ACM Multimedia Asia
December 6–8, 2023, Tainan, Taiwan

Acceptance Rates

Overall acceptance rate: 59 of 204 submissions (29%)
