DOI: 10.1145/3664647.3681393

InNeRF: Learning Interpretable Radiance Fields for Generalizable 3D Scene Representation and Rendering

Published: 28 October 2024

Abstract

We propose Interpretable Neural Radiance Fields (InNeRF) for generalizable 3D scene representation and rendering. In contrast to previous image-based rendering methods, which rely on two independent processes of pooling-based fusion and MLP-based rendering, our framework unifies source-view fusion and target-view rendering in a single end-to-end interpretable Transformer-based network. InNeRF enables the investigation of deep relationships between the target rendering view and the source views that were previously neglected by pooling-based fusion and fragmented rendering procedures. As a result, InNeRF improves model interpretability by enhancing the shape and appearance consistency of a 3D scene in both the surrounding view space and the ray-cast space. For each query 3D point, InNeRF integrates both its projected 2D pixels from the surrounding source views and its adjacent 3D points along the query ray, and simultaneously decodes this information into the query point's representation. Experiments show that InNeRF outperforms state-of-the-art image-based neural rendering methods in both scene-agnostic and per-scene fine-tuning scenarios, especially when there is a considerable disparity between source views and rendering views. Interpretation experiments further show that InNeRF can explain the rendering process of a query.
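To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the core idea: a Transformer decoder whose cross-attention jointly attends to a query point's projected source-view pixel features and its neighboring samples along the query ray, decoding them into a single point representation from which color and density are predicted. All class, function, and parameter names (InNeRFStyleDecoder, feat_dim, etc.) are our own illustration under assumed feature shapes, not the authors' released code.

```python
import torch
import torch.nn as nn

class InNeRFStyleDecoder(nn.Module):
    """Hypothetical sketch of unified fusion-and-rendering for one query 3D point.

    The decoder's cross-attention jointly consumes (i) features of the point's
    2D projections in the source views and (ii) features of adjacent samples
    along the query ray, instead of pooling the source views separately first.
    """

    def __init__(self, feat_dim=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, 4)  # 3 RGB channels + 1 volume density

    def forward(self, query_feat, source_pixel_feats, ray_neighbor_feats):
        # query_feat:         (B, 1, D) embedding of the query 3D point
        # source_pixel_feats: (B, S, D) its projected pixels in S source views
        # ray_neighbor_feats: (B, K, D) K adjacent sample points on the query ray
        context = torch.cat([source_pixel_feats, ray_neighbor_feats], dim=1)
        fused = self.decoder(tgt=query_feat, memory=context)  # (B, 1, D)
        out = self.head(fused.squeeze(1))                     # (B, 4)
        rgb = torch.sigmoid(out[:, :3])   # color in [0, 1]
        sigma = torch.relu(out[:, 3])     # non-negative density
        return rgb, sigma

# Toy usage: 2 query points, 8 source views, 16 ray neighbors, 64-dim features.
model = InNeRFStyleDecoder()
rgb, sigma = model(torch.randn(2, 1, 64), torch.randn(2, 8, 64), torch.randn(2, 16, 64))
```

In a full renderer, the per-point colors and densities predicted this way would then be alpha-composited along each ray, as in standard NeRF volume rendering.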



      Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11,719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. network interpretability
      2. neural rendering

      Qualifiers

• Research article

      Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne, VIC, Australia

      Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions, 26%
Overall acceptance rate: 2,145 of 8,556 submissions, 25%
