DOI: 10.1145/3579371.3589056
research-article

NeuRex: A Case for Neural Rendering Acceleration

Published: 17 June 2023

ABSTRACT

This paper presents NeuRex, an accelerator architecture that efficiently executes the modern neural rendering pipeline through an algorithmic enhancement and supporting hardware. Building on an in-depth analysis of state-of-the-art neural scene representations, NeuRex makes multi-resolution hash encoding, the key operational primitive in modern neural rendering, more hardware-friendly. It also features a specialized hash encoding engine that efficiently performs both this primitive and the overall rendering pipeline. We implement and synthesize NeuRex in a commercial 28nm process technology and evaluate two versions, NeuRex-Edge and NeuRex-Server, on a range of scenes at different image resolutions for mobile and high-end computing platforms. Our evaluation shows that NeuRex achieves up to 9.88× and 3.11× speedups over mobile and high-end consumer GPUs, respectively, with small area overhead and lower energy consumption.
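The abstract names multi-resolution hash encoding as the key primitive but does not describe how it works. The sketch below shows the baseline encoding as described in the Instant-NGP paper (Müller et al., 2022). It is not NeuRex's hardware-friendly variant, which the abstract does not detail.

For each level, a query point in the unit cube is scaled to that level's grid resolution. The eight surrounding grid vertices are each mapped to a feature-table slot, by dense indexing when the level fits in the table and by a spatial hash otherwise. Their features are then trilinearly interpolated. Several values are illustrative assumptions rather than taken from the paper: the table size, level count, resolutions, feature width, and initialization range. The helper names (`level_resolutions`, `grid_index`, `make_tables`, `encode`) are hypothetical.

```python
import itertools
import math
import random

# Per-axis primes for the spatial hash described in the Instant-NGP paper;
# the first axis uses 1 (the paper motivates this by cache coherence).
PRIMES = (1, 2654435761, 805459861)


def level_resolutions(num_levels, n_min, n_max):
    """Geometric progression of grid resolutions from n_min to about n_max."""
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (num_levels - 1))
    return [math.floor(n_min * growth ** level) for level in range(num_levels)]


def spatial_hash(corner, table_size):
    """XOR of coordinate * prime, reduced modulo the table size.

    With a power-of-two table_size this matches the uint32-overflow version,
    because only the low bits of each product affect the result.
    """
    h = 0
    for coord, prime in zip(corner, PRIMES):
        h ^= coord * prime
    return h % table_size


def grid_index(corner, res, table_size):
    """Dense 1:1 indexing for coarse levels that fit; hashing otherwise."""
    stride = res + 1  # grid vertices per axis
    if stride ** 3 <= table_size:
        x, y, z = corner
        return x + y * stride + z * stride * stride
    return spatial_hash(corner, table_size)


def make_tables(num_levels, table_size, feat_dim, seed=0):
    """Hypothetical trainable tables, initialized to small uniform values."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
         for _ in range(table_size)]
        for _ in range(num_levels)
    ]


def encode(point, tables, resolutions, table_size):
    """Concatenate trilinearly interpolated features from every level."""
    features = []
    for table, res in zip(tables, resolutions):
        scaled = [c * res for c in point]
        # Clamp so a coordinate of exactly 1.0 stays inside the last cell.
        base = [min(math.floor(s), res - 1) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        acc = [0.0] * len(table[0])
        for offset in itertools.product((0, 1), repeat=3):
            corner = tuple(b + o for b, o in zip(base, offset))
            # Trilinear weight: product of per-axis linear weights.
            weight = 1.0
            for o, f in zip(offset, frac):
                weight *= f if o else 1.0 - f
            entry = table[grid_index(corner, res, table_size)]
            for k, value in enumerate(entry):
                acc[k] += weight * value
        features.extend(acc)
    return features


if __name__ == "__main__":
    T = 2 ** 10  # illustrative power-of-two table size
    res = level_resolutions(4, 4, 64)
    tables = make_tables(4, T, 2)
    feat = encode((0.3, 0.7, 0.1), tables, res, T)
    print(res, len(feat))  # 4 levels x 2 features per level
```

In a full renderer, the concatenated features would feed a small MLP. In this sketch, each query reads eight table entries per level. At the hashed (fine) levels, those entries land at addresses scattered across the table. Presumably that is the kind of access pattern a dedicated hash encoding engine would target, although the abstract does not say how NeuRex restructures it.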

      Published in

        ACM Conferences
        ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture
        June 2023
        1225 pages
        ISBN:9798400700958
        DOI:10.1145/3579371

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall acceptance rate: 543 of 3,203 submissions (17%)