DOI: 10.1145/3613424.3614288
research-article

Treelet Prefetching For Ray Tracing

Published: 08 December 2023

ABSTRACT

Because it is computationally expensive, ray tracing has traditionally been used only in offline rendering to produce high-fidelity images. Recent Graphics Processing Units (GPUs) include dedicated accelerators that bring ray tracing to real-time rendering for video games and other graphics applications. These accelerators focus on finding the closest intersection between a ray and a scene using a hierarchical tree data structure called a Bounding Volume Hierarchy (BVH) tree. However, BVH tree traversal remains costly because divergent rays access different parts of the tree, and each ray follows a unique pointer-chasing sequence that is difficult to optimize with traditional methods. To address this, we propose treelet prefetching to reduce the latency of ray traversal. Treelets are smaller subtrees created by splitting the BVH tree. When a ray visits a treelet root node, we prefetch the corresponding treelet, so that deeper levels of the tree are fetched in advance. This reduces the latency associated with pointer-chasing during tree traversal. Our approach pairs a hardware prefetcher with a two-stack, treelet-based traversal algorithm to maximize the benefits of treelet prefetching. Our simulation results show that treelet prefetching improves the performance of the baseline RT Unit in Vulkan-Sim by 32.1% on average while maintaining the same power consumption.
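The idea in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design. The node-count limit `max_nodes`, the greedy breadth-first split, the `hits` predicate standing in for the ray-box test, and all function and variable names are assumptions made for illustration. The sketch splits a tree into treelets, then traverses it with two stacks, recording a "prefetch" of the whole treelet each time a treelet root is visited.

```python
from collections import deque


def build_treelets(children, root, max_nodes):
    """Greedily split a tree into treelets of at most max_nodes nodes.

    children maps a node id to a list of child ids (hypothetical layout).
    Returns (treelet_of, members): treelet_of[node] is the id of the
    treelet root that owns node, members[t_root] lists that treelet's nodes.
    """
    treelet_of, members = {}, {}
    pending_roots = deque([root])
    while pending_roots:
        t_root = pending_roots.popleft()
        members[t_root] = []
        frontier = deque([t_root])
        while frontier:
            node = frontier.popleft()
            if len(members[t_root]) < max_nodes:
                treelet_of[node] = t_root
                members[t_root].append(node)
                frontier.extend(children.get(node, []))
            else:
                # Treelet is full: this node starts a new treelet later.
                pending_roots.append(node)
    return treelet_of, members


def traverse(children, root, treelet_of, members, hits):
    """Two-stack traversal that finishes one treelet before switching.

    `current` holds nodes of the treelet being traversed; `deferred`
    holds roots of other treelets. hits(node) is a stand-in for the
    ray-box intersection test.
    """
    visited, prefetched = [], []
    current, deferred = [], [root]
    while current or deferred:
        if not current:
            t_root = deferred.pop()
            # Visiting a treelet root: request the whole treelet up front.
            prefetched.append(list(members[t_root]))
            current.append(t_root)
        node = current.pop()
        if not hits(node):
            continue
        visited.append(node)
        for child in children.get(node, []):
            if treelet_of[child] == treelet_of[node]:
                current.append(child)
            else:
                deferred.append(child)
    return visited, prefetched


# Toy binary tree with 7 nodes; the ray is assumed to miss node 2.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
treelet_of, members = build_treelets(children, root=0, max_nodes=3)
visited, prefetched = traverse(
    children, 0, treelet_of, members, hits=lambda node: node != 2
)
print(members)
print(visited)
print(prefetched)
```

In this toy tree the ray misses node 2, so nodes 5 and 6 are never reached and their treelets are never prefetched. Nodes 1 and 2, which sit one level below the root, are requested as soon as the root is visited rather than after the root has been fetched and tested.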


Published in

MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
October 2023
1528 pages
ISBN: 9798400703294
DOI: 10.1145/3613424

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 484 of 2,242 submissions, 22%
