ABSTRACT
Ray tracing has traditionally been used only in offline rendering to produce high-fidelity images because it is computationally expensive. Recent Graphics Processing Units (GPUs) include dedicated accelerators that bring ray tracing to real-time rendering for video games and other graphics applications. These accelerators focus on finding the closest intersection between a ray and a scene using a hierarchical tree data structure called a Bounding Volume Hierarchy (BVH) tree. However, BVH tree traversal remains costly because divergent rays access different parts of the tree, and each ray follows its own pointer-chasing sequence that is difficult to optimize with traditional methods. To address this, we propose treelet prefetching to reduce the latency of ray traversal. Treelets are smaller subtrees created by splitting the BVH tree. When a ray visits a treelet root node, we prefetch the corresponding treelet, so that deeper levels of the tree are fetched in advance. This reduces the latency associated with pointer-chasing during tree traversal. Our approach pairs a hardware prefetcher with a two-stack, treelet-based traversal algorithm to maximize the benefit of treelet prefetching. Our simulation results show that treelet prefetching improves the performance of the baseline RT Unit in Vulkan-Sim by 32.1% on average while maintaining the same power consumption.
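The idea can be illustrated with a small software sketch. The abstract does not say how treelets are formed or how the two stacks are managed, so everything below is an assumption made for illustration: treelets are cut greedily in breadth-first order with a node-count cap, one stack holds nodes of the current treelet, a second stack holds roots of other treelets, and a "prefetch" is simply recorded in a list rather than issued to memory. The names `form_treelets`, `traverse`, `max_nodes`, and `hits` are hypothetical.

```python
from collections import deque


def form_treelets(children, root, max_nodes):
    """Greedily split a tree into treelets of at most max_nodes nodes.

    Returns (treelet_of, members): treelet_of maps node -> its treelet root,
    members maps treelet root -> nodes of that treelet in BFS order.
    """
    treelet_of, members = {}, {}
    pending_roots = deque([root])
    while pending_roots:
        t_root = pending_roots.popleft()
        members[t_root] = []
        frontier = deque([t_root])
        while frontier:
            node = frontier.popleft()
            if len(members[t_root]) < max_nodes:
                treelet_of[node] = t_root
                members[t_root].append(node)
                frontier.extend(children.get(node, []))
            else:
                # Treelet is full: this node starts a new treelet later.
                pending_roots.append(node)
    return treelet_of, members


def traverse(children, root, treelet_of, members, hits):
    """Two-stack traversal that finishes one treelet before entering the next.

    hits(node) stands in for the ray/bounding-box test. Returns the visit
    order and a log with one entry (a node list) per prefetched treelet.
    """
    visited, prefetched = [], []
    current = []                            # nodes inside the current treelet
    deferred = [root] if hits(root) else []  # roots of treelets to enter later
    while current or deferred:
        if not current:
            t_root = deferred.pop()
            # Entering a treelet root: model the prefetch of the whole treelet.
            prefetched.append(list(members[t_root]))
            current.append(t_root)
            continue
        node = current.pop()
        visited.append(node)
        for child in children.get(node, []):
            if not hits(child):
                continue
            if treelet_of[child] == treelet_of[node]:
                current.append(child)
            else:
                deferred.append(child)
    return visited, prefetched


# Toy 15-node complete binary tree; the ray misses the subtree under node 2.
children = {i: [2 * i + 1, 2 * i + 2] for i in range(7)}
treelet_of, members = form_treelets(children, root=0, max_nodes=3)
visited, prefetched = traverse(children, 0, treelet_of, members,
                               hits=lambda n: n != 2)
print(visited)
print(prefetched)
```

In this toy model each treelet is requested once, at the moment its root is popped, and all of its nodes are consumed before the traversal moves to another treelet. That ordering is what would give an early request time to complete before the deeper nodes are needed.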
Index Terms
- Treelet Prefetching For Ray Tracing