ABSTRACT
To achieve the highest possible performance, the ray traversal and intersection routines at the core of every high-performance ray tracer are usually hand-coded, heavily optimized, and implemented separately for each hardware platform, even though they share most of their algorithmic core. The resulting implementations tightly interleave algorithmic aspects with hardware and implementation details, which makes the code non-portable and difficult to modify and maintain.
In this paper, we present a new approach that makes it possible to define, in a functional language, a set of conceptual, high-level language abstractions that a dedicated compiler optimizes away to maximize performance. Using this abstraction mechanism, we separate a generic ray traversal and intersection algorithm from the low-level aspects that are specific to the target hardware. We demonstrate that our code is not only significantly more flexible, simpler to write, and more concise, but also that the compiled results perform as well as state-of-the-art implementations on every CPU and GPU platform we tested.
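The idea of writing the traversal algorithm once and plugging in the platform-specific pieces can be illustrated with a small sketch. It is written in Rust purely for illustration (the paper uses its own functional language and compiler, which are not reproduced here). The generic `traverse` loop is fixed, while the box test and the primitive test are passed in as closures. Because Rust monomorphizes a generic function for each closure type, the compiler can usually inline these closures; this is only an analogy to, not an implementation of, the paper's compiler-based specialization. All names (`traverse`, `slab_test`, `sphere_test`, `Node`, `demo`) are hypothetical, and spheres stand in for triangles to keep the example short.

```rust
// Minimal sketch with hypothetical names (not RaTrace's actual code): one generic BVH
// traversal loop; the box test and primitive test are plugged in as closures.

struct Ray {
    org: [f32; 3],
    dir: [f32; 3],     // assumed normalized
    inv_dir: [f32; 3], // 1/dir, precomputed for the slab test
    tmin: f32,
    tmax: f32,         // shrinks whenever a closer hit is found
}

impl Ray {
    fn new(org: [f32; 3], dir: [f32; 3]) -> Ray {
        let inv_dir = [1.0 / dir[0], 1.0 / dir[1], 1.0 / dir[2]];
        Ray { org, dir, inv_dir, tmin: 0.0, tmax: f32::INFINITY }
    }
}

struct Aabb { min: [f32; 3], max: [f32; 3] }
struct Sphere { center: [f32; 3], radius: f32 }

enum Node {
    Inner { bbox: Aabb, left: usize, right: usize },
    Leaf { bbox: Aabb, first: usize, count: usize },
}

/// Generic algorithmic core: stack-based depth-first traversal, written once.
fn traverse<B, P>(nodes: &[Node], ray: &mut Ray, hit_box: B, mut hit_prim: P)
where
    B: Fn(&Aabb, &Ray) -> bool,
    P: FnMut(usize, &mut Ray),
{
    let mut stack = vec![0usize]; // start at the root (index 0)
    while let Some(i) = stack.pop() {
        match &nodes[i] {
            Node::Inner { bbox, left, right } => {
                if hit_box(bbox, &*ray) {
                    stack.push(*right);
                    stack.push(*left); // left child is popped first
                }
            }
            Node::Leaf { bbox, first, count } => {
                if hit_box(bbox, &*ray) {
                    for p in *first..*first + *count {
                        hit_prim(p, &mut *ray);
                    }
                }
            }
        }
    }
}

/// Slab test: does the ray overlap the box within [tmin, tmax]?
fn slab_test(b: &Aabb, r: &Ray) -> bool {
    let (mut t0, mut t1) = (r.tmin, r.tmax);
    for a in 0..3 {
        let tn = (b.min[a] - r.org[a]) * r.inv_dir[a];
        let tf = (b.max[a] - r.org[a]) * r.inv_dir[a];
        t0 = t0.max(tn.min(tf));
        t1 = t1.min(tn.max(tf));
    }
    t0 <= t1
}

/// Near-root ray/sphere test; on a closer hit it shrinks r.tmax and returns true.
fn sphere_test(s: &Sphere, r: &mut Ray) -> bool {
    let oc = [r.org[0] - s.center[0], r.org[1] - s.center[1], r.org[2] - s.center[2]];
    let b = oc[0] * r.dir[0] + oc[1] * r.dir[1] + oc[2] * r.dir[2];
    let c = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - s.radius * s.radius;
    let disc = b * b - c;
    if disc < 0.0 { return false; }
    let t = -b - disc.sqrt();
    if t > r.tmin && t < r.tmax { r.tmax = t; true } else { false }
}

/// Builds a tiny two-leaf BVH and returns (closest hit distance, primitive id).
fn demo() -> (f32, Option<usize>) {
    let spheres = vec![
        Sphere { center: [5.0, 0.5, 0.5], radius: 1.0 },
        Sphere { center: [10.0, 0.5, 0.5], radius: 1.0 },
        Sphere { center: [5.0, 10.0, 0.5], radius: 1.0 },
    ];
    let nodes = vec![
        Node::Inner { bbox: Aabb { min: [4.0, -0.5, -0.5], max: [11.0, 11.0, 1.5] }, left: 1, right: 2 },
        Node::Leaf { bbox: Aabb { min: [4.0, -0.5, -0.5], max: [11.0, 1.5, 1.5] }, first: 0, count: 2 },
        Node::Leaf { bbox: Aabb { min: [4.0, 9.0, -0.5], max: [6.0, 11.0, 1.5] }, first: 2, count: 1 },
    ];
    let mut ray = Ray::new([0.0, 0.5, 0.5], [1.0, 0.0, 0.0]);
    let mut hit_id = None;
    // The "platform-specific" pieces are just the two arguments passed here.
    traverse(&nodes, &mut ray, slab_test, |p, r| {
        if sphere_test(&spheres[p], r) { hit_id = Some(p); }
    });
    (ray.tmax, hit_id)
}

fn main() {
    let (t, id) = demo();
    println!("closest hit t = {t}, primitive = {id:?}");
}
```

In this sketch, switching to a different box test (for example a SIMD version) or to triangle primitives would change only the closures passed to `traverse`, not the traversal loop itself.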
RaTrace: simple and efficient abstractions for BVH ray traversal algorithms (GPCE '17)