Abstract
Complex shading is often associated with long shaders and heavy data access. To obtain good performance on current-generation GPU hardware, algorithms are needed to manage data, schedule threads more efficiently, and organize memory access within the GPU memory hierarchy. In this paper, we propose an approach that accelerates rendering with complex shaders by analyzing and sorting shading jobs according to their complexity and potential memory access. We show that by sorting these shading jobs across three levels of the memory hierarchy and reorganizing thread blocks according to complexity, all shading jobs are scheduled in order, and cache utilization and GPU hardware utilization improve significantly, especially where performance was poor because of heavy branching. All sorting is performed on the CPU, which offers rich control logic, and can be done very efficiently compared with the expensive compaction operation on the GPU. Our experiments with this hierarchy demonstrate improvements over SIMD packet tracing with compaction on the GPU.
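The idea of sorting shading jobs on the CPU before grouping them into thread blocks can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the sort keys or the three memory levels, so the `ShadingJob` record, its `shader_id` and `texture_id` fields, and the helper names below are all hypothetical. The sketch only shows why sorted jobs yield blocks that run a single shader, which is one plausible proxy for reduced branch divergence.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical job record; these field names are assumptions, not from the paper.
ShadingJob = namedtuple("ShadingJob", ["pixel", "shader_id", "texture_id"])

def sort_jobs(jobs):
    """Sort on the CPU by shader first, then by the texture the job will read."""
    return sorted(jobs, key=lambda j: (j.shader_id, j.texture_id))

def make_blocks(sorted_jobs, block_size):
    """Cut each run of same-shader jobs into blocks of at most block_size jobs."""
    blocks = []
    for _, run in groupby(sorted_jobs, key=lambda j: j.shader_id):
        run = list(run)
        for start in range(0, len(run), block_size):
            blocks.append(run[start:start + block_size])
    return blocks

def divergent_blocks(blocks):
    """Count blocks that mix more than one shader (a rough proxy for divergence)."""
    return sum(1 for b in blocks if len({j.shader_id for j in b}) > 1)

# Eight jobs in screen order, alternating between two shaders.
jobs = [ShadingJob(pixel=i, shader_id=i % 2, texture_id=(i // 2) % 2)
        for i in range(8)]

# Baseline: blocks cut directly from screen order.
naive = [jobs[i:i + 4] for i in range(0, len(jobs), 4)]

# Sorted: blocks cut from the shader-then-texture ordering.
sorted_blocks = make_blocks(sort_jobs(jobs), block_size=4)
```

In this toy input, every block cut from screen order mixes both shaders, while every block cut from the sorted order runs one shader and touches textures in a consistent order. A real scheduler would also need to map results back to pixels, which the `pixel` field is kept for.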
Acknowledgments
The authors would like to thank the anonymous reviewers for their careful reading of the original manuscript. Their comments and suggestions have led to a much better presentation of the paper. This research is supported in part by the National Natural Science Foundation of China under Grant No. 61300084, in part by the China Postdoctoral Science Foundation under Grant No. 2012M520625, and in part by the Scientific Research Foundation of Dalian University of Technology under Grant DUT12RC(3)63. The authors also appreciate the support of the Nvidia and Microsoft corporations.
Cite this article
Yang, X., Xu, Dq., Zhao, L. et al. Complex shading efficiently for ray tracing on GPU. Multimed Tools Appl 74, 1091–1106 (2015). https://doi.org/10.1007/s11042-013-1712-5
DOI: https://doi.org/10.1007/s11042-013-1712-5