Complex shading efficiently for ray tracing on GPU

Multimedia Tools and Applications

Abstract

Complex shading is often associated with long shaders and heavy data access. To obtain good performance on current-generation GPU hardware, algorithms are needed that manage data, schedule threads more efficiently, and organize memory access within the GPU memory hierarchy. In this paper, we propose an approach that accelerates rendering with complex shaders by analyzing and sorting shading jobs according to their complexity and potential memory access. We show that by sorting these shading jobs across three levels of the memory hierarchy and reorganizing thread blocks according to complexity, all shading jobs are scheduled in order, which significantly improves cache utilization and GPU hardware utilization, especially where performance suffers from heavy branching. All sorting is performed on the CPU, whose rich control logic handles it very efficiently compared with the expensive compaction operation on the GPU. Our experiments with this hierarchy demonstrate improvements over SIMD packet tracing with compaction on the GPU.
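The idea of sorting shading jobs on the CPU before they are launched can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which sort keys or which three memory levels are used, so the fields `shader_id`, `cost`, and `texture_page` and the helpers `sort_jobs` and `pack_blocks` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class ShadingJob:
    ray_id: int
    shader_id: int     # hypothetical: shader required at the hit point
    cost: int          # hypothetical: estimated shader complexity
    texture_page: int  # hypothetical: proxy for the memory region read


def sort_jobs(jobs):
    """Order jobs by complexity, then shader, then memory region (CPU side)."""
    return sorted(jobs, key=lambda j: (j.cost, j.shader_id, j.texture_page))


def pack_blocks(sorted_jobs, block_size):
    """Cut the sorted list into blocks that never mix shaders.

    Jobs in one block run the same shader, so threads in that block
    would follow the same branch; neighbours also touch nearby pages.
    """
    blocks = []
    same_shader = lambda j: (j.cost, j.shader_id)
    for _, group in groupby(sorted_jobs, key=same_shader):
        group = list(group)
        for start in range(0, len(group), block_size):
            blocks.append(group[start:start + block_size])
    return blocks


jobs = [
    ShadingJob(ray_id=0, shader_id=2, cost=30, texture_page=5),
    ShadingJob(ray_id=1, shader_id=1, cost=10, texture_page=3),
    ShadingJob(ray_id=2, shader_id=2, cost=30, texture_page=1),
    ShadingJob(ray_id=3, shader_id=1, cost=10, texture_page=3),
    ShadingJob(ray_id=4, shader_id=1, cost=10, texture_page=0),
]
blocks = pack_blocks(sort_jobs(jobs), block_size=2)
block_ray_ids = [[job.ray_id for job in block] for block in blocks]
```

In this toy input, the cheap shader's jobs are packed first and the expensive shader's jobs last, and a partially filled block is emitted rather than mixing two shaders in one block. A real scheduler would have to weigh that lost occupancy against the cost of divergence.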

Acknowledgments

The authors would like to thank the anonymous reviewers for their careful reading of the original manuscript. Their comments and suggestions have led to a much better presentation of the paper. This research is supported in part by the National Natural Science Foundation of China under Grant No. 61300084, in part by the China Postdoctoral Science Foundation under Grant No. 2012M520625, and in part by the Scientific Research Foundation of Dalian University of Technology under Grant DUT12RC(3)63. The authors also appreciate the support of the Nvidia and Microsoft corporations.

Author information

Corresponding author

Correspondence to Xin Yang.

About this article

Cite this article

Yang, X., Xu, Dq., Zhao, L. et al. Complex shading efficiently for ray tracing on GPU. Multimed Tools Appl 74, 1091–1106 (2015). https://doi.org/10.1007/s11042-013-1712-5

  • DOI: https://doi.org/10.1007/s11042-013-1712-5
