
A 3D graphics rendering pipeline implementation based on the openCL massively parallel processing

The Journal of Supercomputing

Abstract

Massively parallel computing libraries and devices are now widely used alongside traditional 3D graphics systems. In this paper, we present a full fixed-function 3D graphics pipeline based on OpenCL, one of the most widely used massively parallel computing libraries. Full 3D graphics features, including WebGL, Web3D and others, can be implemented through massively parallel computation, without underlying 3D graphics hardware support. Many previous works focused on another massively parallel system, CUDA, which has the drawback of limited availability. In contrast, we designed and implemented a new architecture with OpenCL, which is now available on various computing devices, including most CPUs, GPUs and, at least theoretically, special-purpose embedded FPGAs. Our work provides full 3D graphics features on OpenCL-capable systems without dedicated 3D graphics hardware, with the aim of making 3D graphics features ubiquitous. Technically, rendering follows a top-down approach, from the whole screen down to individual pixels. At each stage, we tuned our OpenCL implementations together with their global and local parameter spaces. We present the details of our design and the final result of our implementation, and show its correctness and efficiency.
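To make the "whole screen to individual pixels" idea concrete, the following is a minimal CPU-side sketch in Python rather than OpenCL. It is not the authors' implementation: the function names, the tile size of 8, and the bounding-box plus edge-function coverage test are all assumptions chosen for illustration. In an OpenCL version, one would plausibly map tiles to work-groups and pixels to work-items, but that mapping is also an assumption here.

```python
import math

# Hypothetical sketch: names, tile size, and the coverage test are
# illustrative assumptions, not details taken from the paper.

def edge(ax, ay, bx, by, px, py):
    """Signed-area edge function for the directed edge a->b and point p."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height, tile=8):
    """Return the set of (x, y) pixels whose centers lie inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return set()  # degenerate triangle covers nothing
    if area < 0:
        # Swap two vertices so the inside test below can use >= 0.
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)

    # Stage 1 (whole screen): clamp the bounding box to the screen.
    min_x = max(math.floor(min(x0, x1, x2)), 0)
    max_x = min(math.floor(max(x0, x1, x2)), width - 1)
    min_y = max(math.floor(min(y0, y1, y2)), 0)
    max_y = min(math.floor(max(y0, y1, y2)), height - 1)

    covered = set()
    # Stage 2 (tiles): visit only the tiles that overlap the bounding box.
    for ty in range(min_y // tile, max_y // tile + 1):
        for tx in range(min_x // tile, max_x // tile + 1):
            # Stage 3 (pixels): test each pixel center inside the tile.
            for y in range(max(ty * tile, min_y), min((ty + 1) * tile, max_y + 1)):
                for x in range(max(tx * tile, min_x), min((tx + 1) * tile, max_x + 1)):
                    px, py = x + 0.5, y + 0.5
                    if (edge(x0, y0, x1, y1, px, py) >= 0
                            and edge(x1, y1, x2, y2, px, py) >= 0
                            and edge(x2, y2, x0, y0, px, py) >= 0):
                        covered.add((x, y))
    return covered

pixels = rasterize([(0, 0), (8, 0), (0, 8)], width=16, height=16)
```

The coarse stages only narrow down where per-pixel work is needed, so the expensive inside test runs on a small part of the screen. The tile size is the kind of parameter that would need tuning per device.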



Acknowledgements

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (Grant No. NRF-2019R1I1A3A01061310).

Author information

Corresponding author

Correspondence to Nakhoon Baek.


Cite this article

Kim, M., Baek, N. A 3D graphics rendering pipeline implementation based on the openCL massively parallel processing. J Supercomput 77, 7351–7367 (2021). https://doi.org/10.1007/s11227-020-03581-8


  • DOI: https://doi.org/10.1007/s11227-020-03581-8
