Abstract
Massively parallel computing libraries and devices are now widely used alongside traditional 3D graphics systems. In this paper, we present a full 3D fixed-function graphics pipeline based on OpenCL, one of the most widely used massively parallel computing libraries. Full 3D graphics features, including WebGL, Web3D, and others, can be implemented on massively parallel computations without underlying 3D graphics hardware support. Many previous works focused on CUDA, another massively parallel system, which has the drawback of limited availability. In contrast, we designed and implemented a new architecture with OpenCL, which is now available on various computing devices, including most CPUs, GPUs, and, at least theoretically, special-purpose embedded FPGAs. Our work provides full 3D graphics features on OpenCL-capable systems without dedicated 3D graphics hardware, with the aim of making 3D graphics features ubiquitous. Technically, we used a top-down approach to rendering, from the whole screen down to individual pixels. At each stage, we tuned our OpenCL implementations and their global and local parameter spaces. We present the details of our design and the final result of our implementation, and show its correctness and efficiency.
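To make the top-down idea concrete, the following is a minimal sequential sketch in Python, not the OpenCL implementation described in the paper. It assumes a three-level hierarchy (screen, square tiles, pixels), a bounding-box test for rejecting tiles, and an edge-function test at pixel centers; the abstract does not specify any of these details, and the names `edge`, `covers`, `rasterize`, and `tile` are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: signed area of the parallelogram spanned by a->b and a->p."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covers(tri, px, py):
    """True when the point lies inside or on the border of the triangle (either winding)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = edge(x0, y0, x1, y1, px, py)
    e1 = edge(x1, y1, x2, y2, px, py)
    e2 = edge(x2, y2, x0, y0, px, py)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def rasterize(tri, width, height, tile=8):
    """Coarse-to-fine rasterization: whole screen -> tiles -> individual pixels."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    pixels = set()
    for ty in range(0, height, tile):          # screen level: walk all tiles
        for tx in range(0, width, tile):
            # Tile level (assumed test): conservatively skip tiles that cannot
            # overlap the triangle's bounding box.
            if tx > max_x or tx + tile < min_x or ty > max_y or ty + tile < min_y:
                continue
            # Pixel level: exact coverage test at each pixel center.
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    if covers(tri, x + 0.5, y + 0.5):
                        pixels.add((x, y))
    return pixels


triangle = ((2, 2), (30, 4), (10, 28))
covered = rasterize(triangle, 32, 32, tile=8)
```

In a parallel setting, the tile loop and the per-pixel loop are the natural candidates for distribution across work-groups and work-items, and the tile size is one example of a parameter that could be tuned per stage; whether the paper's pipeline is organized this way is an assumption here.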





Acknowledgements
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (Grant No. NRF-2019R1I1A3A01061310).
Cite this article
Kim, M., Baek, N. A 3D graphics rendering pipeline implementation based on the openCL massively parallel processing. J Supercomput 77, 7351–7367 (2021). https://doi.org/10.1007/s11227-020-03581-8
DOI: https://doi.org/10.1007/s11227-020-03581-8