
Support OpenCL 2.0 Compiler on LLVM for PTX Simulators

Journal of Signal Processing Systems

Abstract

Heterogeneous systems that combine multiple CPUs and GPUs for high-performance computing are becoming increasingly popular, and OpenCL (Open Computing Language) provides a framework for writing programs that execute across heterogeneous devices. Compared with OpenCL 1.2, OpenCL 2.0 adds features that give developers greater expressive power for programming heterogeneous computing environments. Currently, gem5-gpu, which combines gem5 and GPGPU-Sim, offers an experimental simulation environment for OpenCL. In gem5-gpu, gem5 supports only CUDA, whereas GPGPU-Sim can support OpenCL by compiling OpenCL kernel code to PTX code with real GPU drivers. However, this compilation flow in GPGPU-Sim supports only up to OpenCL 1.2. OpenCL 2.0 provides new features such as work-group built-in functions, extended atomic built-in functions, and device-side enqueue. To support OpenCL 2.0, the compiler must be extended to compile OpenCL 2.0 kernel code to PTX code. In this paper, the proposed compiler extends the Low Level Virtual Machine (LLVM) compiler with these features so that the simulator can support OpenCL 2.0. The proposed compiler creates a local buffer for each work-group to enable work-group built-in functions, and it adds OpenCL 2.0 atomic built-in functions with memory order and memory scope to NVPTX. Furthermore, the OpenCL 2.0 device-side enqueue kernel is implemented with APIs available in CUDA, and the compilation schemes in Clang are revised accordingly. The AMD APP SDK 3.0 and NTU OpenCL benchmarks are used to verify that the proposed compiler supports the features of OpenCL 2.0.
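To make the idea of "a local buffer for each work-group" concrete, the following is a minimal host-side C++ model of how a work-group built-in such as work_group_reduce_add() can be lowered onto a per-work-group local buffer. It is a sketch under stated assumptions, not the paper's implementation: the function name work_group_reduce_add_model is hypothetical, each inner loop stands in for code that every work-item executes in parallel, the end of each phase stands in for a barrier(CLK_LOCAL_MEM_FENCE), and the work-group size is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: lowering work_group_reduce_add() onto a local buffer.
// x[lid] is the private value held by work-item lid; the returned vector holds
// the value each work-item receives from the built-in.
std::vector<int> work_group_reduce_add_model(const std::vector<int>& x) {
    const std::size_t local_size = x.size();
    // Assumption: power-of-two work-group size (keeps the tree reduction simple).
    assert(local_size == 0 || (local_size & (local_size - 1)) == 0);

    // One local buffer per work-group, one slot per work-item.
    std::vector<int> local_buf(local_size);

    // Phase 1: every work-item stores its private value into its slot.
    for (std::size_t lid = 0; lid < local_size; ++lid) {
        local_buf[lid] = x[lid];
    }
    // (barrier)

    // Phase 2: tree reduction; the number of active work-items halves each step.
    // Work-items with lid < stride write slots below stride and read slots at or
    // above stride, so there is no conflict within a single step.
    for (std::size_t stride = local_size / 2; stride > 0; stride /= 2) {
        for (std::size_t lid = 0; lid < stride; ++lid) {
            local_buf[lid] += local_buf[lid + stride];
        }
        // (barrier)
    }

    // Phase 3: every work-item reads the reduced value from slot 0.
    std::vector<int> result(local_size);
    for (std::size_t lid = 0; lid < local_size; ++lid) {
        result[lid] = local_buf[0];
    }
    return result;
}
```

For example, work_group_reduce_add_model({1, 2, 3, 4, 5, 6, 7, 8}) gives every one of the eight work-items the value 36. On real hardware the buffer would live in local (PTX shared) memory, and the barriers would be actual bar.sync instructions rather than phase boundaries in a sequential loop.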
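OpenCL 2.0 atomics carry both a memory order and a memory scope, and a back end such as NVPTX has to turn that pair into PTX. The sketch below shows one plausible lowering for a 32-bit atomic add on global memory. Several assumptions apply: it targets a PTX ISA version (6.0 or later) whose atom instruction accepts .sem and .scope qualifiers; the scope mapping (work_group to .cta, device to .gpu, all_svm_devices to .sys) is an assumed correspondence; seq_cst is lowered as fence.sc followed by an acquire atom, a pattern used by some CUDA libraries. The enum and function names are hypothetical, and this is not necessarily what the paper's compiler emits.

```cpp
#include <cassert>
#include <string>

// Hypothetical enums mirroring OpenCL 2.0 memory_order / memory_scope values.
enum class MemoryOrder { Relaxed, Acquire, Release, AcqRel, SeqCst };
enum class MemoryScope { WorkGroup, Device, AllSvmDevices };

// Assumed mapping from an OpenCL memory scope to a PTX scope qualifier.
std::string ptx_scope(MemoryScope s) {
    switch (s) {
        case MemoryScope::WorkGroup:     return ".cta";
        case MemoryScope::Device:        return ".gpu";
        case MemoryScope::AllSvmDevices: return ".sys";
    }
    return "";  // not reached for valid enum values
}

// Builds PTX text for atomic_fetch_add_explicit on a global 32-bit unsigned int.
// Register names %r1, %rd1, %r2 are placeholders.
std::string ptx_atomic_add(MemoryOrder o, MemoryScope s) {
    const std::string scope = ptx_scope(s);
    std::string prefix;  // optional fence emitted before the atom
    std::string sem;     // PTX memory-semantics qualifier
    switch (o) {
        case MemoryOrder::Relaxed: sem = ".relaxed"; break;
        case MemoryOrder::Acquire: sem = ".acquire"; break;
        case MemoryOrder::Release: sem = ".release"; break;
        case MemoryOrder::AcqRel:  sem = ".acq_rel"; break;
        case MemoryOrder::SeqCst:
            // Assumed lowering: sequentially consistent fence, then acquire atom.
            prefix = "fence.sc" + scope + ";\n";
            sem = ".acquire";
            break;
    }
    return prefix + "atom" + sem + scope + ".global.add.u32 %r1, [%rd1], %r2;";
}
```

For instance, a relaxed, work-group-scoped add becomes `atom.relaxed.cta.global.add.u32 %r1, [%rd1], %r2;`. Keeping the scope mapping in its own function means every atomic built-in (add, exchange, compare-exchange, and so on) can share it.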
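Implementing device-side enqueue on top of CUDA facilities (for example, CUDA dynamic parallelism, which is an assumption here) requires translating the ndrange_t passed to enqueue_kernel() into a CUDA-style grid and block configuration. The following minimal C++ sketch shows that arithmetic; the NDRange and LaunchConfig types and the to_launch_config function are hypothetical. Because OpenCL 2.0 permits a global size that is not a multiple of the work-group size, the grid is rounded up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical description of an OpenCL ndrange_t as passed to enqueue_kernel().
struct NDRange {
    unsigned dims;                      // 1, 2 or 3
    std::array<std::size_t, 3> global;  // global work size per dimension
    std::array<std::size_t, 3> local;   // work-group size per dimension
};

// Hypothetical CUDA-style launch configuration (mirrors gridDim / blockDim).
struct LaunchConfig {
    std::array<std::size_t, 3> grid{1, 1, 1};
    std::array<std::size_t, 3> block{1, 1, 1};
};

// Converts an ndrange into a grid/block pair. Unused dimensions stay at 1.
// The grid is rounded up, so the child kernel must still compare its global ID
// against nd.global to skip the surplus threads of the last block.
LaunchConfig to_launch_config(const NDRange& nd) {
    assert(nd.dims >= 1 && nd.dims <= 3);
    LaunchConfig cfg;
    for (unsigned d = 0; d < nd.dims; ++d) {
        assert(nd.local[d] > 0);
        cfg.block[d] = nd.local[d];
        cfg.grid[d] = (nd.global[d] + nd.local[d] - 1) / nd.local[d];  // ceiling division
    }
    return cfg;
}
```

For a one-dimensional range with a global size of 1000 and a work-group size of 256, this yields a grid of 4 blocks of 256 threads. The last 24 threads fall outside the range and must return early. That bounds check is what makes rounding up safe.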





Author information

Corresponding author

Correspondence to Jenq-Kuen Lee.


About this article


Cite this article

Yang, CC., Wang, SC., Hsu, MY. et al. Support OpenCL 2.0 Compiler on LLVM for PTX Simulators. J Sign Process Syst 91, 261–271 (2019). https://doi.org/10.1007/s11265-018-1377-4



  • DOI: https://doi.org/10.1007/s11265-018-1377-4
