Efficient Abstractions for GPGPU Programming

Published in: International Journal of Parallel Programming

Abstract

General purpose GPU (GPGPU) programming requires coupling highly parallel computing units with classic CPUs to obtain high performance. Such heterogeneous systems lead to complex designs that combine multiple paradigms and programming languages to manage each hardware architecture. In this paper, we present tools to harness GPGPU programming through the high-level OCaml programming language. We describe the SPOC library, which handles GPGPU subprograms (kernels) and data transfers between devices. We then present how SPOC expresses GPGPU kernels: through interoperability with common low-level extensions (from the Cuda and OpenCL frameworks) but also via an embedded DSL for OCaml. Using simple benchmarks as well as a real-world HPC application, we show that SPOC can offer high performance while efficiently easing development. To allow better abstractions over tasks and data, we introduce parallel skeletons built upon SPOC, as well as composition constructs over those skeletons.
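The abstract's closing point about parallel skeletons and composition constructs built upon SPOC can be illustrated with a small, self-contained sketch. The plain OCaml code below is a conceptual illustration only, not the actual SPOC or Sarek API: the names skeleton, map, reduce and pipe are illustrative assumptions, and the computations run on ordinary host arrays, whereas SPOC's skeletons would dispatch GPU kernels and manage host/device transfers.

(* Conceptual sketch only: this mirrors the idea of parallel skeletons and
   their composition on ordinary OCaml arrays. It is NOT the SPOC API; the
   names below are illustrative assumptions. *)

(* A skeleton is modelled here as a function from an input array to a result array. *)
type ('a, 'b) skeleton = 'a array -> 'b array

(* "map" skeleton: apply f to every element (conceptually in parallel). *)
let map (f : 'a -> 'b) : ('a, 'b) skeleton =
  fun input -> Array.map f input

(* "reduce" skeleton: combine all elements with an associative operator. *)
let reduce (op : 'a -> 'a -> 'a) (init : 'a) (input : 'a array) : 'a =
  Array.fold_left op init input

(* Composition construct: pipe one skeleton into another; in a GPU setting,
   this is where intermediate results could stay on the device and avoid a
   round trip to the host. *)
let pipe (s1 : ('a, 'b) skeleton) (s2 : ('b, 'c) skeleton) : ('a, 'c) skeleton =
  fun input -> s2 (s1 input)

(* Usage: scale a vector, square it, then sum the result. *)
let () =
  let v = Array.init 1024 float_of_int in
  let scale_then_square = pipe (map (fun x -> 2.0 *. x)) (map (fun x -> x *. x)) in
  let total = reduce ( +. ) 0.0 (scale_then_square v) in
  Printf.printf "sum = %f\n" total

The design point this sketch tries to capture is the one made in the abstract: once kernels are wrapped as composable values, higher-order combinators such as pipe let programs describe whole computations over tasks and data rather than individual kernel launches.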

Acknowledgments

The authors thank Stan Scott and the “High Performance and Distributed Computing” department of Queen’s University of Belfast (United Kingdom) for the 2DRMP library and the program PROP as well as Rachid Habel from Telecom SudParis for sharing his knowledge on PROP. This work is part of the OpenGPU project and is partially funded by SYSTEMATIC PARIS REGION SYSTEMS & ICT CLUSTER (http://opengpu.net/).

Author information

Correspondence to Mathias Bourgoin.

Cite this article

Bourgoin, M., Chailloux, E. & Lamotte, JL. Efficient Abstractions for GPGPU Programming. Int J Parallel Prog 42, 583–600 (2014). https://doi.org/10.1007/s10766-013-0261-x
