Abstract
General-purpose GPU (GPGPU) programming requires coupling highly parallel computing units with classic CPUs to achieve high performance. Such heterogeneous systems lead to complex designs that combine multiple paradigms and programming languages to manage each hardware architecture. In this paper, we present tools to harness GPGPU programming through the high-level OCaml programming language. We describe the SPOC library, which handles GPGPU subprograms (kernels) and data transfers between devices. We then present how SPOC expresses GPGPU kernels: through interoperability with common low-level extensions (from the Cuda and OpenCL frameworks), but also via an embedded DSL for OCaml. Using simple benchmarks as well as real-world HPC software, we show that SPOC can offer high performance while effectively easing development. To allow better abstractions over tasks and data, we introduce some parallel skeletons built upon SPOC as well as composition constructs over those skeletons.
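To make the idea of composing skeletons concrete, here is a minimal CPU-only sketch in plain OCaml, assuming element-wise map skeletons over arrays plus a fold-based reduction. The names `skeleton`, `map`, `pipe`, `run` and `reduce` are hypothetical illustrations, not SPOC's actual API, and nothing here runs on a GPU. The point is only that composing two map skeletons can yield a single fused computation with no intermediate array; in a GPU setting, such fusion would typically avoid an extra kernel launch and an intermediate data transfer.

```ocaml
(* CPU-only sketch of map/reduce skeletons and their composition.
   All names (skeleton, map, pipe, run, reduce) are hypothetical and
   are NOT SPOC's API; a GPU version would launch kernels in [run]. *)

(* A map skeleton wraps an element-wise function from 'a to 'b. *)
type ('a, 'b) skeleton = Map of ('a -> 'b)

(* Build a map skeleton from an element-wise function. *)
let map f = Map f

(* Composition: fuse two map skeletons into one, so the composed
   computation traverses the data once and allocates no
   intermediate array. *)
let pipe (Map f) (Map g) = Map (fun x -> g (f x))

(* Execute a map skeleton over an array. *)
let run (Map f) (data : 'a array) : 'b array = Array.map f data

(* A reduce skeleton: fold the elements with an associative operator
   (sequentially here; a parallel version would use a tree reduction). *)
let reduce (op : 'b -> 'b -> 'b) (init : 'b) (data : 'b array) : 'b =
  Array.fold_left op init data

let () =
  (* square each element, then add 1, in a single fused pass *)
  let square_then_incr =
    pipe (map (fun x -> x *. x)) (map (fun x -> x +. 1.0)) in
  let v = run square_then_incr [| 1.0; 2.0; 3.0 |] in
  (* v = [| 2.0; 5.0; 10.0 |] *)
  let total = reduce ( +. ) 0.0 v in
  Printf.printf "%g\n" total  (* prints 17 *)
```

Representing a skeleton as a value rather than running it immediately is what makes composition possible: `pipe` inspects two descriptions and returns a new one before any data is touched.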
References
AMD: Aparapi. http://code.google.com/p/aparapi/
Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Experience 23(2), 187–198 (2011). Special Issue: Euro-Par 2009
Beck, R., Larsen, H., Jensen, T., Thomsen, B.: Extending Scala with general purpose GPU programming. Technical report, Aalborg University, Department of Computer Science (2011)
Bourgoin, M., Chailloux, E., Lamotte, J.L.: SPOC: GPGPU programming through stream processing with OCaml. Parallel Process. Lett. 22(2), 1–12 (2012)
Bourgoin, M., Chailloux, E., Lamotte, J.L.: High level GPGPU programming with parallel skeletons. In: Patterns for Parallel Programming on GPUs. Saxe-Coburg Publications (to appear). ISBN 978-1-874672-57-9
Catanzaro, B., Garland, M., Keutzer, K.: Copperhead: compiling an embedded data parallel language. SIGPLAN Notices 46(8), 47 (2011)
Cray Inc., CAPS Enterprise, NVIDIA, The Portland Group: OpenACC 1.0 specification (2011)
Dolbeau, R., Bihan, S., Bodin, F.: HMPP: a hybrid multi-core parallel programming environment. In: First Workshop on General Purpose Processing on Graphics Processing Units (2007)
Enmyren, J., Kessler, C.W.: SkePU: a multi-backend skeleton programming library for multi-GPU systems. In: Proceedings of the Fourth International Workshop on High-Level Parallel Programming and Applications, HLPP ’10, pp. 5–14. ACM (2010)
Fortin, P., Habel, R., Jezequel, F., Lamotte, J., Scott, N.: Deployment on GPUs of an application in computational atomic physics. In: Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), pp. 1359–1366. IEEE (2011)
Leroy, X., Doligez, D., Frisch, A., Garrigue, J., Rémy, D., Vouillon, J.: The OCaml system release 4.00: documentation and user’s manual. Technical report, Inria (2012). http://caml.inria.fr
Munshi, A., et al.: The OpenCL Specification (2012). http://www.khronos.org/opencl
NVIDIA Corporation: CUBLAS library (2012). http://developer.nvidia.com/cublas
NVIDIA Corporation: CUDA C Programming Guide (2012). http://docs.nvidia.com/cuda/index.html
Scott, N., Scott, M., Burke, P., Stitt, T., Faro-Maza, V., Denis, C., Maniopoulou, A.: 2DRMP: a suite of two-dimensional R-matrix propagation codes. Comput. Phys. Commun. 180(12), 2424–2449 (2009)
Steuwer, M., Kegel, P., Gorlatch, S.: SkelCL: a portable skeleton library for high-level GPU programming. In: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), pp. 1176–1182. IEEE (2011)
Svensson, J.: Obsidian: GPU kernel programming in Haskell. Technical report 77L, Computer Science and Engineering, Chalmers University of Technology and Gothenburg University (2011)
Tarditi, D., Puri, S., Oglesby, J.: Accelerator: using data parallelism to program GPUs for general-purpose uses. ACM SIGARCH Comput. Archit. News 34(5), 325–335 (2006)
Tomov, S., Nath, R., Du, P., Dongarra, J.: MAGMA users’ guide. ICL, UTK (2009)
Acknowledgments
The authors thank Stan Scott and the “High Performance and Distributed Computing” department of Queen’s University Belfast (United Kingdom) for the 2DRMP library and the PROP program, as well as Rachid Habel from Telecom SudParis for sharing his knowledge of PROP. This work is part of the OpenGPU project and is partially funded by the SYSTEMATIC PARIS REGION SYSTEMS & ICT CLUSTER (http://opengpu.net/).
Cite this article
Bourgoin, M., Chailloux, E. & Lamotte, JL. Efficient Abstractions for GPGPU Programming. Int J Parallel Prog 42, 583–600 (2014). https://doi.org/10.1007/s10766-013-0261-x