Efficient Abstractions for GPGPU Programming

Bourgoin, Mathias; Chailloux, Emmanuel; Lamotte, Jean-Luc

doi:10.1007/s10766-013-0261-x

Published: 09 August 2013

Volume 42, pages 583–600, (2014)

International Journal of Parallel Programming

Abstract

General-purpose GPU (GPGPU) programming requires coupling highly parallel computing units with classic CPUs to obtain high performance. Heterogeneous systems lead to complex designs that combine multiple paradigms and programming languages to manage each hardware architecture. In this paper, we present tools to harness GPGPU programming through the high-level OCaml programming language. We describe the SPOC library, which handles GPGPU subprograms (kernels) and data transfers between devices. We then present how SPOC expresses GPGPU kernels: through interoperability with common low-level extensions (from the CUDA and OpenCL frameworks), but also via an embedded DSL for OCaml. Using simple benchmarks as well as a real-world HPC application, we show that SPOC can offer high performance while easing development. To allow better abstractions over tasks and data, we introduce parallel skeletons built upon SPOC, as well as composition constructs over those skeletons.
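
To make the idea of parallel skeletons and their composition more concrete, the following is a minimal CPU-only sketch in plain OCaml. It does not use SPOC and does not reproduce its API: the names `skeleton`, `map`, `reduce`, `pipe` and `run` are hypothetical and chosen only for illustration, and the "kernels" here are ordinary OCaml functions rather than GPU code.

```ocaml
(* Hypothetical sketch: none of these names come from SPOC. *)

(* A skeleton wraps a computation from an input to an output. *)
type ('a, 'b) skeleton = Skel of ('a -> 'b)

(* map: apply a "kernel" to every element of a vector. *)
let map (kernel : 'a -> 'b) : ('a array, 'b array) skeleton =
  Skel (Array.map kernel)

(* reduce: fold a vector to a single value with a binary "kernel". *)
let reduce (kernel : 'a -> 'a -> 'a) (init : 'a) : ('a array, 'a) skeleton =
  Skel (Array.fold_left kernel init)

(* pipe: composition construct, the output of the first skeleton
   becomes the input of the second. *)
let pipe (Skel f) (Skel g) = Skel (fun x -> g (f x))

(* run: execute a skeleton on its input (here, directly on the CPU). *)
let run (Skel f) x = f x

(* Sum of squares expressed as a map followed by a reduce. *)
let sum_of_squares = pipe (map (fun x -> x *. x)) (reduce ( +. ) 0.0)

let () =
  let result = run sum_of_squares [| 1.0; 2.0; 3.0 |] in
  Printf.printf "%.1f\n" result
```

In a GPGPU setting, a skeleton would presumably also carry the information needed to launch its kernel and move data between host and device; this sketch leaves that out and only shows how composition lets a program be described as a pipeline of reusable patterns.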

Notes

http://www.top500.org/list/2012/11/

Acknowledgments

The authors thank Stan Scott and the “High Performance and Distributed Computing” department of Queen’s University of Belfast (United Kingdom) for the 2DRMP library and the program PROP as well as Rachid Habel from Telecom SudParis for sharing his knowledge on PROP. This work is part of the OpenGPU project and is partially funded by SYSTEMATIC PARIS REGION SYSTEMS & ICT CLUSTER (http://opengpu.net/).

Author information

Authors and Affiliations

Laboratoire d’Informatique de Paris 6 (LIP6-UMR 7606), Université Pierre et Marie Curie (UPMC-Paris 6), Sorbonne Universités, 4 place Jussieu, 75005 Paris, France
Mathias Bourgoin, Emmanuel Chailloux & Jean-Luc Lamotte

Corresponding author

Correspondence to Mathias Bourgoin.

About this article

Cite this article

Bourgoin, M., Chailloux, E. & Lamotte, JL. Efficient Abstractions for GPGPU Programming. Int J Parallel Prog 42, 583–600 (2014). https://doi.org/10.1007/s10766-013-0261-x

Received: 02 March 2013
Accepted: 27 July 2013
Issue Date: August 2014
