
High Level Data Structures for GPGPU Programming in a Statically Typed Language

Published in: International Journal of Parallel Programming

Abstract

To increase software performance, it is now common to use hardware accelerators. Currently, GPUs are the most widespread accelerators capable of handling general computations, which requires the use of GPGPU frameworks such as CUDA or OpenCL. Both are very low-level and make the benefits of GPGPU programming difficult to achieve. In particular, they require writing programs as a combination of two subprograms and manually managing devices and memory transfers, which increases the complexity of the overall software design. The idea we develop in this paper is to guarantee expressiveness and safety for CPU and GPU computations and memory management through high-level data structures and static type checking. We present how statically typed languages, compilers, and libraries help harness high-level GPGPU programming. In particular, we show how we added high-level user-defined data structures to a GPGPU programming framework based on a statically typed programming language: OCaml. We describe the introduction of records and tagged unions shared between the host program and GPGPU kernels, expressed via a domain-specific language, as well as a simple pattern-matching control structure to manipulate them. Examples, practical tests, and comparisons with state-of-the-art tools show that our solutions improve code design, productivity, and safety while providing a high level of performance.
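To give a flavor of the data structures the abstract describes, the following is a minimal sketch in plain OCaml of a tagged union (variant type) manipulated through pattern matching. The type and function names (`shape`, `area`) are hypothetical illustrations, not part of the paper's actual DSL; in the authors' framework, similar declarations would be shared between the host program and GPGPU kernels written in the embedded domain-specific language.

```ocaml
(* Illustrative sketch only: a tagged union and a pattern match,
   in plain OCaml. The paper's contribution is making such types
   usable inside GPGPU kernels via its DSL; this snippet shows the
   host-language idiom being transposed, not the DSL itself. *)
type shape =
  | Circle of float           (* radius *)
  | Rect of float * float    (* width, height *)

(* Pattern matching dispatches on the tag of the union. *)
let area = function
  | Circle r -> 3.14159265 *. r *. r
  | Rect (w, h) -> w *. h

let () =
  let shapes = [ Circle 1.0; Rect (2.0, 3.0) ] in
  List.iter (fun s -> Printf.printf "%f\n" (area s)) shapes
```

The static type checker guarantees that every constructor of `shape` is handled in the match; the paper's compilation scheme carries that same guarantee into kernel code.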



Author information


Correspondence to Mathias Bourgoin.


About this article


Cite this article

Bourgoin, M., Chailloux, E. & Lamotte, JL. High Level Data Structures for GPGPU Programming in a Statically Typed Language. Int J Parallel Prog 45, 242–261 (2017). https://doi.org/10.1007/s10766-016-0424-7
