DOI: 10.1145/3091966.3091974
research-article

Modular array-based GPU computing in a dynamically-typed language

Published: 18 June 2017

Abstract

Nowadays, GPU accelerators are widely used in areas with large data-parallel computations, such as scientific computing and neural networks. Programmers can either write low-level CUDA/OpenCL code or use a GPU extension for a high-level programming language for better productivity. Most extensions focus on statically-typed languages, but many programmers prefer dynamically-typed languages due to their simplicity and flexibility.
This paper shows how programmers can write high-level modular code in Ikra, a Ruby extension for array-based GPU computing. Programmers can compose GPU programs of multiple reusable parallel sections, which are subsequently fused into a small number of GPU kernels. We propose a seamless syntax for separating code regions that extensively use dynamic language features from those that are compiled for efficient execution. Moreover, we propose symbolic execution and a program analysis for kernel fusion to achieve performance that is close to hand-written CUDA code.
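The idea of fusing several parallel sections into one kernel can be illustrated with a minimal sketch in plain Ruby. This runs on the CPU and does not use Ikra's API; the helper names `run_unfused` and `run_fused` and the two example sections are hypothetical, chosen only for illustration. Each section is modelled as a lambda applied per element. The unfused version makes one pass (and one intermediate array) per section, which is analogous to launching one kernel per section; the fused version applies all sections to each element in a single pass.

```ruby
# Conceptual sketch only: plain Ruby on the CPU, not Ikra's API.
# A "parallel section" is modelled as a lambda applied to each element.

# Hypothetical helper: one pass per section, with an intermediate array
# materialised after each one (analogous to one kernel per section).
def run_unfused(input, sections)
  sections.reduce(input) { |data, section| data.map { |x| section.call(x) } }
end

# Hypothetical helper: a single pass in which every section is applied to
# an element before moving on to the next (analogous to one fused kernel).
def run_fused(input, sections)
  input.map do |x|
    sections.reduce(x) { |value, section| section.call(value) }
  end
end

# Two reusable sections, composed without knowing about each other.
scale  = ->(x) { x * 2 }
offset = ->(x) { x + 1 }

input   = (0...8).to_a
unfused = run_unfused(input, [scale, offset])
fused   = run_fused(input, [scale, offset])

puts fused.inspect
puts "same result: #{fused == unfused}"
```

Both versions compute the same values; the difference is that the fused form avoids writing and re-reading intermediate arrays, which is the cost that kernel fusion aims to remove on a GPU.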

Published In

ARRAY 2017: Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
June 2017
62 pages
ISBN:9781450350693
DOI:10.1145/3091966
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CUDA
  2. GPGPU
  3. Ruby
  4. kernel fusion

Qualifiers

  • Research-article

Conference

PLDI '17
Acceptance Rates

Overall Acceptance Rate 17 of 25 submissions, 68%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 07 Mar 2025

Citations

Cited By

  • (2024) T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. DOI: 10.1145/3620665.3640410. Pages 1146-1164. Online publication date: 27-Apr-2024
  • (2024) CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). DOI: 10.1109/MICRO61859.2024.00058. Pages 700-717. Online publication date: 2-Nov-2024
  • (2023) Tale of Two Cs: Computation vs. Communication Scaling for Future Transformers on Future Hardware. 2023 IEEE International Symposium on Workload Characterization (IISWC). DOI: 10.1109/IISWC59245.2023.00026. Pages 140-153. Online publication date: 1-Oct-2023
  • (2022) Demystifying BERT: System Design Implications. 2022 IEEE International Symposium on Workload Characterization (IISWC). DOI: 10.1109/IISWC55918.2022.00033. Pages 296-309. Online publication date: Nov-2022
  • (2022) Automatic horizontal fusion for GPU kernels. Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization. DOI: 10.1109/CGO53902.2022.9741270. Pages 14-27. Online publication date: 2-Apr-2022
  • (2018) Exploiting high-performance heterogeneous hardware for Java programs using graal. Proceedings of the 15th International Conference on Managed Languages & Runtimes. DOI: 10.1145/3237009.3237016. Pages 1-13. Online publication date: 12-Sep-2018
