ABSTRACT
This paper presents GPotion, a DSL for GPU programming embedded in the Elixir functional language. GPotion lets programmers write low-level GPU kernels in Elixir, similar to CUDA kernels, while also providing high-level facilities such as garbage collection, type inference, and simplified data transfer. Preliminary experiments show that GPotion supports fast and efficient kernels with little overhead compared to pure CUDA. GPotion is implemented using Elixir's metaprogramming features, without modifying the Elixir compiler. The source code for GPotion and the benchmarks used in the experiments are available in a GitHub repository.
Index Terms
- GPotion: An embedded DSL for GPU programming in Elixir