DOI: 10.1145/2935323.2935327
research-article

Object support in an array-based GPGPU extension for Ruby

Published: 02 June 2016

Abstract

This paper presents implementation and optimization techniques to support objects in Ikra, an array-based parallel extension to Ruby with dynamic compilation. The high-level goal of Ikra is to allow developers to exploit GPU-based high-performance computing without paying much attention to intricate details of the underlying GPU infrastructure and CUDA. Ikra supports dynamically-typed object-oriented programming in Ruby and performs a number of optimizations. It supports parallel operations (e.g., map, each) on arrays of polymorphic objects, allowing polymorphic method calls inside a kernel by compiling them to conditional branches. To reduce branch divergence, Ikra shuffles thread assignments to base array elements based on runtime types of elements. To facilitate memory coalescing, Ikra stores objects in a structure-of-arrays (SoA) representation (columnar object layout). To eliminate intermediate data in global memory, Ikra merges cascaded parallel sections into one kernel using symbolic execution.
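
Three of the techniques named in the abstract (columnar object layout, reordering of thread assignments by runtime type, and polymorphic calls compiled to conditional branches) can be illustrated with a minimal, CPU-only sketch in plain Ruby. This is not Ikra's API or implementation: every name below (`Rect`, `Tri`, `TYPE_TAGS`, `to_soa`, `reorder_by_type`, `area_kernel`, `pmap_area`) is hypothetical, no GPU code is generated, and the "threads" are simulated by a sequential loop.

```ruby
# CPU-only illustration in plain Ruby. All names are hypothetical;
# this is not Ikra's API and nothing here runs on a GPU.
Rect = Struct.new(:w, :h)
Tri  = Struct.new(:w, :h)
TYPE_TAGS = { Rect => 0, Tri => 1 }.freeze

# Columnar (SoA) layout: one array per field plus a type-tag column,
# so neighbouring "threads" would read neighbouring memory cells.
def to_soa(objects)
  {
    tag: objects.map { |o| TYPE_TAGS.fetch(o.class) },
    w:   objects.map(&:w),
    h:   objects.map(&:h)
  }
end

# Reordering: a stable sort of element indices by type tag, so that
# consecutive work items mostly share one runtime type.
def reorder_by_type(soa)
  (0...soa[:tag].size).sort_by { |i| [soa[:tag][i], i] }
end

# "Kernel" body for one element: the polymorphic call to an area
# method is replaced by a conditional branch on the type tag.
def area_kernel(soa, i)
  case soa[:tag][i]
  when 0 then soa[:w][i] * soa[:h][i]       # Rect: w * h
  when 1 then soa[:w][i] * soa[:h][i] / 2   # Tri: w * h / 2 (integer division)
  end
end

# Parallel map, simulated sequentially: elements are visited in the
# reordered order, and each result is written to its original index.
def pmap_area(objects)
  soa = to_soa(objects)
  result = Array.new(objects.size)
  reorder_by_type(soa).each { |i| result[i] = area_kernel(soa, i) }
  result
end

shapes = [Rect.new(2, 3), Tri.new(4, 5), Rect.new(1, 1), Tri.new(6, 2)]
areas  = pmap_area(shapes)   # results stay in the original element order
```

On a real GPU, the reordered index list would decide which element each thread processes, so that threads executing together tend to take the same branch in `area_kernel`. The sketch only mimics that ordering and makes no performance claim.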

Cited By

  • (2021) Characterizing Massively Parallel Polymorphism. 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 205-216. DOI: 10.1109/ISPASS51385.2021.00037. Online publication date: Mar-2021
  • (2017) User-friendly interface for GPGPU programming. 2017 6th National Conference on Technology and Management (NCTM), pages 99-104. DOI: 10.1109/NCTM.2017.7872835. Online publication date: Jan-2017

Published In

ARRAY 2016: Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
June 2016
68 pages
ISBN: 9781450343848
DOI: 10.1145/2935323

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CUDA
  2. GPGPU
  3. Ruby
  4. object-oriented programming

Conference

PLDI '16

Acceptance Rates

Overall Acceptance Rate 17 of 25 submissions, 68%
