DOI: 10.1145/3178433.3178439
research-article

Ikra-Cpp: A C++/CUDA DSL for Object-Oriented Programming with Structure-of-Arrays Layout

Published: 24 February 2018

ABSTRACT

Structure of Arrays (SOA) is a well-studied data layout technique for SIMD architectures. Previous work has shown that it can speed up applications in high-performance computing by several factors compared to a traditional Array of Structures (AOS) layout. However, most programmers are used to AOS-style programming, which is more readable and easier to maintain.
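To make the two layouts concrete, here is a minimal C++ sketch of the same update written against an AOS container and an SOA container. The names `BodyAos`, `BodiesSoa`, `move_aos`, and `move_soa` are illustrative assumptions and do not come from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AOS: the fields of one object sit next to each other in memory.
struct BodyAos {
  float pos_x;
  float vel_x;
};

// SOA: one array per field; index i in every array belongs to object i.
struct BodiesSoa {
  std::vector<float> pos_x;
  std::vector<float> vel_x;
};

// AOS update: reads and writes are strided by sizeof(BodyAos).
void move_aos(std::vector<BodyAos>& bodies, float dt) {
  for (BodyAos& b : bodies) {
    b.pos_x += b.vel_x * dt;
  }
}

// SOA update: each field is traversed contiguously, which is the access
// pattern that SIMD units and GPU memory systems generally favor.
void move_soa(BodiesSoa& bodies, float dt) {
  assert(bodies.pos_x.size() == bodies.vel_x.size());
  for (std::size_t i = 0; i < bodies.pos_x.size(); ++i) {
    bodies.pos_x[i] += bodies.vel_x[i] * dt;
  }
}
```

Both functions compute the same result; only the memory layout differs. The AOS version reads like ordinary object code, while the SOA version exposes the field arrays and index arithmetic to the programmer.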

We present Ikra-Cpp, an embedded DSL for object-oriented programming in C++/CUDA. Ikra-Cpp's notation is very close to standard AOS-style C++ code, but data is laid out as SOA. This gives programmers the performance benefit of SOA and the expressiveness of AOS-style object-oriented programming at the same time. Ikra-Cpp is well integrated with C++ and lets programmers use C++ notation and syntax for classes, fields, member functions, constructors and instance creation.
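The general idea of object-style notation over SOA storage can be sketched in plain C++ as follows. This is not Ikra-Cpp's API or implementation, which the abstract does not describe; the class `Body`, its `create` and `at` functions, and the index-based handle are all hypothetical, chosen only to show one way a class interface can hide per-field arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, NOT Ikra-Cpp's API: a Body value is only a handle
// (an index); the field values live in one array per field (SOA).
class Body {
 public:
  // "Instance creation": appends one element to every field array.
  static Body create(float pos_x, float vel_x) {
    storage().pos_x.push_back(pos_x);
    storage().vel_x.push_back(vel_x);
    return Body(storage().pos_x.size() - 1);
  }

  // Returns a handle to an existing object.
  static Body at(std::size_t id) {
    assert(id < size());
    return Body(id);
  }

  static std::size_t size() { return storage().pos_x.size(); }

  // Field accessors resolve to elements of the per-field arrays.
  float& pos_x() { return storage().pos_x[id_]; }
  float& vel_x() { return storage().vel_x[id_]; }

  // A member function written in ordinary object-oriented style.
  void move(float dt) { pos_x() += vel_x() * dt; }

  // Exposed only so the layout can be inspected in this sketch.
  static const std::vector<float>& pos_x_array() { return storage().pos_x; }

 private:
  explicit Body(std::size_t id) : id_(id) {}

  struct Storage {
    std::vector<float> pos_x;
    std::vector<float> vel_x;
  };

  // Single shared storage for all Body objects.
  static Storage& storage() {
    static Storage s;
    return s;
  }

  std::size_t id_;
};
```

With this sketch, `Body b = Body::create(1.0f, -4.0f); b.move(0.5f);` reads like AOS-style code, yet all `pos_x` values are contiguous in one array. Fields here are accessed through function calls such as `pos_x()`, which is a simplification made for the sketch.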

        Published in

          WPMVP'18: Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing
          February 2018
          68 pages
          ISBN: 9781450356466
          DOI: 10.1145/3178433

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          WPMVP'18 Paper Acceptance Rate: 8 of 12 submissions, 67%. Overall Acceptance Rate: 20 of 30 submissions, 67%.
