ABSTRACT
Structure of Arrays (SOA) is a well-studied data layout technique for SIMD architectures. Previous work has shown that it can speed up applications in high-performance computing by several factors compared to a traditional Array of Structures (AOS) layout. However, most programmers are used to AOS-style programming, which is more readable and easier to maintain.
We present Ikra-Cpp, an embedded DSL for object-oriented programming in C++/CUDA. Ikra-Cpp's notation is very close to standard AOS-style C++ code, but data is laid out as SOA. This gives programmers the performance benefit of SOA and the expressiveness of AOS-style object-oriented programming at the same time. Ikra-Cpp is well integrated with C++ and lets programmers use C++ notation and syntax for classes, fields, member functions, constructors and instance creation.
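To make the two layouts concrete, the following is a minimal plain C++ sketch. It does not use Ikra-Cpp and does not show its API; the names BodyAos, BodiesSoa, BodyRef, move_aos, move_soa and kNumBodies are hypothetical and chosen only for illustration. The sketch contrasts an AOS container with a hand-written SOA container, and adds a small handle type that suggests how object-style access could sit on top of SOA storage.

```cpp
#include <array>
#include <cstddef>

// All names in this sketch are hypothetical; none of them come from Ikra-Cpp.
constexpr std::size_t kNumBodies = 4;

// AOS: one struct per object, so the fields of one object are adjacent in memory.
struct BodyAos {
  float pos_x;
  float vel_x;
};

// SOA: one array per field, so the same field of neighboring objects is adjacent.
struct BodiesSoa {
  std::array<float, kNumBodies> pos_x;
  std::array<float, kNumBodies> vel_x;
};

// AOS update: consecutive iterations are sizeof(BodyAos) bytes apart.
void move_aos(std::array<BodyAos, kNumBodies>& bodies, float dt) {
  for (auto& b : bodies) {
    b.pos_x += b.vel_x * dt;
  }
}

// SOA update: each field is read and written with unit stride,
// which is the access pattern SIMD units and GPUs handle well.
void move_soa(BodiesSoa& bodies, float dt) {
  for (std::size_t i = 0; i < kNumBodies; ++i) {
    bodies.pos_x[i] += bodies.vel_x[i] * dt;
  }
}

// A hand-written handle that gives object-style access to SOA storage.
// This only illustrates the general idea of AOS-like notation over SOA data;
// it is an assumption of this sketch, not how Ikra-Cpp is implemented.
struct BodyRef {
  BodiesSoa& storage;
  std::size_t index;

  float& pos_x() { return storage.pos_x[index]; }
  float& vel_x() { return storage.vel_x[index]; }

  void move(float dt) { pos_x() += vel_x() * dt; }
};
```

With the handle, a call such as BodyRef{bodies, 2}.move(0.5f) reads like a member function call on an object, while the data it touches stays in per-field arrays.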
Index Terms
- Ikra-Cpp: A C++/CUDA DSL for Object-Oriented Programming with Structure-of-Arrays Layout