Inner array inlining for structure of arrays layout

Authors:

Matthias Springer,

Hidehiko Masuhara

ARRAY 2018: Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming

Pages 50 - 58

https://doi.org/10.1145/3219753.3219760

Published: 19 June 2018

Abstract

Previous work has shown how the well-studied and SIMD-friendly Structure of Arrays (SOA) data layout strategy can speed up applications in high-performance computing compared to a traditional Array of Structures (AOS) data layout. However, a standard SOA layout cannot handle structures with inner arrays; such structures appear frequently in graph-based applications and object-oriented designs with associations of high multiplicity.
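The difference between the two layouts can be sketched in plain C++. This is a minimal illustration and not code from the paper: the `ParticleAos` and `ParticlesSoa` types, their fields, and the helper functions are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AOS: one struct per element. The fields of a single element are adjacent
// in memory, so reading one field of many elements is a strided access.
struct ParticleAos {
    float x;
    float y;
    float mass;
};

// SOA: one array per field. The same field of neighboring elements is
// adjacent in memory, which suits SIMD lanes and coalesced GPU loads.
struct ParticlesSoa {
    std::vector<float> x;
    std::vector<float> y;
    std::vector<float> mass;
};

// Transpose an AOS container into the equivalent SOA container.
ParticlesSoa to_soa(const std::vector<ParticleAos>& aos) {
    ParticlesSoa soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.mass.reserve(aos.size());
    for (const ParticleAos& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.mass.push_back(p.mass);
    }
    return soa;
}

// Reads only the contiguous mass array; x and y are never touched.
float total_mass(const ParticlesSoa& soa) {
    float sum = 0.0f;
    for (float m : soa.mass) {
        sum += m;
    }
    return sum;
}
```

In the SOA version, a loop over a single field reads one dense array, whereas the AOS version would also pull the unused fields of every element through the cache.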

This work extends the SOA data layout to structures with array-typed fields. We present different techniques for inlining (embedding) inner arrays into an AOS or SOA layout, as well as the design and implementation of an embedded C++/CUDA DSL that lets programmers write such layouts in a notation close to standard C++. We evaluate several layout strategies with a traffic flow simulation, an important real-world application in transport planning.
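As a rough illustration of what inlining an inner array into an SOA layout can look like, the sketch below stores a bounded per-node neighbor list as one SOA column per inner-array slot. It assumes a fixed maximum inner-array size and is written in plain C++; it is not the paper's DSL and does not claim to match any specific technique evaluated in the paper. The class `NodesSoaInlined` and all of its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: num_nodes nodes, each holding up to max_degree
// neighbor ids. Slot k of every node is stored contiguously, so the flat
// buffer is laid out as [slot 0 of all nodes][slot 1 of all nodes]...
class NodesSoaInlined {
public:
    NodesSoaInlined(std::size_t num_nodes, std::size_t max_degree)
        : num_nodes_(num_nodes),
          max_degree_(max_degree),
          degree_(num_nodes, 0),
          neighbors_(num_nodes * max_degree, 0) {}

    // Append a neighbor id to the inner array of the given node.
    void add_neighbor(std::size_t node, int neighbor) {
        assert(node < num_nodes_);
        assert(degree_[node] < max_degree_);
        neighbors_[flat_index(node, degree_[node])] = neighbor;
        ++degree_[node];
    }

    // Current length of the inner array of the given node.
    std::size_t degree(std::size_t node) const {
        assert(node < num_nodes_);
        return degree_[node];
    }

    // Read one element of the inner array of the given node.
    int neighbor(std::size_t node, std::size_t slot) const {
        assert(node < num_nodes_);
        assert(slot < degree_[node]);
        return neighbors_[flat_index(node, slot)];
    }

    // Position in the flat buffer: consecutive nodes are adjacent
    // for a fixed slot.
    std::size_t flat_index(std::size_t node, std::size_t slot) const {
        return slot * num_nodes_ + node;
    }

private:
    std::size_t num_nodes_;
    std::size_t max_degree_;
    std::vector<std::size_t> degree_;  // inner-array length per node
    std::vector<int> neighbors_;       // inlined inner arrays, slot-major
};
```

The design choice in this sketch is the slot-major index: when consecutive threads or SIMD lanes process consecutive nodes and all read the same slot, their accesses fall on adjacent memory locations. The cost is that every node reserves `max_degree` slots, even when its inner array is shorter.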

Cited By

Li J, Cao W, Dong X, Li G, Wang X, Zhao P, Liu L, Feng X (2021). Compiler-assisted Operator Template Library for DNN Accelerators. International Journal of Parallel Programming. Online publication date: 25-Mar-2021. https://doi.org/10.1007/s10766-021-00701-6

Li J, Cao W, Dong X, Li G, Wang X, Liu L, Feng X (2021). Compiler-Assisted Operator Template Library for DNN Accelerators. Network and Parallel Computing, pp. 3-16. Online publication date: 23-Jun-2021. https://doi.org/10.1007/978-3-030-79478-1_1

Tasos A, Franco J, Drossopoulou S, Wrigstad T, Eisenbach S (2020). Reshape your layouts, not your programs: A safe language extension for better cache locality. Science of Computer Programming, 102481. Online publication date: May-2020. https://doi.org/10.1016/j.scico.2020.102481

Index Terms

Inner array inlining for structure of arrays layout
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Data types and structures
      2. Language types
        Object oriented languages
        Parallel programming languages

Information & Contributors

Information

Published In

ARRAY 2018: Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming

June 2018

66 pages

ISBN:9781450358521

DOI:10.1145/3219753

General Chairs:
Sven-Bodo Scholz,
Olin Shivers

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Japan Society for the Promotion of Science

Conference

PLDI '18

Sponsor:

SIGPLAN

PLDI '18: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 19, 2018

Philadelphia, PA, USA

Acceptance Rates

Overall Acceptance Rate 17 of 25 submissions, 68%
