DOI: 10.1145/3219753.3219760
research-article

Inner array inlining for structure of arrays layout

Published: 19 June 2018

Abstract

Previous work has shown how the well-studied and SIMD-friendly Structure of Arrays (SOA) data layout strategy can speed up applications in high-performance computing compared to a traditional Array of Structures (AOS) data layout. However, a standard SOA layout cannot handle structures with inner arrays; such structures appear frequently in graph-based applications and object-oriented designs with associations of high multiplicity.
This work extends the SOA data layout to structures with array-typed fields. We present different techniques for inlining (embedding) inner arrays into an AOS or SOA layout, as well as the design and implementation of an embedded C++/CUDA DSL that lets programmers write such layouts in a notation close to standard C++. We evaluate several layout strategies with a traffic flow simulation, an important real-world application in transport planning.
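To make the layouts concrete, here is a minimal plain C++ sketch of one way to inline an inner array into an AOS and into an SOA layout. It is not the paper's DSL. The names `NodeAos`, `NodesSoa`, `to_soa`, and `kMaxOut` are hypothetical, and the fixed inner-array capacity is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed upper bound on the inner array length (hypothetical value).
constexpr std::size_t kMaxOut = 4;

// AOS with an inlined inner array: each object stores its scalar field
// and all slots of its inner array next to each other.
struct NodeAos {
    std::size_t num_out;            // number of valid entries in `out`
    std::array<int, kMaxOut> out;   // inner array (e.g. neighbor indices)
};

// SOA with an inlined inner array: one contiguous array per scalar field,
// and one contiguous array per inner-array slot, so that slot j of
// consecutive objects is adjacent in memory (out[j][i]).
struct NodesSoa {
    std::vector<std::size_t> num_out;
    std::array<std::vector<int>, kMaxOut> out;

    explicit NodesSoa(std::size_t n) : num_out(n, 0) {
        for (auto& slot : out) slot.assign(n, -1);  // -1 marks an unused slot
    }
};

// Convert an AOS container into the SOA layout above.
inline NodesSoa to_soa(const std::vector<NodeAos>& nodes) {
    NodesSoa soa(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        assert(nodes[i].num_out <= kMaxOut);
        soa.num_out[i] = nodes[i].num_out;
        for (std::size_t j = 0; j < nodes[i].num_out; ++j) {
            soa.out[j][i] = nodes[i].out[j];
        }
    }
    return soa;
}
```

With this layout, threads that process neighboring objects and read the same slot `j` access adjacent memory locations, which is the access pattern that SOA is meant to favor. Objects with more than `kMaxOut` entries would need a different strategy that this sketch does not cover.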

Cited By

  • (2021) Compiler-assisted Operator Template Library for DNN Accelerators. International Journal of Parallel Programming. DOI: 10.1007/s10766-021-00701-6. Online publication date: 25-Mar-2021
  • (2021) Compiler-Assisted Operator Template Library for DNN Accelerators. Network and Parallel Computing. DOI: 10.1007/978-3-030-79478-1_1 (3-16). Online publication date: 23-Jun-2021
  • (2020) Reshape your layouts, not your programs: A safe language extension for better cache locality. Science of Computer Programming. DOI: 10.1016/j.scico.2020.102481 (102481). Online publication date: May-2020

Published In

ARRAY 2018: Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
June 2018
66 pages
ISBN:9781450358521
DOI:10.1145/3219753

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CUDA
  2. Data Inlining
  3. Inner Arrays
  4. Object-oriented Programming
  5. SIMD
  6. Structure of Arrays

Qualifiers

  • Research-article

Conference

PLDI '18
Acceptance Rates

Overall Acceptance Rate 17 of 25 submissions, 68%
