DOI: 10.1145/2925426.2926285

Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications

Published: 01 June 2016

Abstract

Applying SIMD parallelization to irregular applications with non-contiguous, data-dependent memory accesses is challenging. While an application with a static pattern of indirect accesses (one that stays the same across iterations) can be accelerated by data transformations, such techniques are no longer feasible when the indirect access patterns change over time. In this paper, we propose an indexing method that facilitates the reuse of data reorganization for efficient SIMD parallelization of dynamic irregular applications. This indexing approach is first applied to a class of vertex-centric graph algorithms in which the set of active vertices varies during execution; here the indexing method helps maintain the set of active edges. Next, we focus on unstructured particle interaction applications in which the edges change adaptively, and present an incremental indexing method. In our experimental evaluation, the speedups achieved by utilizing SIMD range from 3.04× to 7.19× on graph applications and from 2.54× to 4.43× for molecular dynamics.
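
To make the idea of reusing a reorganization concrete, the following is a minimal sketch and not the paper's actual indexing method, which the abstract does not detail. It assumes a simple CSR-style layout: edges are grouped by source vertex once, and a per-vertex offset index is then reused in every iteration to select the edges of the currently active vertices without reorganizing the edge data again. The names `build_index` and `active_edges` are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


def build_index(num_vertices: int, edges: List[Edge]) -> Tuple[List[int], List[int]]:
    """One-time reorganization (assumed CSR-style): group edges by source.

    Returns (offsets, dests), where the edges of vertex v occupy
    dests[offsets[v]:offsets[v + 1]].
    """
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-edges per vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]   # prefix sum gives start positions
    cursor = offsets[:-1].copy()       # next free slot for each vertex
    dests = [0] * len(edges)
    for src, dst in edges:
        dests[cursor[src]] = dst
        cursor[src] += 1
    return offsets, dests


def active_edges(offsets: List[int], dests: List[int], active: Set[int]) -> List[Edge]:
    """Per-iteration step: look up active edges through the existing index."""
    result = []
    for v in sorted(active):
        for i in range(offsets[v], offsets[v + 1]):
            result.append((v, dests[i]))
    return result


# Illustrative data only: a 3-vertex graph with 4 directed edges.
example_edges = [(0, 1), (2, 0), (0, 2), (1, 2)]
example_offsets, example_dests = build_index(3, example_edges)
```

In this sketch, the set of active vertices can change between iterations while `example_offsets` and `example_dests` stay untouched; only the cheap lookup in `active_edges` is repeated. A SIMD implementation would additionally pack the selected edges into vector-width groups, which is omitted here.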

Published In

ICS '16: Proceedings of the 2016 International Conference on Supercomputing
June 2016
547 pages
ISBN: 9781450343619
DOI: 10.1145/2925426
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICS '16

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
