DOI: 10.1145/2145816.2145838

PARRAY: a unifying array representation for heterogeneous parallelism

Published: 25 February 2012

Abstract

This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems such as GPU clusters. Current practice in software development requires combining several low-level libraries such as Pthreads, OpenMP, CUDA and MPI, and achieving productivity and portability across different numbers and models of GPUs is hard. PARRAY extends mainstream C programming with novel array types that have three distinct features: 1) the dimensions of an array type are nested in a tree, conceptually reflecting the memory hierarchy; 2) the definition of an array type may contain references to other array types, allowing sophisticated array types to be created for parallelization; 3) threads also form arrays, allowing programming in a Single-Program-Multiple-Codeblock (SPMC) style that unifies various sophisticated communication patterns. This leads to shorter, more portable and maintainable parallel code, while the programmer retains control over the performance-related features necessary for deep manual optimization. Although the source-to-source code generator only faithfully generates low-level library calls according to the type information, higher-level programming and automatic performance optimization remain possible by building libraries of sub-programs on top of PARRAY. A case study on cluster FFT illustrates a simple 30-line code that outperforms Intel Cluster MKL by 2x on the Tianhe-1A system with 7168 Fermi GPUs and 14336 CPUs.



Published In

PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2012, 352 pages
ISBN: 9781450311601
DOI: 10.1145/2145816

Also published in: ACM SIGPLAN Notices, Volume 47, Issue 8 (PPOPP '12), August 2012, 334 pages
ISSN: 0362-1340, EISSN: 1558-1160
DOI: 10.1145/2370036

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. array representation
  2. gpu clusters
  3. heterogeneous parallelism
  4. parallel programming

Qualifiers

  • Research-article

Conference

PPoPP '12

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%


Cited By

  • (2020) The transplantation technology of communication intensive applications on heterogeneous clusters. Transactions on Emerging Telecommunications Technologies, 31(12). DOI: 10.1002/ett.4051. Online publication date: 22-Dec-2020.
  • (2019) HDArray: Parallel Array Interface for Distributed Heterogeneous Devices. Languages and Compilers for Parallel Computing, pages 176-184. DOI: 10.1007/978-3-030-34627-0_13. Online publication date: 13-Nov-2019.
  • (2018) Abstract Parallel Array Types and Ghost Cell Update Implementation. Algorithms and Architectures for Parallel Processing, pages 532-541. DOI: 10.1007/978-3-030-05051-1_37. Online publication date: 7-Dec-2018.
  • (2017) High productivity multi-device exploitation with the Heterogeneous Programming Library. Journal of Parallel and Distributed Computing, 101(C):51-68. DOI: 10.1016/j.jpdc.2016.11.001. Online publication date: 1-Mar-2017.
  • (2015) Tiles: a new language mechanism for heterogeneous parallelism. ACM SIGPLAN Notices, 50(8):287-288. DOI: 10.1145/2858788.2688555. Online publication date: 24-Jan-2015.
  • (2015) Tiles: a new language mechanism for heterogeneous parallelism. Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 287-288. DOI: 10.1145/2688500.2688555. Online publication date: 24-Jan-2015.
  • (2015) Programming heterogeneous systems with array types. Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pages 1067-1074. DOI: 10.1109/CCGrid.2015.113. Online publication date: 4-May-2015.
  • (2014) Exploiting distributed and shared memory hierarchies with Hitmap. 2014 International Conference on High Performance Computing & Simulation (HPCS), pages 278-286. DOI: 10.1109/HPCSim.2014.6903696. Online publication date: Jul-2014.
  • (2014) Cross-Platform Parallel Programming in Parray: A Case Study. Advanced Information Systems Engineering, pages 579-582. DOI: 10.1007/978-3-662-44917-2_57. Online publication date: 2014.
  • (2014) A Type-Oriented Graph500 Benchmark. Proceedings of the 29th International Conference on Supercomputing - Volume 8488, pages 460-469. DOI: 10.1007/978-3-319-07518-1_31. Online publication date: 22-Jun-2014.
