DOI: 10.1145/2145816.2145838

PARRAY: a unifying array representation for heterogeneous parallelism

Published: 25 February 2012

Abstract

This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems such as GPU clusters. Current practice in software development requires combining several low-level libraries such as Pthreads, OpenMP, CUDA and MPI, and achieving productivity and portability across different numbers and models of GPUs is hard. PARRAY extends mainstream C programming with novel array types that have three distinct features: 1) the dimensions of an array type are nested in a tree, conceptually reflecting the memory hierarchy; 2) the definition of an array type may contain references to other array types, allowing sophisticated array types to be created for parallelization; 3) threads also form arrays, allowing programming in a Single-Program-Multiple-Codeblock (SPMC) style that unifies various sophisticated communication patterns. This leads to shorter, more portable and maintainable parallel code, while the programmer retains control over the performance-related features necessary for deep manual optimization. Although the source-to-source code generator only faithfully generates low-level library calls according to the type information, higher-level programming and automatic performance optimization remain possible by building libraries of sub-programs on top of PARRAY. A case study on cluster FFT illustrates a simple 30-line code that outperforms Intel Cluster MKL by 2x on the Tianhe-1A system with 7168 Fermi GPUs and 14336 CPUs.



Published In

PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2012, 352 pages
ISBN: 9781450311601
DOI: 10.1145/2145816

Also published in: ACM SIGPLAN Notices, Volume 47, Issue 8 (PPOPP '12), August 2012, 334 pages
ISSN: 0362-1340, EISSN: 1558-1160
DOI: 10.1145/2370036

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. array representation
  2. gpu clusters
  3. heterogeneous parallelism
  4. parallel programming

Qualifiers

  • Research-article

Conference

PPoPP '12

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%


Cited By

  • (2020) The transplantation technology of communication intensive applications on heterogeneous clusters. Transactions on Emerging Telecommunications Technologies, 31(12). DOI: 10.1002/ett.4051. Online publication date: 22-Dec-2020.
  • (2019) HDArray: Parallel Array Interface for Distributed Heterogeneous Devices. Languages and Compilers for Parallel Computing, pages 176-184. DOI: 10.1007/978-3-030-34627-0_13. Online publication date: 13-Nov-2019.
  • (2018) Abstract Parallel Array Types and Ghost Cell Update Implementation. Algorithms and Architectures for Parallel Processing, pages 532-541. DOI: 10.1007/978-3-030-05051-1_37. Online publication date: 7-Dec-2018.
  • (2017) High productivity multi-device exploitation with the Heterogeneous Programming Library. Journal of Parallel and Distributed Computing, 101(C):51-68. DOI: 10.1016/j.jpdc.2016.11.001. Online publication date: 1-Mar-2017.
  • (2015) Tiles: a new language mechanism for heterogeneous parallelism. ACM SIGPLAN Notices, 50(8):287-288. DOI: 10.1145/2858788.2688555. Online publication date: 24-Jan-2015.
  • (2015) Tiles: a new language mechanism for heterogeneous parallelism. Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 287-288. DOI: 10.1145/2688500.2688555. Online publication date: 24-Jan-2015.
  • (2015) Programming heterogeneous systems with array types. Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pages 1067-1074. DOI: 10.1109/CCGrid.2015.113. Online publication date: 4-May-2015.
  • (2014) Exploiting distributed and shared memory hierarchies with Hitmap. 2014 International Conference on High Performance Computing & Simulation (HPCS), pages 278-286. DOI: 10.1109/HPCSim.2014.6903696. Online publication date: Jul-2014.
  • (2014) Cross-Platform Parallel Programming in Parray: A Case Study. Advanced Information Systems Engineering, pages 579-582. DOI: 10.1007/978-3-662-44917-2_57. Online publication date: 2014.
  • (2014) A Type-Oriented Graph500 Benchmark. Proceedings of the 29th International Conference on Supercomputing - Volume 8488, pages 460-469. DOI: 10.1007/978-3-319-07518-1_31. Online publication date: 22-Jun-2014.
