
NOVA: A Functional Language for Data Parallelism

Published: 09 June 2014
DOI: 10.1145/2627373.2627375

Abstract

Functional languages provide a solid foundation on which complex optimization passes can be designed to exploit parallelism available in the underlying system. Their mathematical foundations enable high-level optimizations that would be impossible in traditional imperative languages. This makes them uniquely suited to generating efficient target code for parallel systems, such as multiple Central Processing Units (CPUs) or highly data-parallel Graphics Processing Units (GPUs). Such systems are becoming the mainstream for scientific and commodity desktop computing.
Writing performance-portable code for such systems in low-level languages requires significant effort from a human expert. This paper presents NOVA, a functional language and compiler for multi-core CPUs and GPUs. The NOVA language is a polymorphic, statically-typed functional language with a suite of higher-order functions, including map, reduce, and scan, which are used to express parallelism. The NOVA compiler is a lightweight yet powerful optimizing compiler. It generates code for a variety of target platforms, achieving performance comparable to competing languages and tools, including hand-optimized code. The NOVA compiler is standalone and can easily be used as a target for higher-level or domain-specific languages, or embedded in other applications.
We evaluate NOVA against two competing approaches: the Thrust library and hand-written CUDA C. NOVA achieves comparable performance to these approaches across a range of benchmarks. NOVA-generated code also scales linearly with the number of processor cores across all compute-bound benchmarks.
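
The programming model described above can be made concrete with a small example. The sketch below is illustrative only: it uses ordinary Haskell combinators to stand in for the map, reduce, and scan primitives the abstract names; it is not NOVA syntax and reflects none of NOVA's code generation for CUDA or multi-core backends.

-- Illustrative sketch only (assumed, not NOVA syntax): plain Haskell
-- combinators standing in for the map/reduce/scan primitives named in
-- the abstract.
module Main where

-- Dot product: elementwise multiply (a map over the zipped inputs),
-- then a reduce with (+).
dotProduct :: [Double] -> [Double] -> Double
dotProduct xs ys = foldl (+) 0 (zipWith (*) xs ys)

-- Exclusive prefix sum: the classic scan example.
prefixSum :: [Double] -> [Double]
prefixSum = init . scanl (+) 0

main :: IO ()
main = do
  print (dotProduct [1, 2, 3] [4, 5, 6])  -- 32.0
  print (prefixSum [1, 2, 3, 4])          -- [0.0,1.0,3.0,6.0]

In NOVA itself, such combinator pipelines are what the compiler analyzes and lowers to parallel CPU or GPU code; the sketch only shows the style of program the language accepts.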



Published In

ARRAY'14: Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
June 2014
112 pages
ISBN:9781450329378
DOI:10.1145/2627373

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Array-oriented programming
  2. CUDA
  3. Code generation
  4. Compilation
  5. Functional programming
  6. Multi-core CPU

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

PLDI '14

Acceptance Rates

ARRAY'14 Paper Acceptance Rate 17 of 25 submissions, 68%;
Overall Acceptance Rate 17 of 25 submissions, 68%

Cited By

  • (2024) Fusing Gathers with Integer Linear Programming. Proceedings of the 1st ACM SIGPLAN International Workshop on Functional Programming for Productivity and Performance, pp. 10-23. DOI: 10.1145/3677997.3678227. Online publication date: 28-Aug-2024.
  • (2023) Edge Intelligence with Distributed Processing of DNNs: A Survey. Computer Modeling in Engineering & Sciences, 136(1), pp. 5-42. DOI: 10.32604/cmes.2023.023684. Online publication date: 2023.
  • (2023) Efficient Parallel Functional Programming with Effects. Proceedings of the ACM on Programming Languages, 7(PLDI), pp. 1558-1583. DOI: 10.1145/3591284. Online publication date: 6-Jun-2023.
  • (2023) Gaiwan. Science of Computer Programming, 230(C). DOI: 10.1016/j.scico.2023.102989. Online publication date: 1-Aug-2023.
  • (2020) ACQuA: A Parallel Accelerator Architecture for Pure Functional Programs. 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 346-351. DOI: 10.1109/ISVLSI49217.2020.00070. Online publication date: Jul-2020.
  • (2019) Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 193-205. DOI: 10.5555/3314872.3314896. Online publication date: 16-Feb-2019.
  • (2019) TeIL: A Type-Safe Imperative Tensor Intermediate Language. Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, pp. 57-68. DOI: 10.1145/3315454.3329959. Online publication date: 8-Jun-2019.
  • (2019) NoT. The Journal of Supercomputing, 75(7), pp. 3810-3841. DOI: 10.1007/s11227-019-02749-1. Online publication date: 1-Jul-2019.
  • (2018) Automatic Matching of Legacy Code to Heterogeneous APIs. ACM SIGPLAN Notices, 53(2), pp. 139-153. DOI: 10.1145/3296957.3173182. Online publication date: 19-Mar-2018.
  • (2018) Automatic Matching of Legacy Code to Heterogeneous APIs. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 139-153. DOI: 10.1145/3173162.3173182. Online publication date: 19-Mar-2018.
