
NOVA: A Functional Language for Data Parallelism

Published: 09 June 2014
DOI: 10.1145/2627373.2627375

Abstract

Functional languages provide a solid foundation on which complex optimization passes can be designed to exploit parallelism available in the underlying system. Their mathematical foundations enable high-level optimizations that would be impossible in traditional imperative languages. This makes them uniquely suited to generating efficient target code for parallel systems, such as multiple Central Processing Units (CPUs) or highly data-parallel Graphics Processing Units (GPUs). Such systems are becoming the mainstream for scientific and commodity desktop computing.
Writing performance-portable code for such systems in low-level languages requires significant effort from a human expert. This paper presents NOVA, a functional language and compiler for multi-core CPUs and GPUs. The NOVA language is a polymorphic, statically-typed functional language with a suite of higher-order functions, including map, reduce, and scan, which are used to express parallelism. The NOVA compiler is a lightweight yet powerful optimizing compiler. It generates code for a variety of target platforms, achieving performance comparable to competing languages and tools, including hand-optimized code. The NOVA compiler is standalone and can easily be used as a target for higher-level or domain-specific languages, or embedded in other applications.
We evaluate NOVA against two competing approaches: the Thrust library and hand-written CUDA C. NOVA achieves comparable performance to these approaches across a range of benchmarks. NOVA-generated code also scales linearly with the number of processor cores across all compute-bound benchmarks.
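
The programming model described above can be made concrete with a small example. The sketch below is illustrative only: it uses ordinary Haskell combinators to stand in for the map, reduce, and scan primitives the abstract names; it is not NOVA syntax and reflects none of NOVA's code generation for CUDA or multi-core backends.

-- Illustrative sketch only (assumed, not NOVA syntax): plain Haskell
-- combinators standing in for the map/reduce/scan primitives named in
-- the abstract.
module Main where

-- Dot product: elementwise multiply (a map over the zipped inputs),
-- then a reduce with (+).
dotProduct :: [Double] -> [Double] -> Double
dotProduct xs ys = foldl (+) 0 (zipWith (*) xs ys)

-- Exclusive prefix sum: the classic scan example.
prefixSum :: [Double] -> [Double]
prefixSum = init . scanl (+) 0

main :: IO ()
main = do
  print (dotProduct [1, 2, 3] [4, 5, 6])  -- 32.0
  print (prefixSum [1, 2, 3, 4])          -- [0.0,1.0,3.0,6.0]

In NOVA itself, such combinator pipelines are what the compiler analyzes and lowers to parallel CPU or GPU code; the sketch only shows the style of program the language accepts.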



Published In

ARRAY'14: Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
June 2014
112 pages
ISBN:9781450329378
DOI:10.1145/2627373

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Array-oriented programming
  2. CUDA
  3. Code generation
  4. Compilation
  5. Functional programming
  6. Multi-core CPU

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

PLDI '14

Acceptance Rates

ARRAY'14 Paper Acceptance Rate 17 of 25 submissions, 68%;
Overall Acceptance Rate 17 of 25 submissions, 68%

Cited By

  • (2024) Fusing Gathers with Integer Linear Programming. Proceedings of the 1st ACM SIGPLAN International Workshop on Functional Programming for Productivity and Performance, pp. 10-23. DOI: 10.1145/3677997.3678227. Online publication date: 28-Aug-2024.
  • (2023) Edge Intelligence with Distributed Processing of DNNs: A Survey. Computer Modeling in Engineering & Sciences, 136(1), pp. 5-42. DOI: 10.32604/cmes.2023.023684. Online publication date: 2023.
  • (2023) Efficient Parallel Functional Programming with Effects. Proceedings of the ACM on Programming Languages, 7(PLDI), pp. 1558-1583. DOI: 10.1145/3591284. Online publication date: 6-Jun-2023.
  • (2023) Gaiwan. Science of Computer Programming, 230(C). DOI: 10.1016/j.scico.2023.102989. Online publication date: 1-Aug-2023.
  • (2020) ACQuA: A Parallel Accelerator Architecture for Pure Functional Programs. 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 346-351. DOI: 10.1109/ISVLSI49217.2020.00070. Online publication date: Jul-2020.
  • (2019) Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 193-205. DOI: 10.5555/3314872.3314896. Online publication date: 16-Feb-2019.
  • (2019) TeIL: A Type-Safe Imperative Tensor Intermediate Language. Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, pp. 57-68. DOI: 10.1145/3315454.3329959. Online publication date: 8-Jun-2019.
  • (2019) NoT. The Journal of Supercomputing, 75(7), pp. 3810-3841. DOI: 10.1007/s11227-019-02749-1. Online publication date: 1-Jul-2019.
  • (2018) Automatic Matching of Legacy Code to Heterogeneous APIs. ACM SIGPLAN Notices, 53(2), pp. 139-153. DOI: 10.1145/3296957.3173182. Online publication date: 19-Mar-2018.
  • (2018) Automatic Matching of Legacy Code to Heterogeneous APIs. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 139-153. DOI: 10.1145/3173162.3173182. Online publication date: 19-Mar-2018.
