ABSTRACT
Nesl is a first-order functional language with an apply-to-each construct and other parallel primitives that enable the expression of irregular nested data-parallel (NDP) algorithms. To compile Nesl, Blelloch and others developed a global flattening transformation that maps irregular NDP code into regular flat data-parallel (FDP) code suitable for execution on SIMD or SIMT architectures, such as GPUs.
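To make the flattening idea concrete, the following is a minimal sketch (all names are illustrative, not Nessie's actual representation): an irregular nested sequence is stored as a flat data vector plus a segment descriptor of subsequence lengths, and an NDP expression such as `{sum(xs) : xs in nested}` becomes a segmented reduction over that flat representation.

```python
# Sketch of the flat representation produced by flattening.
# An irregular nested sequence like [[1,2,3], [], [4,5]] is split
# into a flat data vector and a segment descriptor (lengths).

def flatten(nested):
    """Flatten a nested sequence into (data vector, segment descriptor)."""
    data = [x for seg in nested for x in seg]
    segdes = [len(seg) for seg in nested]
    return data, segdes

def segmented_sum(data, segdes):
    """FDP counterpart of the NDP expression {sum(xs) : xs in nested}:
    a segmented reduction driven by the segment descriptor."""
    sums, i = [], 0
    for n in segdes:
        sums.append(sum(data[i:i + n]))
        i += n
    return sums

nested = [[1, 2, 3], [], [4, 5]]
data, segdes = flatten(nested)   # ([1, 2, 3, 4, 5], [3, 0, 2])
sums = segmented_sum(data, segdes)  # [6, 0, 9]
```

On a GPU, the segmented reduction would run as a single flat kernel over `data`, which is exactly why flattening makes irregular parallelism regular.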
While flattening solves the problem of mapping irregular parallelism onto a regular execution model, significant additional optimizations are required to produce performant code. Nessie is a compiler for Nesl that generates CUDA code for Nvidia GPUs. The Nessie compiler relies on a fairly complicated shape analysis that is performed on the FDP code produced by the flattening transformation. Shape analysis plays a key rôle in the compiler, since it enables fusion, smart kernel scheduling, and other optimizations.
In this paper, we present a new approach to the shape analysis problem for Nesl that is simpler to implement and provides better-quality shape information. The key idea is to analyze the NDP representation of the program and then preserve shape information through the flattening transformation.
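A hedged sketch of that key idea (the names and representation here are hypothetical, not the paper's actual analysis): on the NDP form, an apply-to-each trivially preserves the segment structure of its argument, so shape facts can be recorded symbolically and carried through flattening instead of being re-derived from the flat code.

```python
# Illustrative-only sketch: shapes recorded on the NDP form.
# A Shape is a symbolic name standing for a segment descriptor.

class Shape:
    """Symbolic shape: an identifier for a segment descriptor."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Shape) and self.name == other.name
    def __repr__(self):
        return f"Shape({self.name!r})"

def shape_of_apply_to_each(arg_shape):
    # {f(x) : x in xs} has exactly the shape of xs, by construction,
    # so no analysis of the flattened code is needed to discover this.
    return arg_shape

xs_shape = Shape("s0")
ys_shape = shape_of_apply_to_each(xs_shape)
# ys_shape == xs_shape, so a later pass knows the two element-wise
# kernels operate over identical segment structure and can fuse them
# without a runtime shape check.
```

The point of the sketch is that equality of symbolic shapes is decided by construction on the NDP form, which is what makes the approach both simpler and more precise than analyzing the FDP output.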
REFERENCES
- Lars Bergstrom and John Reppy. 2012. Nested Data-Parallelism on the GPU. In ICFP '12 (Copenhagen, Denmark). ACM, New York, NY, 247--258.
- Guy E. Blelloch. 1989. Scans as Primitive Parallel Operations. IEEE Transactions on Computers 38, 11 (Nov. 1989), 1526--1538.
- Guy E. Blelloch. 1990. Vector Models for Data-Parallel Computing. MIT Press, Cambridge, MA, USA.
- Guy E. Blelloch. 1995. NESL: A Nested Data-Parallel Language (Version 3.1). Technical Report CMU-CS-95-170. School of Computer Science, CMU, Pittsburgh, PA.
- Guy E. Blelloch. 1996. Programming Parallel Algorithms. CACM 39, 3 (March 1996), 85--97.
- Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Jay Sipelstein, and Marco Zagha. 1994. Implementation of a Portable Nested Data-Parallel Language. JPDC 21, 1 (1994), 4--14.
- Guy E. Blelloch and Gary W. Sabot. 1990. Compiling Collection-Oriented Languages onto Massively Parallel Computers. JPDC 8, 2 (1990), 119--134.
- Troels Henriksen, Martin Elsman, and Cosmin E. Oancea. 2014. Size Slicing: A Hybrid Approach to Size Inference in Futhark. In FHPC '14 (Gothenburg, Sweden). ACM, New York, NY, 31--42.
- Gabriele Keller. 1999. Transformation-based Implementation of Nested Data Parallelism for Distributed Memory Machines. Ph.D. Dissertation. Technische Universität Berlin, Berlin, Germany.
- Gabriele Keller, Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, and Ben Lippmeier. 2010. Regular, Shape-polymorphic, Parallel Arrays in Haskell. In ICFP '10 (Baltimore, MD). ACM, New York, NY, 261--272.
- Gabriele Keller, Manuel M. T. Chakravarty, Roman Leshchinskiy, Ben Lippmeier, and Simon Peyton Jones. 2012. Vectorisation Avoidance. In HASKELL '12 (Copenhagen, Denmark). ACM, New York, NY, 37--48.
- Gabriele Keller and Martin Simons. 1996. A Calculational Approach to Flattening Nested Data Parallelism in Functional Languages. In Concurrency and Parallelism, Programming, Networking, and Security (LNCS), Joxan Jaffar and Roland H. C. Yap (Eds.), Vol. 1179. Springer-Verlag, New York, NY, 234--243.
- Roman Leshchinskiy. 2005. Higher-Order Nested Data Parallelism: Semantics and Implementation. Ph.D. Dissertation. Technische Universität Berlin, Berlin, Germany.
- Ben Lippmeier, Manuel M. T. Chakravarty, Gabriele Keller, Roman Leshchinskiy, and Simon Peyton Jones. 2012. Work Efficient Higher-order Vectorisation. In ICFP '12 (Copenhagen, Denmark). ACM, New York, NY, 259--270.
- Frederik M. Madsen. 2012. Flattening Nested Data Parallelism. Master's Project, DIKU. Available from http://hiperfit.dk/publications.
- Jan F. Prins and Daniel W. Palmer. 1993. Transforming High-Level Data-Parallel Programs into Vector Operations. In PPoPP '93 (San Diego, CA). ACM, New York, NY, 119--128.
- John Reppy and Nora Sandler. 2015. Nessie: A NESL to CUDA Compiler. Presented at CPC 2015, London, UK. 13 pages. Available from https://nessie.cs.uchicago.edu.
- John Reppy and Joe Wingerter. 2016. λcu --- An Intermediate Representation for Compiling Nested Data Parallelism. Presented at CPC 2016, Valladolid, Spain. 13 pages. Available from https://cpc2016.infor.uva.es.
- Amos Robinson, Ben Lippmeier, and Gabriele Keller. 2014. Fusing Filters with Integer Linear Programming. In FHPC '14 (Gothenburg, Sweden). ACM, New York, NY, 53--62.
- Nora Sandler. 2014. Nessie: A New NESL Compiler. BA Honors Thesis, Department of Computer Science, University of Chicago, June 2014.
- Scandal Project. [n.d.]. A Library of Parallel Algorithms Written in NESL. Available from http://www.cs.cmu.edu/~scandal/nesl/algorithms.html.
- Sven-Bodo Scholz. 2001. A Type System for Inferring Array Shapes. In IFL '01 (Stockholm, Sweden) (LNCS), Thomas Arts and Markus Mohnen (Eds.). Springer-Verlag, New York, NY, 65--82.
- Fangyong Tang and Clemens Grelck. 2013. User-Defined Shape Constraints in SAC. Presented at IFL 2012, Oxford, UK. 19 pages. Available from www.sac-home.org.
- Kai Trojahner, Clemens Grelck, and Sven-Bodo Scholz. 2006. On Optimising Shape-Generic Array Programs Using Symbolic Structural Information. In IFL '06 (Budapest, Hungary), Zoltán Horváth, Viktória Zsók, and Andrew Butterfield (Eds.). Springer-Verlag, New York, NY, 1--18.
- Joe Wingerter. 2017. λcu --- An Intermediate Representation for Compiling Nested Data Parallelism. Master's Thesis. University of Chicago.
- Yongpeng Zhang and Frank Mueller. 2012. CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures. In ICPP '12 (Pittsburgh, PA). IEEE Computer Society Press, Los Alamitos, CA, 340--349.