DOI: 10.1145/2517208.2517228
research-article

Spiral in scala: towards the systematic construction of generators for performance libraries

Published: 27 October 2013

Abstract

Program generators for high-performance libraries are an appealing solution to the recurring problem of porting and optimizing code with every new processor generation, but few such generators exist to date. This is due not only to the difficulty of the design but also to the difficulty of the actual implementation, which often results in an ad-hoc collection of standalone programs and scripts that are hard to extend, maintain, or reuse. In this paper we ask whether programming language concepts and features are needed to enable a more systematic construction of such generators, and if so, which ones. The systematic approach we advocate extrapolates from existing generators: a) describing the problem and algorithmic knowledge using one or several domain-specific languages (DSLs), b) expressing optimizations and choices as rewrite rules on DSL programs, c) designing data structures that can be configured to control the type of code that is generated and the data representation used, and d) using autotuning to select the best-performing alternative. As a case study, we implement a small but representative subset of Spiral in Scala using the Lightweight Modular Staging (LMS) framework. The first main contribution of this paper is the realization of c) using type classes to abstract over staging decisions, i.e., which pieces of a computation are performed immediately and for which pieces code is generated. Specifically, we abstract over different complex data representations jointly with different code representations, including generating loops versus unrolled code with scalar replacement, a crucial and usually tedious performance transformation. The second main contribution is full support for a) and d) within the LMS framework: we extend LMS to support translation between different DSLs and autotuning through search.
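To make contribution c) more concrete, here is a minimal sketch in plain Scala of a type class that abstracts over a staging decision. It is not the paper's code and does not use the LMS API: the names `Arith`, `CodeGen`, `wht`, and `StagingSketch` are hypothetical, generated code is modeled as strings rather than LMS `Rep[T]` values, and a size-4 Walsh-Hadamard transform stands in for a Spiral kernel purely as a small example. The same generic `wht` either computes its result immediately (the `Double` instance) or emits C-like code (the `String` instance). Because the Scala loop runs at generation time, the emitted code is fully unrolled, and every intermediate is bound to its own scalar variable, which is the effect of scalar replacement.

```scala
import scala.collection.mutable.ArrayBuffer

object StagingSketch {
  // Hypothetical type class: abstracts over *when* arithmetic happens.
  trait Arith[T] {
    def add(a: T, b: T): T
    def sub(a: T, b: T): T
  }

  object Arith {
    // Current-stage instance: operations are performed immediately.
    implicit val now: Arith[Double] = new Arith[Double] {
      def add(a: Double, b: Double): Double = a + b
      def sub(a: Double, b: Double): Double = a - b
    }
  }

  // Next-stage instance: operations are recorded as code (strings here,
  // standing in for LMS-style Rep[Double] values). Each result is bound
  // to a fresh scalar variable, i.e., scalar replacement.
  final class CodeGen extends Arith[String] {
    private val stmts = ArrayBuffer.empty[String]
    private var counter = 0

    private def emit(rhs: String): String = {
      val name = s"t$counter"
      counter += 1
      stmts += s"double $name = $rhs;"
      name
    }

    def add(a: String, b: String): String = emit(s"$a + $b")
    def sub(a: String, b: String): String = emit(s"$a - $b")
    def code: List[String] = stmts.toList
  }

  // Generic iterative Walsh-Hadamard transform (length must be a power of 2).
  // Written once; the Arith instance decides whether it computes or generates.
  def wht[T](input: Vector[T])(implicit ops: Arith[T]): Vector[T] = {
    var x = input
    var h = 1
    while (h < x.length) {
      for (i <- 0 until x.length by 2 * h; j <- i until i + h) {
        val a = x(j)
        val b = x(j + h)
        x = x.updated(j, ops.add(a, b)).updated(j + h, ops.sub(a, b))
      }
      h *= 2
    }
    x
  }

  def main(args: Array[String]): Unit = {
    // Immediate evaluation with the Double instance.
    println(wht(Vector(1.0, 2.0, 3.0, 4.0)))
    // Code generation: the Scala loop runs now, so the output is unrolled.
    val gen = new CodeGen
    val out = wht(Vector("in0", "in1", "in2", "in3"))(gen)
    gen.code.foreach(println)
    println(s"outputs: ${out.mkString(", ")}")
  }
}
```

Running `main` prints the immediately computed vector followed by eight straight-line statements, from `double t0 = in0 + in1;` to `double t7 = t1 - t3;`. Generating a loop instead of unrolled code would require a further instance that operates on symbolic array indices, which this sketch omits.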

Information

Published In

GPCE '13: Proceedings of the 12th international conference on Generative programming: concepts & experiences
October 2013
198 pages
ISBN: 9781450323734
DOI: 10.1145/2517208
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. abstraction over staging
  2. data representation
  3. scalar replacement
  4. selective precomputation
  5. synthesis

Qualifiers

  • Research-article

Conference

GPCE'13
GPCE'13: Generative Programming: Concepts and Experiences
October 27-28, 2013
Indianapolis, Indiana, USA

Acceptance Rates

GPCE '13 paper acceptance rate: 20 of 59 submissions (34%)
Overall acceptance rate: 56 of 180 submissions (31%)

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 0
Reflects downloads up to 02 Mar 2025

Cited By

  • (2023) Multi-Stage Vertex-Centric Programming for Agent-Based Simulations. Proceedings of the 22nd ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pp. 100-112. DOI: 10.1145/3624007.3624057. Online publication date: 22-Oct-2023.
  • (2023) Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid. Proceedings of the ACM on Programming Languages 7(OOPSLA2), pp. 686-715. DOI: 10.1145/3622822. Online publication date: 16-Oct-2023.
  • (2020) Compiling symbolic execution with staging and algebraic effects. Proceedings of the ACM on Programming Languages 4(OOPSLA), pp. 1-33. DOI: 10.1145/3428232. Online publication date: 13-Nov-2020.
  • (2020) Multi-stage programming in the large with staged classes. Proceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pp. 35-49. DOI: 10.1145/3425898.3426961. Online publication date: 16-Nov-2020.
  • (2020) Fireiron. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 71-82. DOI: 10.1145/3410463.3414632. Online publication date: 30-Sep-2020.
  • (2020) AnyHLS: High-Level Synthesis With Partial Evaluation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39(11), pp. 3202-3214. DOI: 10.1109/TCAD.2020.3012172. Online publication date: Nov-2020.
  • (2019) DSL-Based Hardware Generation with Scala. ACM Transactions on Reconfigurable Technology and Systems 13(1), pp. 1-23. DOI: 10.1145/3359754. Online publication date: 19-Dec-2019.
  • (2019) A stage-polymorphic IR for compiling MATLAB-style dynamic tensor expressions. Proceedings of the 18th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pp. 34-47. DOI: 10.1145/3357765.3359514. Online publication date: 21-Oct-2019.
  • (2019) DSL-Based Modular IP Core Generators: Example FFT and Related Structures. 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), pp. 190-191. DOI: 10.1109/ARITH.2019.00043. Online publication date: Jun-2019.
  • (2018) Backpropagation with continuation callbacks. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 10201-10212. DOI: 10.5555/3327546.3327682. Online publication date: 3-Dec-2018.
