DOI: 10.1145/1576702.1576705

Automatic synthesis of high performance mathematical programs

Published: 28 July 2009

Abstract

The evolution of computing platforms is at a historic inflection point. CPU frequency stalled in 2004 at about 3 GHz, which means future performance gains will only be achievable through increasing parallelism in the form of multiple cores and vector instruction sets. The impact on developers of high performance libraries implementing important mathematical functionality such as matrix multiplication, linear transforms, and many others is profound. Traditionally, an algorithm developer ensures correctness and minimizes the operation count; a software engineer then performs the actual implementation (in a compilable language like C) and the performance optimization. However, on modern platforms, two implementations with the exact same operation count may differ by 10, 100, or even 1000x in runtime: instead, the structure of an algorithm becomes a major factor, since it determines how well the algorithm can be parallelized, vectorized, and matched to the memory hierarchy. Ideally, a compiler would perform all these tasks, but the current state of knowledge suggests that this may be inherently impossible for many types of code. The reason may be twofold. First, many transformations, in particular for parallelism, require domain knowledge that the compiler simply does not possess. Second, there are often simply too many choices of transformations, which the compiler cannot or does not know how to explore.
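The point about operation count versus structure can be made concrete with a classic, hypothetical example (not taken from the talk): two matrix-multiply variants that perform exactly the same 2n^3 floating-point operations but traverse memory differently. In compiled code on a real memory hierarchy, the stride-1 variant is typically far faster, even though the operation count is identical.

```python
# Two matrix-multiply loop orders with identical operation counts.
# ijk walks B down a column (strided access); ikj walks B and C along
# rows (stride-1). In C or Fortran on real hardware this difference in
# locality alone can change the runtime by large factors -- the
# operation count predicts nothing about it.

def matmul_ijk(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]   # B accessed down a column: strided
            C[i][j] = s
    return C

def matmul_ikj(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            row_B, row_C = B[k], C[i]
            for j in range(n):
                row_C[j] += a * row_B[j]  # B and C accessed along rows: stride-1
    return C
```

Both functions compute the same product; only the access pattern differs.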
As a consequence, the development of high performance libraries for mathematical functions becomes extraordinarily difficult, since the developer needs a good understanding of the available algorithms, the target microarchitecture, and implementation techniques such as threading and vector instruction sets such as Intel's SSE. To make things worse, optimal code is often platform specific: code that runs very fast on one platform can be suboptimal on another. This means that if the highest performance is desired, library developers are constantly forced to reimplement and reoptimize the same functionality. Commercial examples following this model are Intel's IPP and MKL libraries, which provide a very broad set of mathematical functions needed in scientific computing, signal and image processing, communication, and security applications.
An attractive solution would be to automate library development, that is, to let the computer write the code and rewrite it for every new platform. There are several challenges involved in this proposal. First, for a given desired function (such as multiplying matrices or computing a discrete Fourier transform), the existing algorithm knowledge has to be encoded into a form or language suitable for computer representation. Second, the structural algorithm transformations for parallelism or locality that are typically performed by the programmer also have to be encoded into this form. Third, the available choices have to be explored systematically and efficiently. As we will show for a specific domain, techniques from symbolic computation provide the answers.
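The first two challenges can be sketched in miniature. The following is an illustrative toy, not Spiral's actual SPL/OL formalism: it encodes the Cooley-Tukey breakdown rule, which rewrites DFT_n (for n = k*m) into a product of smaller DFTs, as a symbolic rewrite, and enumerates every complete expansion ("ruletree"). Each ruletree is a different, equally correct algorithm for the same transform, and together they form the space a generator must explore.

```python
# Illustrative sketch (not Spiral's actual language): algorithm knowledge
# as a symbolic rewrite rule. A transform is a tagged tuple; the
# Cooley-Tukey rule expands DFT_n into DFT_k and DFT_m for every
# nontrivial factorization n = k * m, recursively.

def ruletrees(n):
    """All full recursive expansions of DFT_n (n a power of two)."""
    if n == 2:
        return [("DFT", 2)]                 # base case: computed directly
    trees = []
    for k in range(2, n):                   # nontrivial factorizations n = k*m
        if n % k == 0:
            m = n // k
            for left in ruletrees(k):       # all ways to expand DFT_k
                for right in ruletrees(m):  # all ways to expand DFT_m
                    trees.append(("CT", k, m, left, right))
    return trees
```

Already for DFT_16 this yields 5 distinct algorithms, and the number grows rapidly with n, which is why the third challenge, systematic and efficient exploration of the choices, is essential.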
In this talk we present Spiral [6, 1], a domain-specific program generation system for important mathematical functionality such as linear transforms, filters, Viterbi decoders, and basic linear algebra routines. Spiral completely replaces the human programmer. For a desired function, Spiral generates alternative algorithms, optimizes them, compiles them into programs, and "intelligently" searches for the best match to the computing platform. The main idea behind Spiral is a mathematical, symbolic, declarative, domain-specific language to represent algorithms and the use of rewriting systems to generate and structurally optimize algorithms at a high level of abstraction. Optimization includes parallelization, vectorization, and locality improvement for the memory hierarchy [3, 4, 5, 7, 2]. Experimental results show that the code generated by Spiral competes with, and sometimes outperforms, the best available human-written code.
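The search over candidate implementations can be illustrated in its simplest form, a hedged sketch that is far cruder than Spiral's actual search machinery: given several functionally identical implementations, time each on the target machine and keep the fastest. The candidate functions below are hypothetical stand-ins for generated code variants.

```python
import timeit

# Toy empirical search: three interchangeable dot-product variants stand
# in for generated code candidates; the fastest on *this* machine wins.

def dot_loop(xs, ys):
    s = 0.0
    for i in range(len(xs)):
        s += xs[i] * ys[i]
    return s

def dot_zip(xs, ys):
    s = 0.0
    for x, y in zip(xs, ys):
        s += x * y
    return s

def dot_sum(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def pick_fastest(candidates, *args, repeat=5, number=100):
    """Return (name, fn) of the empirically fastest candidate."""
    best_name, best_fn, best_t = None, None, float("inf")
    for name, fn in candidates.items():
        t = min(timeit.repeat(lambda: fn(*args), repeat=repeat, number=number))
        if t < best_t:
            best_name, best_fn, best_t = name, fn, t
    return best_name, best_fn
```

Which candidate wins is platform dependent, which is exactly the point: the decision belongs to a measurement-driven search, not to the source code.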

References

[1]
Spiral web site, 2006. www.spiral.net.
[2]
F. Franchetti, F. de Mesmay, D. McFarlin, and M. Püschel. Operator language: A program generation framework for fast kernels. In IFIP Working Conference on Domain Specific Languages (DSL WC), 2009.
[3]
F. Franchetti, Y. Voronenko, and M. Püschel. Loop merging for signal transforms. In Proc. PLDI, pages 315--326, 2005.
[4]
F. Franchetti, Y. Voronenko, and M. Püschel. FFT program generation for shared memory: SMP and multicore. In Supercomputing, 2006.
[5]
F. Franchetti, Y. Voronenko, and M. Püschel. A rewriting system for the vectorization of signal transforms. In High Performance Computing for Computational Science (VECPAR), volume 4395 of Lecture Notes in Computer Science, pages 363--377. Springer, 2006.
[6]
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, 93(2):232--275, 2005. special issue on "Program Generation, Optimization, and Adaptation".
[7]
Y. Voronenko, F. de Mesmay, and M. Püschel. Computer generation of general size linear transform libraries. In International Symposium on Code Generation and Optimization (CGO), pages 102--113, 2009.

Published In

ISSAC '09: Proceedings of the 2009 international symposium on Symbolic and algebraic computation
July 2009
402 pages
ISBN:9781605586090
DOI:10.1145/1576702

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automation
  2. domain-specific language
  3. Fourier transform
  4. high performance
  5. matrix algebra
  6. parallelization
  7. program generation
  8. rewriting
  9. vectorization

Qualifiers

  • Invited-talk

Conference

ISSAC '09

Acceptance Rates

Overall Acceptance Rate 395 of 838 submissions, 47%
