ABSTRACT
The evolution of computing platforms is at a historic inflection point. CPU frequency scaling stalled around 2004 at about 3 GHz, which means future performance gains are achievable only through increasing parallelism in the form of multiple cores and vector instruction sets. The impact on developers of high performance libraries implementing important mathematical functionality, such as matrix multiplication, linear transforms, and many others, is profound. Traditionally, an algorithm developer ensures correctness and minimizes the operations count. A software engineer then performs the actual implementation (in a compilable language like C) and the performance optimization. On modern platforms, however, two implementations with the exact same operations count may differ by 10x, 100x, or even 1000x in runtime: the structure of an algorithm becomes a major factor, determining how well it can be parallelized, vectorized, and matched to the memory hierarchy. Ideally, a compiler would perform all these tasks, but the current state of knowledge suggests that this may be inherently impossible for many types of code. The reason may be twofold. First, many transformations, in particular those for parallelism, require domain knowledge that the compiler simply does not possess. Second, there are often simply too many choices of transformations, which the compiler cannot or does not know how to explore.
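The effect of algorithm structure on runtime can be seen even in a toy setting. The sketch below contrasts two loop orders for matrix multiplication with identical operation counts but different memory access patterns; the function names, the problem size, and the pure-Python setting are illustrative and not taken from Spiral.

```python
import random
import time

def matmul_ijk(A, B, n):
    """Textbook loop order: the inner loop strides down the columns of B."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_ikj(A, B, n):
    """Reordered loops: the inner loop walks rows of B and C contiguously,
    a layout that cache hierarchies and vector units favor."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            Ci, Bk = C[i], B[k]
            for j in range(n):
                Ci[j] += a * Bk[j]
    return C

if __name__ == "__main__":
    n = 120
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    for f in (matmul_ijk, matmul_ikj):
        t0 = time.perf_counter()
        f(A, B, n)
        print(f.__name__, round(time.perf_counter() - t0, 3), "s")
```

Both functions perform exactly n^3 multiply-adds; in a compiled language with larger n, the gap between such variants (and further blocked or vectorized versions) is where the 10x-1000x differences mentioned above come from.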
As a consequence, the development of high performance libraries for mathematical functions becomes extraordinarily difficult, since the developer needs a good understanding of the available algorithms, the target microarchitecture, and implementation techniques such as threading and vector instruction sets such as Intel's SSE. To make things worse, optimal code is often platform specific; that is, code that runs very fast on one platform can be suboptimal on another. This means that if the highest performance is desired, library developers are constantly forced to reimplement and reoptimize the same functionality. Commercial examples following this model are Intel's IPP and MKL libraries, which provide a very broad set of mathematical functions needed in scientific computing, signal and image processing, communication, and security applications.
An attractive solution would be to automate library development: let the computer write the code and rewrite it for every new platform. There are several challenges involved in this proposal. First, for a given desired function (such as multiplying matrices or computing a discrete Fourier transform), the existing algorithm knowledge has to be encoded into a form or language suitable for computer representation. Second, the structural algorithm transformations for parallelism or locality that are typically performed by the programmer also have to be encoded into this form. Third, the available choices have to be explored systematically and efficiently. As we will show for a specific domain, techniques from symbolic computation provide the answers.
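To see why the third challenge (systematic exploration) is nontrivial, consider the classical Cooley-Tukey FFT breakdown rule, which splits a DFT of size km into smaller DFTs of sizes k and m. Counting the recursion trees this single rule generates already shows a combinatorial explosion of algorithmic choices. The sketch below is a standard counting argument, not code from Spiral; the function name is mine.

```python
def num_cooley_tukey_trees(n):
    """Count the distinct recursion trees that the Cooley-Tukey
    breakdown rule generates for a DFT of size n, treating prime
    sizes as terminals computed by a single base-case algorithm."""
    divisors = [k for k in range(2, n) if n % k == 0]
    if not divisors:          # prime size: exactly one base case
        return 1
    # each factorization n = k * (n // k) yields independent choices
    # for the two recursive subproblems
    return sum(num_cooley_tukey_trees(k) * num_cooley_tukey_trees(n // k)
               for k in divisors)

# For two-power sizes this recurrence yields the Catalan numbers:
# sizes 4, 8, 16, 32, 64 admit 1, 2, 5, 14, 42 alternative
# recursion trees, all with (nearly) identical operation counts.
```

For realistically large sizes and with additional breakdown rules, the number of alternatives grows far beyond what can be enumerated by hand, which motivates the systematic, search-based exploration described above.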
In this talk we present Spiral [6, 1], a domain-specific program generation system for important mathematical functionality such as linear transforms, filters, Viterbi decoders, and basic linear algebra routines. Spiral completely replaces the human programmer. For a desired function, Spiral generates alternative algorithms, optimizes them, compiles them into programs, and "intelligently" searches for the best match to the computing platform. The main idea behind Spiral is a mathematical, symbolic, declarative, domain-specific language to represent algorithms and the use of rewriting systems to generate and structurally optimize algorithms at a high level of abstraction. Optimization includes parallelization, vectorization, and locality improvement for the memory hierarchy [3, 4, 5, 7, 2]. Experimental results show that the code generated by Spiral competes with, and sometimes outperforms, the best available human-written code.
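The flavor of rewriting at the formula level can be illustrated with the tensor-product identity A (tensor) B = (A tensor I) * (I tensor B), a standard matrix identity of the kind such rewriting systems exploit: the right-hand side exposes two separate loop nests that can be mapped to threads or vector instructions. The pure-Python representation below is a toy verification of the identity, not Spiral's actual formula language.

```python
def kron(A, B):
    """Kronecker (tensor) product of two dense matrices (lists of lists)."""
    return [[a * b for a in rowA for b in rowB]
            for rowA in A for rowB in B]

def matmul(A, B):
    """Dense matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def eye(n):
    """Identity matrix I_n."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Rewrite rule, read as an identity on formulas:
#   A_n tensor B_m  ->  (A_n tensor I_m) . (I_n tensor B_m)
A = [[0.0, 1.0], [1.0, 0.0]]          # small stand-in for a transform
B = [[1.0, 2.0], [3.0, 4.0]]
lhs = kron(A, B)
rhs = matmul(kron(A, eye(2)), kron(eye(2), B))
assert lhs == rhs                     # both sides compute the same matrix
```

A rewriting system applies many such identities symbolically, choosing among the resulting factorizations the one that best matches the target platform.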
- Spiral web site, 2006. www.spiral.net.
- F. Franchetti, F. de Mesmay, D. McFarlin, and M. Püschel. Operator language: A program generation framework for fast kernels. In IFIP Working Conference on Domain Specific Languages (DSL WC), 2009.
- F. Franchetti, Y. Voronenko, and M. Püschel. Loop merging for signal transforms. In Proc. PLDI, pages 315--326, 2005.
- F. Franchetti, Y. Voronenko, and M. Püschel. FFT program generation for shared memory: SMP and multicore. In Supercomputing, 2006.
- F. Franchetti, Y. Voronenko, and M. Püschel. A rewriting system for the vectorization of signal transforms. In High Performance Computing for Computational Science (VECPAR), volume 4395 of Lecture Notes in Computer Science, pages 363--377. Springer, 2006.
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, 93(2):232--275, 2005. Special issue on "Program Generation, Optimization, and Adaptation".
- Y. Voronenko, F. de Mesmay, and M. Püschel. Computer generation of general size linear transform libraries. In International Symposium on Code Generation and Optimization (CGO), pages 102--113, 2009.