DOI: 10.1145/1576702.1576705 (ISSAC Conference Proceedings)
invited-talk

Automatic synthesis of high performance mathematical programs

Published: 28 July 2009

ABSTRACT

The evolution of computing platforms is at a historic inflection point. CPU frequency has stalled (in 2004, at about 3 GHz), which means future performance gains will be achievable only through increasing parallelism in the form of multiple cores and vector instruction sets. The impact on the developers of high performance libraries implementing important mathematical functionality such as matrix multiplication, linear transforms, and many others is profound. Traditionally, an algorithm developer ensures correctness and minimizes the operations count. A software engineer then performs the actual implementation (in a compilable language like C) and performance optimization. However, on modern platforms, two implementations with the exact same operations count may differ by 10x, 100x, or even 1000x in runtime. Instead of the operations count alone, the structure of an algorithm becomes a major factor and determines how well it can be parallelized, vectorized, and matched to the memory hierarchy. Ideally, a compiler would perform all these tasks, but the current state of knowledge suggests that this may be inherently impossible for many types of code. The reason may be twofold. First, many transformations, in particular for parallelism, require domain knowledge that the compiler simply does not possess. Second, there are often simply too many choices of transformations, which the compiler cannot or does not know how to explore.
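To make the point about identical operations counts concrete, the following is a minimal sketch, not taken from the talk, of two matrix-multiplication loop orders over row-major arrays. The function names `matmul_ijk` and `matmul_ikj` are hypothetical. Both perform exactly n^3 multiply-adds and return the same result; they differ only in structure, that is, in the stride with which the innermost loop walks through memory.

```python
def matmul_ijk(a, b, n):
    """n x n product of row-major flat lists; inner loop walks down a column of b."""
    c = [0.0] * (n * n)
    ops = 0
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # The index into b jumps by n on every step (stride-n access).
                acc += a[i * n + k] * b[k * n + j]
                ops += 1
            c[i * n + j] = acc
    return c, ops


def matmul_ikj(a, b, n):
    """Same product with the two inner loops swapped; inner loop walks along rows."""
    c = [0.0] * (n * n)
    ops = 0
    for i in range(n):
        for k in range(n):
            aik = a[i * n + k]
            for j in range(n):
                # The indices into b and c advance by 1 (unit-stride access).
                c[i * n + j] += aik * b[k * n + j]
                ops += 1
    return c, ops
```

In interpreted Python the interpreter overhead hides most of the memory-hierarchy effect, so this sketch only shows that the operations count cannot distinguish the two variants. The runtime gap the abstract refers to would be expected to appear in compiled code on large matrices, and its size depends on the platform.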

As a consequence, the development of high performance libraries for mathematical functions becomes extraordinarily difficult, since the developer needs a good understanding of available algorithms, the target microarchitecture, and implementation techniques such as threading and vector instruction sets (for example, SSE on Intel). To make things worse, optimal code is often platform specific; that is, code that runs very fast on one platform can be suboptimal on another. This means that if highest performance is desired, library developers are constantly forced to reimplement and reoptimize the same functionality. Commercial examples following this model are Intel's IPP and MKL libraries, which provide a very broad set of mathematical functions needed in scientific computing, signal and image processing, communication, and security applications.

An attractive solution would be to automate library development, which means letting the computer write the code and rewrite it for every new platform. There are several challenges involved with this proposal. First, for a given desired function (such as multiplying matrices or computing a discrete Fourier transform), the existing algorithm knowledge has to be encoded into a form or language that is suitable for computer representation. Second, structural algorithm transformations for parallelism or locality that are typically performed by the programmer also have to be encoded into this form. Third, available choices have to be explored systematically and efficiently. As we will show for a specific domain, techniques from symbolic computation provide the answers.

In this talk we present Spiral [6, 1], a domain-specific program generation system for important mathematical functionality such as linear transforms, filters, Viterbi decoders, and basic linear algebra routines. Spiral completely replaces the human programmer. For a desired function, Spiral generates alternative algorithms, optimizes them, compiles them into programs, and "intelligently" searches for the best match to the computing platform. The main idea behind Spiral is a mathematical, symbolic, declarative, domain-specific language to represent algorithms and the use of rewriting systems to generate and structurally optimize algorithms at a high level of abstraction. Optimization includes parallelization, vectorization, and locality improvement for the memory hierarchy [3, 4, 5, 7, 2]. Experimental results show that the code generated by Spiral competes with, and sometimes outperforms, the best available human-written code.
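The generate-and-search idea can be illustrated with a toy sketch. This is not Spiral's implementation or language; it assumes a single breakdown rule, the Cooley-Tukey factorization DFT_km = (DFT_k ⊗ I_m) T (I_k ⊗ DFT_m) L, where L is a stride permutation and T a diagonal of twiddle factors. Every name below (`rule_trees`, `apply_dft`, `measured_runtime`, `search`) is hypothetical. A rule tree is either an integer n (evaluate DFT_n directly from its definition) or a pair (left, right) (apply the rule once and recurse).

```python
import cmath
import time
from itertools import product


def size(tree):
    """Transform size of a rule tree: a leaf is an int, a node is a (left, right) pair."""
    return tree if isinstance(tree, int) else size(tree[0]) * size(tree[1])


def rule_trees(n):
    """Enumerate the alternative algorithms for DFT_n that the single rule generates."""
    trees = [n]  # leaf: compute DFT_n by definition
    for k in range(2, n):
        if n % k == 0:
            trees += list(product(rule_trees(k), rule_trees(n // k)))
    return trees


def apply_dft(tree, x):
    """Evaluate the algorithm described by `tree` on the input list x."""
    n = size(tree)
    w = cmath.exp(-2j * cmath.pi / n)
    if isinstance(tree, int):
        return [sum(w ** (j * t) * x[t] for t in range(n)) for j in range(n)]
    left, right = tree
    k, m = size(left), size(right)
    # L, then I_k (x) DFT_m: gather the k stride-k subsequences and transform each.
    blocks = [apply_dft(right, x[b::k]) for b in range(k)]
    # T: scale entry s of block b by the twiddle factor w^(b*s).
    for b in range(k):
        for s in range(m):
            blocks[b][s] *= w ** (b * s)
    # DFT_k (x) I_m: transform across the blocks at each offset s.
    y = [0j] * n
    for s in range(m):
        col = apply_dft(left, [blocks[b][s] for b in range(k)])
        for a in range(k):
            y[a * m + s] = col[a]
    return y


def measured_runtime(tree, repeats=20):
    """Cost of a candidate: wall-clock time on this machine (an assumed cost model)."""
    x = [complex(i, 0) for i in range(size(tree))]
    start = time.perf_counter()
    for _ in range(repeats):
        apply_dft(tree, x)
    return time.perf_counter() - start


def search(n, cost=measured_runtime):
    """Exhaustive search: return the rule tree with the lowest cost."""
    return min(rule_trees(n), key=cost)
```

For example, `rule_trees(8)` yields five candidates that all compute the same transform, and `search(8)` returns whichever one ran fastest during the measurement. A realistic generator would use many rules, emit compiled code rather than interpret the tree, and replace exhaustive enumeration with a smarter search strategy.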

References

  1. Spiral web site, 2006. www.spiral.net.
  2. F. Franchetti, F. de Mesmay, D. McFarlin, and M. Püschel. Operator language: A program generation framework for fast kernels. In IFIP Working Conference on Domain Specific Languages (DSL WC), 2009.
  3. F. Franchetti, Y. Voronenko, and M. Püschel. Loop merging for signal transforms. In Proc. PLDI, pages 315–326, 2005.
  4. F. Franchetti, Y. Voronenko, and M. Püschel. FFT program generation for shared memory: SMP and multicore. In Supercomputing, 2006.
  5. F. Franchetti, Y. Voronenko, and M. Püschel. A rewriting system for the vectorization of signal transforms. In High Performance Computing for Computational Science (VECPAR), volume 4395 of Lecture Notes in Computer Science, pages 363–377. Springer, 2006.
  6. M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, 93(2):232–275, 2005. Special issue on "Program Generation, Optimization, and Adaptation".
  7. Y. Voronenko, F. de Mesmay, and M. Püschel. Computer generation of general size linear transform libraries. In International Symposium on Code Generation and Optimization (CGO), pages 102–113, 2009.
