ABSTRACT
It has become extraordinarily difficult to write software that performs close to optimally on complex modern microarchitectures. Particularly plagued are domains that require complex mathematical computations such as multimedia processing, communication, control, graphics, and machine learning. In these domains, performance-critical components are usually written in C (with possible extensions) and often even in assembly, carefully "tuned" to the platform's architecture and microarchitecture. The result is usually long, rather unreadable code that needs to be re-written or re-tuned with every platform upgrade. On the other hand, the performance penalty for relying on straightforward, non-tuned, "more elegant" implementations can be often a factor of 10, 100, or even more. The overall problem is one of productivity, maintainability, and quality (namely performance), i.e., software engineering. However, even though a large set of sophisticated software engineering theory and tools exist, it appears that to date this community has not focused much on mathematical computations nor performance in the detailed, close-to-optimal sense above. The reason for the latter may be that performance, unlike various aspects of correctness, is not syntactic in nature (and in reality is often even unpredictable and, well, messy). The aim of this talk is to draw attention to the performance/productivity problem for mathematical applications and to make the case for a more interdisciplinary attack. As a set of thoughts in this direction we offer some of the lessons we have learned in the last decade in our own research on Spiral (www.spiral.net), a program generation framework for numerical kernels. Key techniques used in Spiral include staged declarative domain-specific languages to express algorithm knowledge and algorithm transformations, the use of platform-cognizant rewriting systems for parallelism and locality optimizations, and the use of search and machine learning techniques to navigate possible spaces of choices. Experimental results show that the codegenerated by Spiral competes with, and sometimes outperforms, the best available human-written code. Spiral has been used to generate part of Intel's commercial libraries IPP and MKL.
Index Terms
- Program generation for performance
Recommendations
Automatic performance programming
Onward! 2011: Proceedings of the 10th SIGPLAN symposium on New ideas, new paradigms, and reflections on programming and softwareIt has become extraordinarily difficult to write software that performs close to optimally on complex modern microarchitectures. Particularly plagued are domains that are data intensive and require complex mathematical computations such as information ...
A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops
In the era of multicores, many applications that require substantial computing power and data crunching can now run on desktop PCs. However, to achieve the best possible performance, developers must write applications in a way that exploits both ...
Can we teach computers to write fast libraries?
GPCE '07: Proceedings of the 6th international conference on Generative programming and component engineeringAs the computing world "goes multicore", high performance library development finally becomes a nightmare. Optimal programs, and their underlying algorithms, have to be adapted to take full advantage of the platform's parallelism, memory hierarchy, and ...
Comments