DOI: 10.1145/1712605.1712607
keynote

Software knows best: portable parallelism requires standardized measurements of transparent hardware

Published: 28 January 2010

Abstract

The hardware trend of the last 15 years of dynamically trying to improve performance with little software visibility is not only irrelevant today, it's counterproductive; adaptivity must be at the software level if parallel software is going to be portable, fast, and energy-efficient. A portable parallel program is an oxymoron today; there is no reason to be parallel if it's slow, and parallel can't be fast if it's portable. Hence, portable parallel programs of the future must be able to understand and measure /any/ computer on which they run so that they can adapt effectively, which suggests that hardware measurement should be standardized and processor performance and energy consumption should become transparent.
In addition to software-controlled adaptivity for execution efficiency through techniques like autotuning and dynamic scheduling, modern software environments adapt to improve /programmer/ efficiency [1]. Classic examples include dynamic linking, dynamic memory allocation, garbage collection, interpreters, just-in-time compilers, and debugger support. A more recent example is selective embedded just-in-time specialization (SEJITS) [2] for highly productive languages like Python and Ruby. Thus, the future of programming is likely to involve program generators at many levels of the hierarchy tailoring the application to the machine. These productivity advances via adaptivity should be reflected in modern benchmarks: virtually no one writes the statically linked, highest-level-optimized C programs that are the foundation of most benchmark suites.
The dream is to improve productivity without sacrificing too much performance. Indeed, how often have you heard the claim that a new productive environment is now "almost as fast as C" or "almost as fast as Java"? The implication of the necessary tie between productivity and performance in the manycore era is that these modern environments must be able to utilize manycore well, or the gap between highly efficient code and highly productive code will grow with the number of cores.
For industry's bet on manycore to win, therefore, both very high level and very low level programming environments will need to be able to understand and measure their underlying hardware and adapt their execution so as to be portable, relatively fast, and energy-efficient.
Hence, we argue that a standard of accurate hardware operation trackers (SHOT) would have a huge positive impact on making parallel software portable with good performance and energy efficiency, similar to the impact the IEEE-754 standard had on the portability of numerical software. In particular, we believe SHOT will lead to much larger improvements in the portability, performance, and energy efficiency of parallel codes than recent architectural fads like opportunistic "turbo modes," transactional memory, or reconfigurable computing.
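The measure-then-adapt loop that the abstract asks of portable software can be illustrated with a very small autotuner. This is only a sketch under stated assumptions: the two summation variants, the helper names (`measure`, `autotune`), and the block size are hypothetical, and wall-clock timing from Python's standard `time.perf_counter` stands in for the standardized hardware counters (SHOT) that the abstract argues should exist.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# Two hypothetical implementations of the same operation. A real autotuner
# would generate or select variants that differ in blocking, threading, etc.
def sum_builtin(xs: Sequence[float]) -> float:
    return sum(xs)

def sum_blocked(xs: Sequence[float], block: int = 1024) -> float:
    total = 0.0
    for i in range(0, len(xs), block):
        total += sum(xs[i:i + block])
    return total

def measure(fn: Callable[[Sequence[float]], float],
            data: Sequence[float], repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) over several runs.

    Assumption: elapsed time is a stand-in for the standardized,
    accurate hardware measurements the abstract calls for.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best

def autotune(variants: Dict[str, Callable[[Sequence[float]], float]],
             data: Sequence[float],
             repeats: int = 5) -> Tuple[str, Dict[str, float]]:
    """Time every variant on this machine and return the fastest one's name."""
    timings = {name: measure(fn, data, repeats) for name, fn in variants.items()}
    winner = min(timings, key=timings.get)
    return winner, timings

if __name__ == "__main__":
    data = [float(i) for i in range(100_000)]
    variants = {"builtin": sum_builtin, "blocked": sum_blocked}
    winner, timings = autotune(variants, data)
    print("selected variant:", winner)
    for name, seconds in timings.items():
        print(f"  {name}: {seconds:.6f} s")
```

Which variant wins depends on the machine it runs on, which is the point: the choice is made from measurements taken on the target hardware rather than fixed ahead of time.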

References

[1] Krste Asanović, Rastislav Bodik, James Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Nelson Morgan, David Patterson, Koushik Sen, John Wawrzynek, David Wessel, and Katherine Yelick. "A View of the Parallel Computing Landscape," Communications of the ACM, vol. 52, no. 10, October 2009.
[2] B. Catanzaro, S. Kamil, Y. Lee, K. Asanovic, J. Demmel, K. Keutzer, J. Shalf, K. Yelick, and A. Fox. "SEJITS: Getting Productivity AND Performance With Selective Embedded JIT Specialization," First Workshop on Programming Models for Emerging Architectures (PMEA) at the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT'09), Raleigh, North Carolina, September 2009.

Cited By

  • (2012) "Performance driven cooperation between kernel and auto-tuning multi-threaded interval b&b applications," Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I, pp. 57-70. DOI: 10.1007/978-3-642-31125-3_5. Online publication date: 18-Jun-2012

    Published In

    WOSP/SIPEW '10: Proceedings of the first joint WOSP/SIPEW international conference on Performance engineering
    January 2010
    294 pages
    ISBN: 9781605585635
    DOI: 10.1145/1712605

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tag

    1. standardized

    Qualifiers

    • Keynote

    Conference

    WOSP/SIPEW'10

    Acceptance Rates

    Overall Acceptance Rate 149 of 241 submissions, 62%
