DOI: 10.1145/1806596.1806639
research-article

Composing parallel software efficiently with Lithe

Published: 05 June 2010

Abstract

Applications composed of multiple parallel libraries perform poorly when those libraries interfere with one another by obliviously using the same physical cores, leading to destructive resource oversubscription. This paper presents the design and implementation of Lithe, a low-level substrate that provides the basic primitives and a standard interface for composing parallel codes efficiently. Lithe can be inserted underneath the runtimes of legacy parallel libraries to provide bolt-on composability without needing to change existing application code. Lithe can also serve as the foundation for building new parallel abstractions and libraries that automatically interoperate with one another.
In this paper, we show that versions of Threading Building Blocks (TBB) and OpenMP perform competitively with their original implementations when ported to Lithe. Furthermore, for two applications composed of multiple parallel libraries, we show that leveraging our substrate outperforms their original, even expertly tuned, implementations.
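
The oversubscription problem described in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not Lithe's interface: the function names and the even-split policy are assumptions introduced here. It counts the software threads created when every worker of an outer library calls an inner library that starts its own full-sized pool, and contrasts that with dividing a fixed core budget among the outer workers.

```python
def software_threads(outer_workers: int, inner_workers: int) -> int:
    """Threads alive when each outer worker calls a library that starts its own pool."""
    return outer_workers * inner_workers


def oversubscription_factor(cores: int, outer_workers: int, inner_workers: int) -> float:
    """Runnable software threads per physical core (1.0 means no oversubscription)."""
    return software_threads(outer_workers, inner_workers) / cores


def partition_cores(cores: int, outer_workers: int) -> list[int]:
    """Hypothetical policy: split a fixed core budget as evenly as possible
    among the outer workers, so the total never exceeds the machine."""
    base, extra = divmod(cores, outer_workers)
    return [base + (1 if i < extra else 0) for i in range(outer_workers)]


if __name__ == "__main__":
    cores = 8
    # Two libraries that each default to one thread per core, nested.
    print(software_threads(cores, cores))
    print(oversubscription_factor(cores, cores, cores))
    # Sharing one budget instead of each library claiming the whole machine.
    print(partition_cores(cores, 3))
```

On an 8-core machine, two nested libraries that each default to one thread per core yield 64 threads, or 8 per core. A static even split is only one way to avoid this; the paper's author tags point to cooperative, hierarchical scheduling instead.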

    Published In

    PLDI '10: Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2010
    514 pages
    ISBN:9781450300193
    DOI:10.1145/1806596
    • ACM SIGPLAN Notices, Volume 45, Issue 6
      PLDI '10
      June 2010
      496 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/1809028

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. composability
    2. cooperative scheduling
    3. hierarchical scheduling
    4. oversubscription
    5. parallelism
    6. resource management
    7. user-level scheduling

    Qualifiers

    • Research-article

    Conference

    PLDI '10
    Acceptance Rates

    Overall Acceptance Rate 406 of 2,067 submissions, 20%

    Cited By

    • (2022) Plor: General Transactions with Predictable, Low Tail Latency. Proceedings of the 2022 International Conference on Management of Data, pp. 19-33. DOI: 10.1145/3514221.3517879. Online publication date: 10-Jun-2022
    • (2021) Paths to OpenMP in the kernel. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17. DOI: 10.1145/3458817.3476183. Online publication date: 14-Nov-2021
    • (2021) Task-graph scheduling extensions for efficient synchronization and communication. Proceedings of the 35th ACM International Conference on Supercomputing, pp. 88-101. DOI: 10.1145/3447818.3461616. Online publication date: 3-Jun-2021
    • (2020) Compiler-based timing for extremely fine-grain preemptive parallelism. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.5555/3433701.3433771. Online publication date: 9-Nov-2020
    • (2020) Compiler-Based Timing For Extremely Fine-Grain Preemptive Parallelism. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1109/SC41405.2020.00057. Online publication date: Nov-2020
    • (2020) Application-Driven Requirements for Node Resource Management in Next-Generation Systems. 2020 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS), pp. 1-11. DOI: 10.1109/ROSS51935.2020.00006. Online publication date: Nov-2020
    • (2019) Shenango. Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation, pp. 361-377. DOI: 10.5555/3323234.3323265. Online publication date: 26-Feb-2019
    • (2019) BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 29-42. DOI: 10.1109/PACT.2019.00011. Online publication date: 23-Sep-2019
    • (2019) HeteroMap: A Runtime Performance Predictor for Efficient Processing of Graph Analytics on Heterogeneous Multi-Accelerators. 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 268-281. DOI: 10.1109/ISPASS.2019.00039. Online publication date: Mar-2019
    • (2018) Argobots: A Lightweight Low-Level Threading and Tasking Framework. IEEE Transactions on Parallel and Distributed Systems 29:3, pp. 512-526. DOI: 10.1109/TPDS.2017.2766062. Online publication date: 1-Mar-2018
