ABSTRACT
Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. They do not know, or take into account, how long a task will take to execute or how many subtasks it will spawn. Moreover, task execution order is typically determined by an underlying task storage data structure and cannot be changed. There is thus potential for optimizing task-parallel executions by providing the scheduling system with information about specific tasks and their preferred execution order.
We investigate generalizations of work-stealing and introduce a framework enabling applications to dynamically provide hints on the nature of specific tasks using scheduling strategies. Strategies can be used to independently control both local task execution and steal order. Strategies allow optimizations on specific tasks, in contrast to more conventional scheduling policies that are typically global in scope. Strategies are composable and allow different, specific scheduling choices for different parts of an application simultaneously. We have implemented a work-stealing system based on our strategy framework. A series of benchmarks demonstrates beneficial effects that can be achieved with scheduling strategies.
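The central idea, per-task strategies that independently control local execution order and steal order, can be sketched as a toy model. This is purely illustrative: all names below are hypothetical, it is not the framework's actual API, and a real work-stealing implementation would use concurrent lock-free structures rather than plain heaps.

```python
import heapq
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Strategy:
    """Hints attached to a task (illustrative, not the paper's API)."""
    local_priority: Callable[[Any], int]  # higher = popped sooner by the owner
    steal_priority: Callable[[Any], int]  # higher = stolen sooner by thieves

class StrategyTaskPool:
    """A single worker's task pool ordered by per-task strategies.

    Each task is indexed in two priority heaps: one governing the
    owner's local pop order, one governing the order in which thieves
    steal. Entries taken from one heap are lazily invalidated in the
    other via a shared liveness map.
    """
    def __init__(self):
        self._local = []   # max-heap via negated keys: (-local_prio, id)
        self._steal = []   # max-heap via negated keys: (-steal_prio, id)
        self._live = {}    # task id -> task; removed once taken anywhere
        self._next_id = 0

    def push(self, task, strategy):
        tid = self._next_id
        self._next_id += 1
        self._live[tid] = task
        heapq.heappush(self._local, (-strategy.local_priority(task), tid))
        heapq.heappush(self._steal, (-strategy.steal_priority(task), tid))

    def _pop_from(self, heap):
        while heap:
            _, tid = heapq.heappop(heap)
            task = self._live.pop(tid, None)
            if task is not None:  # skip entries already taken via the other heap
                return task
        return None

    def pop_local(self):  # called by the owning worker
        return self._pop_from(self._local)

    def steal(self):      # called by a thief
        return self._pop_from(self._steal)
```

For example, a branch-and-bound application might use a strategy with `local_priority` set to search depth (so the owner works depth-first, keeping memory bounded) and `steal_priority` set to the subproblem's bound (so thieves grab the most promising subtrees), giving different orders for local execution and stealing from the same pool.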
Work-stealing with configurable scheduling strategies. PPoPP '13.