Limits of Work-Stealing Scheduling

Vrba, Željko; Espeland, Håvard; Halvorsen, Pål; Griwodz, Carsten

doi:10.1007/978-3-642-04633-9_15

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5798)

Included in the following conference series:

Workshop on Job Scheduling Strategies for Parallel Processing

Abstract

The number of applications with many parallel cooperating processes is steadily increasing, and developing efficient runtimes for their execution is an important task. Several frameworks have been developed, such as MapReduce and Dryad, but developing scheduling mechanisms that take into account processing and communication requirements is hard. In this paper, we explore the limits of the work-stealing scheduler, which has empirically been shown to perform well, and evaluate load balancing based on graph partitioning as an orthogonal approach. All the algorithms are implemented in our Nornir runtime system. Our experiments on a multi-core workstation show that work stealing degrades mainly when very little processing time, which we quantify exactly, is spent per message. This is the type of workload in which graph partitioning has the potential to achieve better performance than work stealing.

Author information

Authors and Affiliations

Simula Research Laboratory, Oslo
Željko Vrba, Håvard Espeland, Pål Halvorsen & Carsten Griwodz
Department of Informatics, University of Oslo
Željko Vrba, Håvard Espeland, Pål Halvorsen & Carsten Griwodz

Editor information

Editors and Affiliations

Microsoft, 475 Brannan St., 94107, San Francisco, CA, USA
Eitan Frachtenberg
Robotics Research Institute, Section Information Technology, TU Dortmund University, Otto-Hahn-Str. 8, 44227, Dortmund, Germany
Uwe Schwiegelshohn

About this paper

Cite this paper

Vrba, Ž., Espeland, H., Halvorsen, P., Griwodz, C. (2009). Limits of Work-Stealing Scheduling. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2009. Lecture Notes in Computer Science, vol 5798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04633-9_15

DOI: https://doi.org/10.1007/978-3-642-04633-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04632-2
Online ISBN: 978-3-642-04633-9
eBook Packages: Computer Science, Computer Science (R0)
