Staccato: Cache-Aware Work-Stealing Task Scheduler for Shared-Memory Systems

Kuchumov, Ruslan; Sokolov, Andrey; Korkhov, Vladimir

doi:10.1007/978-3-319-95171-3_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10963)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Work-stealing schedulers for parallel tasks yield a near-optimal task distribution (i.e. all CPU cores are loaded equally) and incur low time, memory and inter-thread synchronization overheads. The key idea of the work-stealing strategy is that when a scheduler worker runs out of tasks to execute, it starts stealing tasks from the queues of other workers. It has been shown that double-ended queues based on circular arrays are effective in this scenario. They are designed with the assumption that task pointers are stored in these data structures, while the task objects reside in heap memory. By modifying the task queues so that they hold task objects instead of pointers, we managed to increase performance by more than 2.5 times on CPU-bound applications and decrease last-level cache misses by 30% compared to the Intel TBB and Intel/MIT Cilk work-stealing schedulers.
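
The idea of keeping task objects inside the queue, rather than pointers to heap-allocated tasks, can be illustrated with a minimal C++ sketch. This is not the implementation from the paper: the names `Task`, `InlineTaskDeque` and `demo_stolen_id` are hypothetical, the capacity is fixed, and a mutex stands in for the lock-free synchronization a real scheduler would use. It only shows the circular-array layout, with the owner working at one end and thieves at the other.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical task type: a small object that is cheap to copy by value.
struct Task {
    int id;
    int payload;
};

// Fixed-capacity deque on a circular array that stores Task objects inline,
// so the tasks sit in contiguous memory instead of scattered heap blocks.
// Assumption: a mutex guards every operation to keep the sketch short.
template <std::size_t Capacity>
class InlineTaskDeque {
public:
    // Owner side: add a task at the bottom. Returns false when full.
    bool push(const Task& task) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (bottom_ - top_ == Capacity) return false;
        slots_[bottom_ % Capacity] = task;  // copy the object into the array
        ++bottom_;
        return true;
    }

    // Owner side: take the most recently pushed task (LIFO order).
    bool pop(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (bottom_ == top_) return false;
        --bottom_;
        out = slots_[bottom_ % Capacity];
        return true;
    }

    // Thief side: take the oldest task from the top (FIFO order).
    bool steal(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (bottom_ == top_) return false;
        out = slots_[top_ % Capacity];
        ++top_;
        return true;
    }

private:
    std::mutex mutex_;
    std::array<Task, Capacity> slots_{};
    std::size_t top_ = 0;     // index of the oldest task
    std::size_t bottom_ = 0;  // index one past the newest task
};

// Usage example: the owner pushes three tasks, then a thief steals one.
inline int demo_stolen_id() {
    InlineTaskDeque<8> deque;
    for (int i = 1; i <= 3; ++i) {
        bool pushed = deque.push(Task{i, i * 10});
        assert(pushed);
        (void)pushed;
    }
    Task stolen{0, 0};
    bool ok = deque.steal(stolen);
    assert(ok);
    (void)ok;
    return stolen.id;  // the oldest task, pushed first
}
```

In this sketch the owner pops the newest task while a thief receives the oldest one, and neither operation dereferences a separately allocated task object.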

Acknowledgements

This research was supported by RFBR grants No. 18-01-00125-a and 16-07-01111.

Author information

Authors and Affiliations

Saint Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034, Russia
Ruslan Kuchumov & Vladimir Korkhov
Institute of Applied Mathematical Research, Karelian Research Centre RAS, Petrozavodsk, Russia
Andrey Sokolov

Corresponding author

Correspondence to Vladimir Korkhov.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Covenant University, Ota, Nigeria
Sanjay Misra
Saint Petersburg State University, Saint Petersburg, Russia
Elena Stankova
Polytechnic University of Bari, Bari, Italy
Carmelo M. Torre
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Monash University, Clayton, Victoria, Australia
David Taniar
Kyushu Sangyo University, Fukuoka shi, Fukuoka, Japan
Bernady O. Apduhan
Politecnico di Bari, Bari, Italy
Eufemia Tarantino
Myongji University, Yongin, Korea (Republic of)
Yeonseung Ryu

About this paper

Cite this paper

Kuchumov, R., Sokolov, A., Korkhov, V. (2018). Staccato: Cache-Aware Work-Stealing Task Scheduler for Shared-Memory Systems. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2018. ICCSA 2018. Lecture Notes in Computer Science, vol 10963. Springer, Cham. https://doi.org/10.1007/978-3-319-95171-3_8

DOI: https://doi.org/10.1007/978-3-319-95171-3_8
Published: 04 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95170-6
Online ISBN: 978-3-319-95171-3
eBook Packages: Computer Science, Computer Science (R0)
