Staccato: Cache-Aware Work-Stealing Task Scheduler for Shared-Memory Systems

  • Conference paper
Computational Science and Its Applications – ICCSA 2018 (ICCSA 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10963)

Abstract

Work-stealing schedulers for parallel tasks yield near-optimal task distribution (i.e., all CPU cores are loaded equally) and have low time, memory, and inter-thread synchronization overheads. The key idea of the work-stealing strategy is that when a scheduler worker runs out of tasks to execute, it starts stealing tasks from the queues of other workers. It has been shown that double-ended queues based on circular arrays are effective in this scenario. They are designed on the assumption that task pointers are stored in these data structures, while the task objects themselves reside in heap memory. By modifying the task queues so that they hold task objects instead of pointers, we increased performance by a factor of more than 2.5 on CPU-bound applications and decreased last-level cache misses by 30% compared to the Intel TBB and Intel/MIT Cilk work-stealing schedulers.
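The idea in the abstract can be illustrated with a minimal C++ sketch. It is not the Staccato implementation: the names `Task`, `InlineTaskDeque`, and `next_task` are hypothetical, and a mutex stands in for the lock-free synchronization a real work-stealing deque would use. It shows only the two points the abstract relies on: task objects live inside the circular array itself rather than behind heap pointers, and an idle worker falls back to stealing from another worker's queue.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical task type: a small object stored by value in the queue.
struct Task {
    int id;
    int payload;
};

// Fixed-capacity circular-array deque that holds Task objects inline.
// Assumption: a mutex replaces the lock-free protocol for brevity.
template <std::size_t Capacity>
class InlineTaskDeque {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Owner side: append a task at the bottom. Returns false when full.
    bool push_bottom(const Task& task) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (bottom_ - top_ == Capacity) return false;
        slots_[bottom_ & (Capacity - 1)] = task;  // copy into the array
        ++bottom_;
        return true;
    }

    // Owner side: take the most recently pushed task (LIFO).
    bool pop_bottom(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (bottom_ == top_) return false;
        --bottom_;
        out = slots_[bottom_ & (Capacity - 1)];
        return true;
    }

    // Thief side: take the oldest task (FIFO) from the opposite end.
    bool steal_top(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (bottom_ == top_) return false;
        out = slots_[top_ & (Capacity - 1)];
        ++top_;
        return true;
    }

private:
    std::array<Task, Capacity> slots_{};  // task objects, not pointers
    std::size_t top_ = 0;     // index of the oldest task
    std::size_t bottom_ = 0;  // index one past the newest task
    std::mutex mutex_;
};

// One scheduling step: prefer the worker's own queue, steal otherwise.
template <std::size_t Capacity>
bool next_task(InlineTaskDeque<Capacity>& own,
               InlineTaskDeque<Capacity>& victim, Task& out) {
    if (own.pop_bottom(out)) return true;
    return victim.steal_top(out);
}
```

In this sketch a worker whose own deque is empty calls `next_task(own, victim, task)` and receives the victim's oldest task, while the victim keeps popping its newest tasks from the other end. Because the slots hold `Task` values contiguously, reading a task does not require following a pointer into separately allocated heap memory.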

Acknowledgements

This research was supported by RFBR grants No. 18-01-00125-a and 16-07-01111.

Author information

Corresponding author

Correspondence to Vladimir Korkhov.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Kuchumov, R., Sokolov, A., Korkhov, V. (2018). Staccato: Cache-Aware Work-Stealing Task Scheduler for Shared-Memory Systems. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2018. ICCSA 2018. Lecture Notes in Computer Science, vol 10963. Springer, Cham. https://doi.org/10.1007/978-3-319-95171-3_8

  • DOI: https://doi.org/10.1007/978-3-319-95171-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95170-6

  • Online ISBN: 978-3-319-95171-3

  • eBook Packages: Computer Science, Computer Science (R0)
