
A dynamic-sized nonblocking work stealing deque

Distributed Computing, Special Issue DISC 04

Abstract

The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (henceforth ABP work-stealing) is on its way to becoming the multiprocessor load-balancing technology of choice in both industry and academia. This highly efficient scheme is based on a collection of array-based double-ended queues (deques) with low-cost synchronization among local and stealing processes. Unfortunately, the algorithm's synchronization protocol relies heavily on fixed-size arrays, which are prone to overflows, especially in the multiprogrammed environments for which they are designed. This is a significant drawback since, apart from memory inefficiency, it means that the size of the deque must be tailored to accommodate the effects of the hard-to-predict level of multiprogramming, and the implementation must include an expensive and application-specific overflow mechanism.
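
To make the overflow problem concrete, the following is a minimal sketch of a fixed-size, array-based deque in the style of the ABP algorithm. It is an illustration under stated assumptions, not the code of Arora, Blumofe, and Plaxton or of this paper: the class name `FixedArrayDeque`, the packing of a tag and the top index into one `AtomicLong`, and the choice to report overflow by returning `false` are all hypothetical. It also returns `null` both when the deque is empty and when a thief loses a race, a distinction a real implementation may want to keep.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Illustrative fixed-size ABP-style deque; all names here are hypothetical. */
final class FixedArrayDeque<T> {
    private final AtomicReferenceArray<T> slots;
    private volatile int bottom = 0;                   // written only by the owner
    private final AtomicLong age = new AtomicLong(0);  // high 32 bits: tag, low 32 bits: top

    FixedArrayDeque(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    private static int top(long age) { return (int) age; }
    private static int tag(long age) { return (int) (age >>> 32); }
    private static long pack(int tag, int top) {
        return ((long) tag << 32) | (top & 0xFFFFFFFFL);
    }

    /** Owner only. Returns false on overflow; the caller needs its own fallback. */
    boolean pushBottom(T task) {
        int b = bottom;
        if (b == slots.length()) return false;  // array exhausted
        slots.set(b, task);
        bottom = b + 1;
        return true;
    }

    /** Thieves. Returns null if the deque looks empty or the CAS lost a race. */
    T popTop() {
        long oldAge = age.get();
        int t = top(oldAge);
        if (bottom <= t) return null;           // index comparison detects emptiness
        T task = slots.get(t);
        long newAge = pack(tag(oldAge), t + 1);
        return age.compareAndSet(oldAge, newAge) ? task : null;
    }

    /** Owner only. Resets both indexes to 0 whenever the deque becomes empty. */
    T popBottom() {
        int b = bottom;
        if (b == 0) return null;
        b--;
        bottom = b;
        T task = slots.get(b);
        long oldAge = age.get();
        if (b > top(oldAge)) return task;       // no thief can reach this slot
        bottom = 0;
        long newAge = pack(tag(oldAge) + 1, 0); // new tag guards against ABA
        if (b == top(oldAge) && age.compareAndSet(oldAge, newAge)) return task;
        age.set(newAge);                        // a thief took the last task
        return null;
    }
}
```

In this sketch, slots consumed by thieves are not reclaimed until the deque empties and the indexes reset. With a capacity of 2, two pushes followed by one successful `popTop` leave a single task in the deque, yet a third `pushBottom` still returns `false`. That is the kind of overflow an application would have to handle with its own mechanism.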

This paper presents the first dynamic memory work-stealing algorithm. It is based on a novel way of building non-blocking dynamic-sized work stealing deques by detecting synchronization conflicts based on “pointer-crossing” rather than “gaps between indexes” as in the original ABP algorithm. As we show, the new algorithm dramatically increases robustness and memory efficiency, while causing applications no observable performance penalty. We therefore believe it can replace array-based ABP work stealing deques, eliminating the need for application-specific overflow mechanisms.

References

  1. Lev, Y.: A Dynamic-Sized Nonblocking Work Stealing Deque. MS thesis, Tel-Aviv University, Tel-Aviv, Israel (2004)

  2. Rudolph, L., Slivkin-Allalouf, M., Upfal, E.: A simple load balancing scheme for task allocation in parallel machines. In Proceedings of the 3rd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 237–245. ACM Press (1991)

  3. Arora, N.S., Blumofe, R.D., Plaxton, C.G.: Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems 34, 115–144 (2001)

  4. Acar, U.A., Blelloch, G.E., Blumofe, R.D.: The data locality of work stealing. In: ACM Symposium on Parallel Algorithms and Architectures, pp. 1–12 (2000)

  5. Flood, C., Detlefs, D., Shavit, N., Zhang, C.: Parallel garbage collection for shared memory multiprocessors. In: Usenix Java Virtual Machine Research and Technology Symposium (JVM '01), Monterey, CA (2001)

  6. Leiserson, P.: Programming parallel applications in Cilk. SIAM News 31 (1998)

  7. Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. Journal of the ACM 46, 720–748 (1999)

  8. Knuth, D.: The Art of Computer Programming: Fundamental Algorithms. 2nd edn. Addison-Wesley (1968)

  9. Hendler, D., Shavit, N.: Non-blocking steal-half work queues. In: Proceedings of the 21st Annual ACM Symposium on Principles of Distributed Computing (2002)

  10. Detlefs, D., Flood, C., Heller, S., Printezis, T.: Garbage-first garbage collection. Technical report, Sun Microsystems – Sun Laboratories (2004) To appear.

  11. Agesen, O., Detlefs, D., Flood, C., Garthwaite, A., Martin, P., Moir, M., Shavit, N., Steele, G.: DCAS-based concurrent deques. Theory of Computing Systems 35, 349–386 (2002)

  12. Martin, P., Moir, M., Steele, G.: DCAS-based concurrent deques supporting bulk allocation. Technical Report TR-2002-111, Sun Microsystems Laboratories (2002)

  13. Greenwald, M.B., Cheriton, D.R.: The synergy between non-blocking synchronization and operating system structure. In: 2nd Symposium on Operating Systems Design and Implementation, pp. 123–136. Seattle, WA (1996)

  14. Blumofe, R.D., Papadopoulos, D.: The performance of work stealing in multiprogrammed environments (extended abstract). In: Measurement and Modeling of Computer Systems, pp. 266–267 (1998)

  15. Arnold, J.M., Buell, D.A., Davis, E.G.: Splash 2. In: Proceedings of the Fourth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 316–322. ACM Press (1992)

  16. Papadopoulos, D.: Hood: A user-level thread library for multiprogrammed multiprocessors. Master's thesis, Department of Computer Sciences, University of Texas at Austin (1998)

  17. Prakash, S., Lee, Y., Johnson, T.: A non-blocking algorithm for shared queues using compare-and-swap. IEEE Transactions on Computers 43, 548–559 (1994)

  18. Scott, M.L.: Personal communication: Code for a lock-free memory management pool (2003)

  19. Hendler, D., Lev, Y., Moir, M., Shavit, N.: A dynamic-sized nonblocking work stealing deque. Technical Report TR-2005-144, Sun Microsystems Laboratories (2005)

Author information

Corresponding author

Correspondence to Yossi Lev.

Additional information

This work was conducted while Yossi Lev was a student at Tel Aviv University, and is derived from his MS thesis [1].

About this article

Cite this article

Hendler, D., Lev, Y., Moir, M. et al. A dynamic-sized nonblocking work stealing deque. Distrib. Comput. 18, 189–207 (2006). https://doi.org/10.1007/s00446-005-0144-5

  • DOI: https://doi.org/10.1007/s00446-005-0144-5
