skip to main content
10.1145/2312005.2312035acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article

SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools

Published:25 June 2012Publication History

ABSTRACT

We present a highly-scalable non-blocking producer-consumer task pool, designed with a special emphasis on lightweight synchronization and data locality. The core building block of our pool is SALSA, Scalable And Low Synchronization Algorithm for a single-consumer container with task stealing support. Each consumer operates on its own SALSA container, stealing tasks from other containers if necessary. We implement an elegant self-tuning policy for task insertion, which does not push tasks to overloaded SALSA containers, thus decreasing the likelihood of stealing.

SALSA manages large chunks of tasks, which improves locality and facilitates stealing. SALSA uses a novel approach for coordination among consumers, without strong atomic operations or memory barriers in the fast path. It invokes only two CAS operations during a chunk steal.

Our evaluation demonstrates that a pool built using SALSA containers scales linearly with the number of threads and significantly outperforms other FIFO and non-FIFO alternatives.

References

  1. http://home.comcast.net/~pjbishop/Dave/Asymmetric-Dekker-Synchronization.txt.Google ScholarGoogle Scholar
  2. www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919.Google ScholarGoogle Scholar
  3. Y. Afek, G. Korland, M. Natanzon, and N. Shavit. Scalable producer-consumer pools based on elimination-diffraction trees. In Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II}, Euro-Par'10, pages 151--162, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Y. Afek, G. Korland, and E. Yanovsky. Quasi-linearizability: Relaxed consistency for improved concurrency. In Principles of Distributed Systems, Lecture Notes in Computer Science, pages 395--410. Google ScholarGoogle Scholar
  5. N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures}, SPAA '98, pages 119--129, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. Basin. Cafe: Scalable task pools with adjustable fairness and contention. Master's thesis, Technion, 2011.Google ScholarGoogle Scholar
  7. D. Basin, R. Fan, I. Keidar, O. Kiselov, and D. Perelman. Cafe: scalable task pools with adjustable fairness and contention. In Proceedings of the 25th international conference on Distributed computing}, DISC'11, pages 475--488, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Blagodurov, S. Zhuravlev, M. Dashti, and A. Fedorova. A case for numa-aware contention management on multicore systems. In Proceedings of the 2011 USENIX conference on USENIX annual technical conference, USENIXATC'11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. J. ACM, 46:720--748, September 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Braginsky and E. Petrank. Locality-conscious lock-free linked lists. In Proceedings of the 12th international conference on Distributed computing and networking, ICDCN'11, pages 107--118, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Dice, V. J. Marathe, and N. Shavit. Flat-combining numa locks. In Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, SPAA '11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Gidenstam, H. Sundell, and P. Tsigas. Cache-aware lock-free queues for multiple producers/consumers and weak memory consistency. In Proceedings of the 14th international conference on Principles of distributed systems}, OPODIS'10, pages 302--317, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. E. Gidron, I. Keidar, D. Perelman, and Y. Perez. SALSA: Scalable and Low Synchronization NUMA-aware Algorithm for Producer-Consumer Pools. Technical report, Technion, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. Hendler, Y. Lev, M. Moir, and N. Shavit. A dynamic-sized nonblocking work stealing deque. Distrib. Comput., 18:189--207, February 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Hendler and N. Shavit. Non-blocking steal-half work queues. In Proceedings of the twenty-first annual symposium on Principles of distributed computing, PODC '02, pages 280--289, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Hendler, N. Shavit, and L. Yerushalmi. A scalable lock-free stack algorithm. In Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, SPAA '04, pages 206--215, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Hoffman, O. Shalev, and N. Shavit. The baskets queue. In Proceedings of the 11th international conference on Principles of distributed systems, OPODIS'07, pages 401--414, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. E. Ladan-Mozes, I.-T. A. Lee, and D. Vyukov. Location-based memory fences. In Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, SPAA '11, pages 75--84, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Lamport. How to make a multiprocessor computer that correctly execute multiprocess programs. IEEE Trans. Comput., pages 690--691, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. M. Michael. Hazard pointers: Safe memory reclamation for lock-free objects. IEEE Trans. Parallel Distrib. Syst., 15:491--504, June 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing, PODC '96, pages 267--275, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Moir, D. Nussbaum, O. Shalev, and N. Shavit. Using elimination to implement scalable and lock-free fifo queues. In Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures, SPAA '05, pages 253--262, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen. x86-tso: a rigorous and usable programmer's model for x86 multiprocessors. Commun. ACM, pages 89--97, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. Sundell, A. Gidenstam, M. Papatriantafilou, and P. Tsigas. A lock-free algorithm for concurrent bags. In Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, SPAA '11, pages 335--344, 2011.. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SPAA '12: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
        June 2012
        348 pages
        ISBN:9781450312134
        DOI:10.1145/2312005

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 June 2012

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate447of1,461submissions,31%

        Upcoming Conference

        SPAA '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader