ABSTRACT
In sharded data processing systems, sharded in-memory key-value stores, dataflow programming, and load sharing, multiple concurrent data producers feed requests into the same data consumer. This can be naturally realized through concurrent queues, where each consumer pulls its tasks from its own dedicated queue. For scalability, wait-free queues are preferred over lock-based structures.
The vast majority of wait-free queue implementations, and even lock-free ones, support the multi-producer multi-consumer model. Yet this comes at a premium, since implementing wait-free multi-producer multi-consumer queues requires complex helper data structures. These helper structures increase the memory consumption of such queues and limit their performance and scalability. Many such designs also exhibit access patterns that are unfriendly to hardware caches.
In this work, we study the implementation of wait-free multi-producer single-consumer queues. Specifically, we propose Jiffy, a novel, efficient, and memory-frugal wait-free multi-producer single-consumer queue, and formally prove its correctness. We compare the performance and memory requirements of Jiffy with those of other state-of-the-art lock-free and wait-free queues. We show that Jiffy maintains good performance with up to 128 threads, delivers up to 50% better throughput than the next best construction we compared against, and consumes ≈ 90% less memory.
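To make the multi-producer single-consumer pattern described in the abstract concrete, the following is a minimal C++ sketch of an unbounded linked-list queue. It is not Jiffy's algorithm and is not taken from the paper. It follows the style of the well-known linked-list MPSC queue commonly attributed to Dmitry Vyukov. The names `MpscQueue` and `run_demo` are hypothetical and chosen only for illustration. In this sketch an enqueue finishes in a constant number of steps (assuming allocation itself is bounded), but `pop` may report an empty queue while a producer is still between its two steps, so the design does not offer the guarantees the paper proves for Jiffy.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Illustrative MPSC queue (hypothetical name); NOT the Jiffy algorithm.
template <typename T>
class MpscQueue {
    struct Node {
        std::atomic<Node*> next{nullptr};
        std::optional<T> value;
    };
    std::atomic<Node*> head_;  // most recently enqueued node; shared by producers
    Node* tail_;               // current stub node; touched by the consumer only

public:
    MpscQueue() : head_(new Node), tail_(head_.load(std::memory_order_relaxed)) {}

    // Must only run once all producers and the consumer have stopped.
    ~MpscQueue() {
        while (tail_ != nullptr) {
            Node* next = tail_->next.load(std::memory_order_relaxed);
            delete tail_;
            tail_ = next;
        }
    }

    MpscQueue(const MpscQueue&) = delete;
    MpscQueue& operator=(const MpscQueue&) = delete;

    // Callable from any number of producer threads: one exchange, one store.
    void push(T v) {
        Node* n = new Node;
        n->value.emplace(std::move(v));
        Node* prev = head_.exchange(n, std::memory_order_acq_rel);
        prev->next.store(n, std::memory_order_release);  // publish the link
    }

    // Callable from the single consumer thread only.
    std::optional<T> pop() {
        Node* stub = tail_;
        Node* next = stub->next.load(std::memory_order_acquire);
        if (next == nullptr) return std::nullopt;  // empty, or a link is still pending
        std::optional<T> out = std::move(next->value);
        next->value.reset();
        tail_ = next;   // the dequeued node becomes the new stub
        delete stub;    // safe: the producer that linked it no longer touches it
        return out;
    }
};

// Hypothetical demo: `producers` threads feed one consumer (the calling thread).
// Returns the number of items consumed.
std::size_t run_demo(int producers, int per_producer) {
    MpscQueue<std::pair<int, int>> q;
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&q, p, per_producer] {
            for (int i = 0; i < per_producer; ++i) q.push({p, i});
        });
    }
    std::vector<int> next_expected(producers, 0);
    const std::size_t total = static_cast<std::size_t>(producers) * per_producer;
    std::size_t consumed = 0;
    while (consumed < total) {
        if (auto item = q.pop()) {
            // Items from the same producer must arrive in the order they were sent.
            assert(item->second == next_expected[item->first]);
            ++next_expected[item->first];
            ++consumed;
        } else {
            std::this_thread::yield();
        }
    }
    for (auto& t : threads) t.join();
    return consumed;
}
```

Calling `run_demo(4, 1000)` starts four producer threads that each enqueue 1000 items while the calling thread acts as the single consumer. Allocating one node per element is the kind of memory overhead that array-based or buffer-based designs aim to reduce.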
Index Terms
- A Fast Wait-Free Multi-Producers Single-Consumer Queue