A highly-efficient wait-free universal construction

Published: 04 June 2011

Abstract

We present Sim, a new, simple wait-free universal construction that uses just a Fetch&Add object and an LL/SC object and performs a constant number of shared memory accesses. We have implemented Sim on a real shared-memory machine. In theoretical terms, our practical version of Sim, called P-Sim, has worse complexity than its theoretical analog; in practice, however, we show experimentally that P-Sim outperforms several state-of-the-art lock-based and lock-free techniques, even though it is wait-free, i.e., it satisfies a stronger progress condition than all the algorithms it outperforms.
We have used P-Sim to obtain highly efficient wait-free implementations of stacks and queues. Our experiments show that these implementations outperform the current state-of-the-art shared stack and queue implementations, which ensure only weaker progress properties than wait-freedom.
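
To make the idea of a wait-free universal construction concrete, the following is a minimal Java sketch of the general announce-and-help pattern. It is not the Sim algorithm from the paper. It assumes a compare-and-set on an immutable state record in place of the LL/SC object, and per-thread sequence numbers in place of the Fetch&Add object. Each operation also copies two arrays of size n, so it does not achieve the constant number of shared memory accesses claimed for Sim. The class name UniversalSketch and every method and field in it are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch: announce-and-help universal construction for a long-valued object. */
public class UniversalSketch {
    /** An announced request: the operation plus a per-thread sequence number. */
    private static final class Request {
        final LongUnaryOperator op;
        final long seq;
        Request(LongUnaryOperator op, long seq) { this.op = op; this.seq = seq; }
    }

    /** Immutable snapshot: object value, last applied seq per thread, last response per thread. */
    private static final class State {
        final long value;
        final long[] applied;
        final long[] response;
        State(long value, long[] applied, long[] response) {
            this.value = value; this.applied = applied; this.response = response;
        }
    }

    private final int n;
    private final AtomicReferenceArray<Request> announce;
    private final AtomicReference<State> state;
    private final long[] nextSeq; // slot i is read and written only by thread i

    public UniversalSketch(int n, long initial) {
        this.n = n;
        this.announce = new AtomicReferenceArray<>(n);
        this.state = new AtomicReference<>(new State(initial, new long[n], new long[n]));
        this.nextSeq = new long[n];
    }

    /** Thread `id` applies `op`; returns the value the object held just before `op` took effect. */
    public long apply(int id, LongUnaryOperator op) {
        long seq = ++nextSeq[id];
        announce.set(id, new Request(op, seq));       // publish the request so others can help
        // Two attempts suffice: if both CAS calls fail, the CAS that defeated the second
        // attempt started from a state installed after the announcement, so it applied `op`.
        for (int attempt = 0; attempt < 2; attempt++) {
            State old = state.get();
            if (old.applied[id] == seq) break;        // a helper already applied this request
            long value = old.value;
            long[] applied = old.applied.clone();
            long[] response = old.response.clone();
            for (int j = 0; j < n; j++) {             // apply every pending announced request
                Request r = announce.get(j);
                if (r != null && r.seq > applied[j]) {
                    response[j] = value;
                    value = r.op.applyAsLong(value);
                    applied[j] = r.seq;
                }
            }
            if (state.compareAndSet(old, new State(value, applied, response))) break;
        }
        return state.get().response[id];
    }

    /** Returns the current value of the simulated object. */
    public long read() { return state.get().value; }
}
```

Each call finishes after at most two attempts regardless of what other threads do, which is the wait-free progress condition the abstract refers to. Operations passed to apply should be side-effect free, because an operation may be computed by several helpers even though only one resulting state is installed.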

Published In

SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
June 2011
404 pages
ISBN:9781450307437
DOI:10.1145/1989493
Sponsors

In-Cooperation

  • EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. concurrent data structures
  2. queues
  3. stacks
  4. universal constructions
  5. wait free

Qualifiers

  • Research-article

Conference

SPAA '11

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%
