
Lease/Release: Architectural Support for Scaling Contended Data Structures

Published: 10 October 2017

Abstract

High memory contention is generally agreed to be a worst-case scenario for concurrent data structures. Significant research effort has gone into designs that minimize contention, and several programming techniques have been proposed to mitigate its effects. However, few architectural mechanisms currently exist for scaling contended data structures at high thread counts.
In this article, we investigate hardware support for scalable contended data structures. We propose Lease/Release, a simple addition to standard directory-based MESI cache coherence protocols that allows participants to lease memory, at the granularity of cache lines, by delaying coherence messages for a short, bounded period of time. Our analysis shows that Lease/Release can significantly reduce the overheads of contention for both non-blocking (lock-free) and lock-based data structure implementations while ensuring that no deadlocks are introduced. We validate Lease/Release empirically on the Graphite multiprocessor simulator on a range of data structures, including queue, stack, and priority queue implementations, as well as on transactional applications. Results show that Lease/Release consistently improves throughput and energy usage, by up to 5x, for both lock-free and lock-based data structure designs.
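
To make the leasing idea concrete, the following is a minimal toy model, not the protocol from the article. It assumes a single directory that tracks one exclusive owner per cache line and simulated time in cycles. The names `Directory`, `lease`, `release`, `tick`, and the bound `MAX_LEASE_TIME` are hypothetical and chosen only for illustration. The sketch shows the two properties the abstract highlights: ownership requests to a leased line are delayed rather than refused, and the delay is bounded, so a waiting core always obtains the line eventually.

```python
from collections import deque

MAX_LEASE_TIME = 20  # hypothetical upper bound on any lease, in simulated cycles


class CacheLine:
    def __init__(self):
        self.owner = None        # core holding the line exclusively, if any
        self.lease_expiry = -1   # cycle at which the owner's lease lapses
        self.pending = deque()   # delayed ownership requests from other cores


class Directory:
    """Toy directory: requests for a leased line are queued, never dropped."""

    def __init__(self):
        self.now = 0
        self.lines = {}

    def _line(self, addr):
        return self.lines.setdefault(addr, CacheLine())

    def request(self, core, addr):
        """Ask for exclusive ownership; return True if granted immediately."""
        line = self._line(addr)
        if line.owner == core:
            return True
        leased = line.owner is not None and self.now < line.lease_expiry
        if leased:
            line.pending.append(core)  # the coherence message is held back
            return False
        line.owner = core
        line.lease_expiry = -1         # plain ownership, no lease
        return True

    def lease(self, core, addr, duration):
        """Acquire the line and hold it for at most MAX_LEASE_TIME cycles."""
        if not self.request(core, addr):
            return False
        self._line(addr).lease_expiry = self.now + min(duration, MAX_LEASE_TIME)
        return True

    def release(self, core, addr):
        """Voluntarily end a lease early and serve the next waiting core."""
        line = self._line(addr)
        if line.owner == core:
            line.lease_expiry = self.now
            self._hand_over(line)

    def tick(self, cycles=1):
        """Advance simulated time; expired leases give up the line."""
        self.now += cycles
        for line in self.lines.values():
            if line.pending and self.now >= line.lease_expiry:
                self._hand_over(line)

    def _hand_over(self, line):
        if line.pending:
            line.owner = line.pending.popleft()
            line.lease_expiry = -1
```

In this model a core would call `lease` before a read-modify-write sequence on a contended line and `release` right after it, so competing requests wait briefly instead of stealing the line mid-operation. Capping every lease at `MAX_LEASE_TIME` is what keeps the waiting time bounded in the sketch.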



Published In

ACM Transactions on Parallel Computing, Volume 4, Issue 2
Special Issue: Invited papers from PPoPP 2016, Part 2
June 2017
154 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3134419
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 October 2017
Accepted: 01 August 2017
Revised: 01 July 2017
Received: 01 January 2017
Published in TOPC Volume 4, Issue 2


Author Tags

  1. Concurrent data structures
  2. hardware mechanisms
  3. lock-based data structures
  4. lock-free data structures

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Swiss National Fund Ambizione Fellowship

