research-article

A preliminary study of minimal-contention locks

Author:
Philip Machanick

Rhodes University

SAICSIT '18: Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists, September 2018, Pages 269–278, https://doi.org/10.1145/3278681.3278713

Published: 26 September 2018

ABSTRACT

As multicore CPUs become more common, scalable synchronization primitives find wider use, and ideas previously applied in large-scale computation are worth revisiting. In this paper I explore one approach to scalable synchronization, a minimal-contention lock (M-lock). The key idea is to avoid spinning on a global variable; instead, each blocked task (process or thread) spins on a local lock representing the task that immediately preceded it in attempting to acquire the lock. This creates an ordering based on the order in which tasks attempt to acquire the lock, which prevents starvation. The only globally shared variable is a pointer to the next local lock to be contended for. Each contending task swaps the value of this pointer for a pointer to its own variable, then spins on the variable previously pointed to by the global pointer. Each waiting task therefore spins on a lock variable seen only by itself and the owner of that variable. While a task is spinning, the lock variable can be held in its local cache until it is invalidated by the lock owner when it unsets the lock. Consequently, the amount of bus traffic is considerably less than with a spinlock, which has the pernicious feature that the task releasing the lock is delayed by all the other bus traffic arising from contention for the lock. An MCS lock has similar properties but is more complicated and requires more operations that cause memory contention. This paper outlines the design of the M-lock and provides a preliminary performance analysis.
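
The swap-and-spin scheme described in the abstract can be sketched in C11 roughly as follows. This is an illustration written from the abstract's description alone, not the paper's M-lock code; it resembles a CLH-style queue lock built from atomic swap. All names (qnode, qlock, qhandle, run_demo) are hypothetical, and the unlocked dummy node, the recycling of the predecessor's node on release, and the sched_yield call in the spin loop are assumptions made to keep the example small and runnable.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* A task-local lock variable: seen only by its owner and the owner's successor. */
typedef struct qnode { atomic_bool locked; } qnode;

/* The only globally shared variable: pointer to the next local lock to contend for. */
typedef struct { _Atomic(qnode *) tail; } qlock;

/* Per-task state (hypothetical): the node this task publishes and its predecessor's. */
typedef struct { qnode *mine; qnode *pred; } qhandle;

static qnode *qnode_new(bool locked) {
    qnode *n = malloc(sizeof *n);
    assert(n != NULL);
    atomic_init(&n->locked, locked);
    return n;
}

/* Assumption: the queue starts with one unlocked dummy node, so the first task never waits. */
static void qlock_init(qlock *l) { atomic_init(&l->tail, qnode_new(false)); }

static void qhandle_init(qhandle *h) { h->mine = qnode_new(false); h->pred = NULL; }

static void qlock_acquire(qlock *l, qhandle *h) {
    atomic_store(&h->mine->locked, true);
    /* Swap the global pointer for a pointer to our own variable. */
    h->pred = atomic_exchange(&l->tail, h->mine);
    /* Spin on the predecessor's variable only, never on the global pointer. */
    while (atomic_load(&h->pred->locked))
        sched_yield(); /* assumption: yield so the demo copes with more threads than cores */
}

static void qlock_release(qhandle *h) {
    qnode *published = h->mine;
    h->mine = h->pred;                       /* reuse the predecessor's node next time */
    atomic_store(&published->locked, false); /* hands the lock to our successor */
}

typedef struct { qlock *lock; long *counter; int iters; } worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    qhandle h;
    qhandle_init(&h);
    for (int i = 0; i < a->iters; i++) {
        qlock_acquire(a->lock, &h);
        ++*a->counter; /* critical section: plain, non-atomic increment */
        qlock_release(&h);
    }
    free(h.mine); /* no other task refers to this node any more */
    return NULL;
}

/* Runs nthreads workers, each incrementing a shared counter iters times. */
long run_demo(int nthreads, int iters) {
    qlock lock;
    long counter = 0;
    worker_arg arg = { &lock, &counter, iters };
    pthread_t *threads = malloc((size_t)nthreads * sizeof *threads);
    assert(threads != NULL);
    qlock_init(&lock);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, worker, &arg);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    free(atomic_load(&lock.tail)); /* the last published node */
    free(threads);
    return counter;
}
```

Built with a C11 compiler and pthread support (for example with the -pthread flag), a call such as run_demo(4, 2000) should return 8000 as long as mutual exclusion holds, since every increment happens inside the critical section. Each waiter reads only its predecessor's flag, so a release invalidates one waiter's cached copy rather than every waiter's, which is the property the abstract contrasts with a plain spinlock.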

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software performance

Published in
SAICSIT '18: Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists
September 2018
362 pages
ISBN: 9781450366472
DOI: 10.1145/3278681
Conference Chair: Sue Petratos
Program Chairs: Johan van Niekerk, Bertram Haskins
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 187 of 439 submissions, 43%