ABSTRACT
As multicore CPUs become more common, scalable synchronization primitives find wider use, and ideas previously applied in large-scale computation are worth revisiting. In this paper I explore one approach to scalable synchronization, a minimal-contention lock (M-lock). The key idea is to avoid spinning on a global variable; instead, each blocked task (process or thread) spins on a local lock representing the task that immediately preceded it in attempting to acquire the lock. This orders tasks by the sequence in which they attempt to acquire the lock, which prevents starvation. The only globally shared variable is a pointer to the next local lock to be contended for. Each contending task swaps the value of this pointer for a pointer to its own lock variable and then spins on the variable that the global pointer previously pointed to. Each waiting task therefore spins on a lock seen only by itself and the owner of that lock variable. While a task is spinning, the lock variable can be held in its local cache until it is invalidated by the lock owner unsetting the lock. Consequently, bus traffic is considerably lower than with a spinlock, which has the pernicious feature that the task releasing the lock is delayed by all the other bus traffic arising from contention for the lock. An MCS lock has similar properties but is more complicated and requires more contention-causing memory operations. This paper outlines the design of the M-lock and provides a preliminary performance analysis.
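The acquisition protocol described in the abstract can be sketched in C11 roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the names `mnode_t`, `mlock_t`, `mhandle_t`, `mlock_acquire`, `mlock_release`, and `mlock_demo` are hypothetical, the 64-byte cache line is assumed, and recycling the predecessor's lock variable after release is borrowed from CLH-style queue locks because the abstract does not say how lock variables are reused.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* One local lock variable, padded so two of them never share a cache line
 * (64 bytes is an assumed line size). */
typedef struct mnode {
    atomic_bool locked;
    char pad[64 - sizeof(atomic_bool)];
} mnode_t;

/* The only globally shared variable: pointer to the next local lock. */
typedef struct { _Atomic(mnode_t *) tail; } mlock_t;

/* Per-task state (hypothetical): own lock variable and the predecessor's. */
typedef struct { mnode_t *mine; mnode_t *pred; } mhandle_t;

static mnode_t *mnode_new(bool locked) {
    mnode_t *n = malloc(sizeof *n);
    assert(n != NULL);
    atomic_init(&n->locked, locked);
    return n;
}

void mlock_init(mlock_t *l) {
    /* Start with an unset dummy variable so the first contender proceeds. */
    atomic_init(&l->tail, mnode_new(false));
}

void mhandle_init(mhandle_t *h) { h->mine = mnode_new(false); h->pred = NULL; }

void mlock_acquire(mlock_t *l, mhandle_t *h) {
    atomic_store_explicit(&h->mine->locked, true, memory_order_relaxed);
    /* Swap the global pointer for a pointer to our own variable. */
    h->pred = atomic_exchange_explicit(&l->tail, h->mine, memory_order_acq_rel);
    /* Spin only on the predecessor's variable. The yield is a demo
     * convenience for machines with fewer cores than threads. */
    while (atomic_load_explicit(&h->pred->locked, memory_order_acquire))
        sched_yield();
}

void mlock_release(mhandle_t *h) {
    /* Unset our variable; this invalidates the successor's cached copy. */
    atomic_store_explicit(&h->mine->locked, false, memory_order_release);
    /* Assumption (CLH-style): our successor may still read our variable,
     * so we take over the predecessor's variable for the next acquire. */
    h->mine = h->pred;
}

typedef struct { mlock_t *lock; long *counter; int iters; } worker_arg_t;

static void *worker(void *p) {
    worker_arg_t *a = p;
    mhandle_t h;
    mhandle_init(&h);
    for (int i = 0; i < a->iters; i++) {
        mlock_acquire(a->lock, &h);
        (*a->counter)++;              /* critical section */
        mlock_release(&h);
    }
    free(h.mine);                     /* no other task references it now */
    return NULL;
}

/* Runs nthreads workers that each increment a shared counter iters times. */
long mlock_demo(int nthreads, int iters) {
    mlock_t lock;
    long counter = 0;
    worker_arg_t arg = { &lock, &counter, iters };
    pthread_t *t = malloc((size_t)nthreads * sizeof *t);
    assert(t != NULL);
    mlock_init(&lock);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    free(atomic_load(&lock.tail));    /* the one variable still published */
    free(t);
    return counter;
}
```

If mutual exclusion holds, `mlock_demo(4, 2000)` should return 8000, since every increment happens inside the critical section. Note that the release path touches only the releasing task's own variable, which is the property the abstract contrasts with a plain spinlock.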
Index Terms
- A preliminary study of minimal-contention locks