DOI: 10.1145/3064176.3064203

Malthusian Locks

Published: 23 April 2017


Applications running in modern multithreaded environments are sometimes overthreaded. The excess threads do not improve performance, and in fact may act to degrade performance via scalability collapse, which can manifest even when there are fewer ready threads than available cores. Often, such software also has highly contended locks. We leverage the existence of such locks by modifying the lock admission policy so as to intentionally limit the number of distinct threads circulating over the lock in a given period. Specifically, if there are more threads circulating than are necessary to keep the lock saturated (continuously held), our approach will selectively cull and passivate some of those excess threads. We borrow the concept of swapping from the field of memory management and impose concurrency restriction (CR) if a lock suffers from contention. The resultant admission order is unfair over the short term but we explicitly provide long-term fairness by periodically shifting threads between the set of passivated threads and those actively circulating. Our approach is palliative, but is often effective at avoiding or reducing scalability collapse, and in the worst case does no harm. Specifically, throughput is either unaffected or improved, and unfairness is bounded, relative to common test-and-set locks which allow unbounded bypass and starvation. By reducing competition for shared resources, such as pipelines, processors and caches, concurrency restriction may also reduce overall resource consumption and improve the overall load carrying capacity of a system.
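The admission policy described in the abstract can be illustrated with a small deterministic simulation. The sketch below is not the authors' implementation: it is a single-threaded model, and the names `simulate`, `ACTIVE_TARGET`, and `fairness_period`, the choice of two circulating threads, and the assumption that every thread re-requests the lock immediately after releasing it are all hypothetical choices made for illustration.

```python
from collections import deque

# Hypothetical constant: how many threads may circulate over the lock.
# Two (the owner plus one waiter) is assumed to be enough to keep this
# model's lock saturated.
ACTIVE_TARGET = 2


def simulate(num_threads, acquisitions, restrict=True, fairness_period=8):
    """Return the order in which thread ids are admitted to a saturated lock.

    Model assumptions (not taken from the paper): num_threads >= 1, and each
    thread requests the lock again as soon as it releases it.
    """
    active = deque(range(num_threads))  # threads circulating over the lock
    passive = deque()                   # culled threads, longest-waiting at the left
    order = []
    for step in range(1, acquisitions + 1):
        if restrict:
            # Cull: passivate threads beyond those needed for saturation.
            while len(active) > ACTIVE_TARGET:
                passive.append(active.pop())
            # Long-term fairness: periodically swap the longest-waiting
            # passive thread with the most recent owner.
            if passive and step % fairness_period == 0:
                passive.append(active.pop())
                active.appendleft(passive.popleft())
        owner = active.popleft()        # admit the next circulating thread
        order.append(owner)
        active.append(owner)            # release, then immediately re-request
    return order
```

With 8 threads and `fairness_period=8`, the first seven admissions in this model come from only two threads, which is the short-term unfairness the abstract mentions, while every thread is admitted at least once over 64 acquisitions. Calling `simulate(..., restrict=False)` gives plain round-robin admission for comparison.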



Published In

EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems
April 2017
648 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]



Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Concurrency
  2. admission control
  3. admission order
  4. caches
  5. contention
  6. fairness
  7. locks
  8. multicore
  9. mutexes
  10. mutual exclusion
  11. scheduling
  12. spinning
  13. synchronization
  14. threads


  • Research-article
  • Research
  • Refereed limited


EuroSys '17
EuroSys '17: Twelfth EuroSys Conference 2017
April 23 - 26, 2017
Belgrade, Serbia

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
