skip to main content
10.1145/3064176.3064203acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Malthusian Locks

Published: 23 April 2017 Publication History

Abstract

Applications running in modern multithreaded environments are sometimes overthreaded. The excess threads do not improve performance, and in fact may act to degrade performance via scalability collapse, which can manifest even when there are fewer ready threads than available cores. Often, such software also has highly contended locks. We leverage the existence of such locks by modifying the lock admission policy so as to intentionally limit the number of distinct threads circulating over the lock in a given period. Specifically, if there are more threads circulating than are necessary to keep the lock saturated (continuously held), our approach will selectively cull and passivate some of those excess threads. We borrow the concept of swapping from the field of memory management and impose concurrency restriction (CR) if a lock suffers from contention. The resultant admission order is unfair over the short term but we explicitly provide long-term fairness by periodically shifting threads between the set of passivated threads and those actively circulating. Our approach is palliative, but is often effective at avoiding or reducing scalability collapse, and in the worst case does no harm. Specifically, throughput is either unaffected or improved, and unfairness is bounded, relative to common test-and-set locks which allow unbounded bypass and starvation1. By reducing competition for shared resources, such as pipelines, processors and caches, concurrency restriction may also reduce overall resource consumption and improve the overall load carrying capacity of a system.

References

[1]
Y. Afek, D. Dice, and A. Morrison. Cache Index-aware Memory Allocation. International Symposium on Memory Management -- ISMM, 2011. URL http://doi.acm.org/10.1145/1993478.1993486.
[2]
H. Akkan, M. Lang, and L. Ionkov. HPC runtime support for fast and power efficient locking and synchronization. In 2013 IEEE International Conference on Cluster Computer-CLUSTER, 2013. URL http://dx.doi.org/10.1109/CLUSTER.2013.6702659.
[3]
T. E. Anderson. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 1(1), Jan. 1990. URL http://dx.doi.org/10.1109/71.80120.
[4]
M. Blasgen, J. Gray, M. Mitoma, and T. Price. The Convoy Phenomenon. SIGOPS Operating Systems Review, 1979. URL http://doi.acm.org/10.1145/850657.850659.
[5]
S. Boyd-Wickizer, M. F. Kaashoek, R. Morris, and N. Zeldovich. Non-scalable locks are dangerous. In Proceedings of the Linux Symposium, 2012.
[6]
B. Brett, P. Kumar, M. Kim, and H. Kim. CHiP: A Profiler to Measure the Effect of Cache Contention on Scalability. International Parallel and Distributed Processing Symposium Workshops PhD Forum -- IPDPSW. IEEE Computer Society, 2013. URL http://dx.doi.org/10.1109/IPDPSW.2013.49.
[7]
F. P. J. Brooks. The Mythical Man-Month. Addison-Wesley, 1975. ISBN 0-201-00650-2.
[8]
D. Bueso. Scalability Techniques for Practical Synchronization Primitives. Communications of the ACM -- CACM, 2014. URL http://doi.acm.org/10.1145/2687882.
[9]
I. Calciu, D. Dice, Y. Lev, V. Luchangco, V. J. Marathe, and N. Shavit. NUMA-aware Reader-writer Locks. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP. ACM, 2013. URL http://doi.acm.org/10.1145/2442516.2442532.
[10]
M. Chabbi and J. Mellor-Crummey. Contention-conscious, Locality-preserving Locks. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming -- PPoPP. ACM, 2016. URL http://doi.acm.org/10.1145/2851141.2851166.
[11]
M. Chabbi, M. Fagan, and J. Mellor-Crummey. High Performance Locks for Multi-level NUMA Systems. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP. ACM, 2015. URL http://doi.acm.org/10.1145/2688500.2688503.
[12]
G. Chadha, S. Mahlke, and S. Narayanasamy. When Less is More (LIMO):Controlled Parallelism For Improved Efficiency. Conference on Compilers, Architectures and Synthesis for Embedded Systems - CASES, 2012. URL http://doi.acm.org/10.1145/2380403.2380431.
[13]
D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture. Symposium on High-Performance Computer Architecture -- HPCA. IEEE Computer Society, 2005. URL http://dx.doi.org/10.1109/HPCA.2005.27.
[14]
Y. Chou, B. Fahs, and S. Abraham. Microarchitecture Optimizations for Exploiting Memory-Level Parallelism. International Symposium on Computer Archtecture -- ISCA. IEEE Computer Society, 2004. URL http://dl.acm.org/citation.cfm?id=998680.1006708.
[15]
Y. Cui, Y. Chen, and Y. Shi. Comparison of Lock Thrashing Avoidance Methods and Its Performance Implications for Lock Design. Workshop on Large-scale System and Application Performance - LSAP. ACM, 2011. URL http://doi.acm.org/10.1145/1996029.1996033.
[16]
Y. Cui, Y. Wang, Y. Chen, and Y. Shi. Requester-Based Spin Lock: A Scalable and Energy Efficient Locking Scheme on Multicore Systems. IEEE Transactions on Computers, 2015. URL http://dx.doi.org/10.1109/TC.2013.196.
[17]
C. Curtsinger and E. D. Berger. Coz: Finding Code That Counts with Causal Profiling. Symposium on Operating Systems Principles - SOSP. ACM, 2015. URL http://doi.acm.org/10.1145/2815400.2815409.
[18]
T. David, R. Guerraoui, and V. Trigonakis. Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask. Symposium on Operating Systems Principles -- SOSP. ACM, 2013. URL http://doi.acm.org/10.1145/2517349.2522714.
[19]
P. J. Denning. Working Sets Past and Present. IEEE Transactions on Software Engineering, 1980. URL http://dx.doi.org/10.1109/TSE.1980.230464.
[20]
D. Dice. Implementing Fast Java Monitors with Relaxed-locks. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium- Volume 1, JVM. USENIX Association, 2001. URL https://www.usenix.org/legacy/event/jvm01/full_papers/dice/dice.pdf.
[21]
D. Dice. Adaptive spin-then-block mutual exclusion in multi-threaded processing, Sept. 2009. URL http://www.google.com/patents/US7594234. US Patent 7,594,234.
[22]
D. Dice. Polite busy-waiting with WRPAUSE on SPARC, 2011. URL https://blogs.oracle.com/dave/entry/polite_busy_waiting_with_wrpause.
[23]
D. Dice. Inverted schedctl usage in the JVM, 2011. URL https://blogs.oracle.com/dave/entry/inverted_schedctl_usage_in_the.
[24]
D. Dice. Using MWAIT in spin loops, 2011. URL https://blogs.oracle.com/dave/resource/mwait-blog-final2.txt.
[25]
D. Dice. Measuring long-term fairness for locks, 2014. URL https://blogs.oracle.com/dave/entry/measuring_long_term_fairness_for.
[26]
D. Dice. Preemption Tolerant MCS Locks, 2016. URL https://blogs.oracle.com/dave/entry/preemption_tolerant_mcs_locks.
[27]
D. Dice and T. Harris. Lock Holder Preemption Avoidance via Transactional Lock Elision. ACM SIGPLAN Workshop on Transactional Computing - Transact, 2016. URL https://blogs.oracle.com/dave/resource/LHPA-TLE-Feb16.pdf.
[28]
D. Dice, N. Shavit, and V. J. Marathe. US Patent US8775837 - Turbo Enablement, 2012. URL http://www.google.com/patents/US8775837.
[29]
D. Dice, V. J. Marathe, and N. Shavit. Lock Cohorting: A General Technique for Designing NUMA Locks. ACM Transactions on Parallel Computing - TOPC, 1 (2), Feb 2015. URL http://doi.acm.org/10.1145/2686884.
[30]
J. Eastep, D. Wingate, M. D. Santambrogio, and A. Agarwal. Smartlocks: Lock Acquisition Scheduling for Self-aware Synchronization. International Conference on Autonomic Computing -- ICAC, 2010. URL http://doi.acm.org/10.1145/1809049.1809079.
[31]
E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, J. A. Joao, O. Mutlu, and Y. N. Patt. Parallel Application Memory Scheduling. In International Symposium on Microarchitecture -- MICRO-44. ACM, 2011. URL http://doi.acm.org/10.1145/2155620.2155663.
[32]
J. Edler, J. Lipkis, and E. Schonberg. Process Management for Highly Parallel UNIX Systems. In Proc. 1988 USENIX Workshop on UNIX and Supercomputers, 1988.
[33]
S. Eyerman and L. Eeckhout. Modeling Critical Sections in Amdahl's Law and its Implications for Multicore Design. International Symposium on Computer Architecture -- ISCA. ACM, 2010. URL http://doi.acm.org/10.1145/1815961.1816011.
[34]
FAL Labs. Kyoto cabinet. URL http://fallabs.com/kyotocabinet/.
[35]
B. Falsafi, R. Guerraoui, J. Picorel, and V. Trigonakis. Unlocking Energy. In USENIX Annual Technical Conference (USENIX ATC 16). USENIX Association, 2016. URL https://www.usenix.org/conference/atc16/technical-sessions/presentation/falsafi.
[36]
C. Gershenson and D. Helbing. When Slower is Faster. CoRR, 2011. URL http://arxiv.org/abs/1506.06796v2.
[37]
C. Gini. Variabilità e Mutabilità. Memorie di Metodologica Statistica, 1912.
[38]
H. Guiroux, R. Lachaize, and V. Quéma. Multicore Locks: The Case Is Not Closed Yet. In USENIX Annual Technical Conference (USENIX ATC 16). USENIX Association, 2016. URL https://www.usenix.org/conference/atc16/technical-sessions/presentation/guiroux.
[39]
J. Gustedt. Futex Based Locks for C11's Generic Atomics. Symposium on Applied Computing - SAC. ACM, 2016. URL http://doi.acm.org/10.1145/2851613.2851956.
[40]
B. He, W. N. Scherer, and M. L. Scott. Preemption Adaptivity in Time-published Queue-based Spin Locks. High Performance Computing -- HiPC. Springer-Verlag, 2005. URL http://dx.doi.org/10.1007/11602569_6.
[41]
W. Heirman, T. Carlson, K. Van Craeynest, I. Hur, A. Jaleel, and L. Eeckhout. Undersubscribed Threading on Clustered Cache Architectures. In 2014 IEEE 20th International Symposium on High Performance Computer Architecture --HPCA. URL http://dx.doi.org/10.1109/HPCA.2014.6835975.
[42]
J. Holtman and N. J. Gunther. Getting in the Zone for Successful Scalability. CoRR, 2008. URL http://arxiv.org/abs/0809.2541.
[43]
Intel. Improving Real-Time Performance by Utilizing Cache Allocation Technology. URL http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/cache-allocation-technology-white-paper.pdf.
[44]
F. R. Johnson, R. Stoica, A. Ailamaki, and T. C. Mowry. Decoupling Contention Management from Scheduling. Architectural Support for Programming Languages and Operating Systems - ASPLOS XV. ACM, 2010. URL http://doi.acm.org/10.1145/1736020.1736035.
[45]
R. Johnson, M. Athanassoulis, R. Stoica, and A. Ailamaki. A New Look at the Roles of Spinning and Blocking. Proceedings of the Fifth International Workshop on Data Management on New Hardware -- DaMoN. ACM, 2009. URL http://doi.acm.org/10.1145/1565694.1565700.
[46]
A. R. Karlin, K. Li, M. S. Manasse, and S. Owicki. Empirical Studies of Competitve Spinning for a Shared-memory Multiprocessor. SIGOPS Operating Systems Review, 1991. URL http://doi.acm.org/10.1145/121133.286599.
[47]
S. Kashyap, C. Min, and T. Kim. Opportunistic Spinlocks: Achieving Virtual Machine Scalability in the Clouds. SIGOPS Operating Systems Review, 2016. URL http://doi.acm.org/10.1145/2903267.2903271.
[48]
L. I. Kontothanassis, R. W. Wisniewski, and M. L. Scott. Scheduler-Conscious Synchronization. ACM Transations on Computing Systems, 1997. URL http://doi.acm.org/10.1145/244764.244765.
[49]
N. Kosche, D. Singleton, B. Smaalders, and A. Tucker. Method and apparatus for execution and preemption control of computer process entities: US Patent number 5937187, 1999. URL http://www.google.com/patents/US5937187.
[50]
D.Lea. java.util.concurrent abstractqueuedsynchronizer, 2016. URL http://download.java.net/java/jdk9/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html.
[51]
B.-H. Lim and A. Agarwal. Waiting Algorithms for Synchronization in Large-scale Multiprocessors. ACM Transactions on Computing Systems, 1993. URL http://doi.acm.org/10.1145/152864.152869.
[52]
B.-H. Lim and A. Agarwal. Reactive Synchronization Algorithms for Multiprocessors. Architectural Support for Programming Languages and Operating Systems -- ASPLOS. ACM, 1994. URL http://doi.acm.org/10.1145/195473.195490.
[53]
V. Luchangco, D. Nussbaum, and N. Shavit. A Hierarchical CLH Queue Lock. In Euro-Par 2006 Parallel Processing. 2006. URL http://dx.doi.org/10.1007/11823285_84.
[54]
E. P. Markatos and T. J. LeBlanc. Multiprocessor synchronization primitives with priorities. 8th IEEE Workshop on Real-Time Operating Systems and Software. IEEE, 1991.
[55]
J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. Contention Aware Execution: Online Contention Detection and Response. International Symposium on Code Generation and Optimization -- CGO. ACM, 2010. URL http://doi.acm.org/10.1145/1772954.1772991.
[56]
G. Marsaglia. Xorshift RNGs. Journal of Statistical Software, 8(1), 2003. URL http://dx.doi.org/10.18637/jss.v008.i14.
[57]
J. M. Mellor-Crummey and M. L. Scott. Algorithms for Scalable Synchronization on Shared-memory Multiprocessors. ACM Transactions on Computing Systems, 9(1), Feb. 1991. URL http://doi.acm.org/10.1145/103727.103729.
[58]
R. Odaira and K. Hiraki. Selective Optimization of Locks by Runtime Statistics and Just-in-Time Compilation. International Parallel and Distributed Processing Symposium Workshops -- IPDPS. IEEE Computer Society, 2003. URL http://dl.acm.org/citation.cfm?id=838237.838665.
[59]
Open Solaris. Synch.c: pthread_mutex implementation. URL https://github.com/sonnens/illumos-joyent/blob/master/usr/src/lib/libc/port/threads/synch.c.
[60]
Oracle Corporation. Oracle's SPARC T5-2, SPARC T5-4, SPARC T5-8, and SPARC T5-1B Server Architecture, 2014. URL http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/o13-024-sparc-t5-architecture-1920540.pdf.
[61]
A. K. Porterfield, S. L. Olivier, S. Bhalachandra, and J. F. Prins. Power Measurement and Concurrency Throttling for Energy Reduction in OpenMP Programs. International Parallel and Distributed Processing Symposium Workshops -- IPDPSW. IEEE Computer Society, 2013. URL http://dx.doi.org/10.1109/IPDPSW.2013.15.
[62]
K. K. Pusukuri, R. Gupta, and L. N. Bhuyan. Thread Reinforcer: Dynamically Determining Number of Threads via OS Level Monitoring. International Symposium on Workload Characterization - IISWC. IEEE Computer Society, 2011. URL http://dx.doi.org/10.1109/IISWC.2011.6114208.
[63]
Z. Radović and E. Hagersten. Hierarchical Backoff Locks for Nonuniform Communication Architectures. In International Symposium on High Performance Computer Architecture -- HPCA. IEEE Computer Society, 2003. URL http://dl.acm.org/citation.cfm?id=822080.822810.
[64]
P. Ramalhete and A. Correia. Tidex: A Mutual Exclusion Lock. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming -- PPoPP. ACM, 2016. URL http://doi.acm.org/10.1145/2851141.2851171.
[65]
A. Raman, H. Kim, T. Oh, J. W. Lee, and D. I. August. Parallelism Orchestration Using DoPE: The Degree of Parallelism Executive. Programming Language Design and Implementation -- PLDI. ACM, 2011. URL http://doi.acm.org/10.1145/1993498.1993502.
[66]
K. Ren, J. M. Faleiro, and D. J. Abadi. Design Principles for Scaling Multi-core OLTP Under High Contention. CoRR, 2015. URL http://arxiv.org/abs/1512.06168.
[67]
G. E. Suh, L. Rudolph, and S. Devadas. Dynamic Partitioning of Shared Cache Memory. Journal of Supercomputing, 2004. URL http://dx.doi.org/10.1023/B:SUPE.0000014800.27383.8f.
[68]
J.-T. Wamhoff, S. Diestelhorst, C. Fetzer, P. Marlier, P. Felber, and D. Dice. The TURBO Diaries: Application-controlled Frequency Scaling Explained. In 2014 USENIX Annual Technical Conference (USENIX ATC 14). URL https://www.usenix.org/conference/atc14/technical-sessions/presentation/wamhoff.
[69]
T. Wang, M. Chabbi, and H. Kimura. Be My Guest: MCS Lock Now Welcomes Guests. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming -- PPoPP. ACM, 2016. URL http://doi.acm.org/10.1145/2851141.2851160.
[70]
Wikipedia. Malthusianism, 2015. URL https://en.wikipedia.org/wiki/Malthusianism [Online; accessed 2015-08-07].
[71]
R. M. Yoo and H.-H. S. Lee. Adaptive Transaction Scheduling for Transactional Memory Systems. ACM Symposium on Parallelism in Algorithms and Architectures -- SPAA, 2008. URL http://doi.acm.org/10.1145/1378533.1378564.
[72]
S. Zhuravlev, J. C. Saez, S. Blagodurov, A. Fedorova, and M. Prieto. Survey of Scheduling Techniques for Addressing Shared Resources in Multicore Processors. ACM Computing Surveys, 2012. URL http://doi.acm.org/10.1145/2379776.2379780.

Cited By

View all
  • (2025)HTLL: Latency-Aware Scalable Blocking MutexIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2025.352685936:3(471-486)Online publication date: 1-Mar-2025
  • (2024)CAL: Core-Aware Lock for the big.LITTLE Multicore ArchitectureApplied Sciences10.3390/app1415644914:15(6449)Online publication date: 24-Jul-2024
  • (2024)ESem: To Harden Process Synchronization for ServersProceedings of the 19th ACM Asia Conference on Computer and Communications Security10.1145/3634737.3657025(1554-1567)Online publication date: 1-Jul-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems
April 2017
648 pages
ISBN:9781450349383
DOI:10.1145/3064176
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 April 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Concurrency
  2. admission control
  3. admission order
  4. caches
  5. contention
  6. fairness
  7. locks
  8. multicore
  9. mutexes
  10. mutual exclusion
  11. scheduling
  12. spinning
  13. synchronization
  14. threads

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '17
Sponsor:
EuroSys '17: Twelfth EuroSys Conference 2017
April 23 - 26, 2017
Belgrade, Serbia

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)38
  • Downloads (Last 6 weeks)6
Reflects downloads up to 18 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)HTLL: Latency-Aware Scalable Blocking MutexIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2025.352685936:3(471-486)Online publication date: 1-Mar-2025
  • (2024)CAL: Core-Aware Lock for the big.LITTLE Multicore ArchitectureApplied Sciences10.3390/app1415644914:15(6449)Online publication date: 24-Jul-2024
  • (2024)ESem: To Harden Process Synchronization for ServersProceedings of the 19th ACM Asia Conference on Computer and Communications Security10.1145/3634737.3657025(1554-1567)Online publication date: 1-Jul-2024
  • (2024)Scalable Compact NUMA-aware Lock2024 23rd International Symposium on Parallel and Distributed Computing (ISPDC)10.1109/ISPDC62236.2024.10705400(1-8)Online publication date: 8-Jul-2024
  • (2023)Protecting Locks Against Unbalanced Unlock()Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3558481.3591091(199-211)Online publication date: 17-Jun-2023
  • (2023)Adapt Burstable Containers to Variable CPU ResourcesIEEE Transactions on Computers10.1109/TC.2022.317448072:3(614-626)Online publication date: 1-Mar-2023
  • (2022)Asymmetry-aware scalable lockingProceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3503221.3508420(294-308)Online publication date: 2-Apr-2022
  • (2021)CLoFProceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles10.1145/3477132.3483557(851-865)Online publication date: 26-Oct-2021
  • (2021)FTSDProceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems10.1145/3476886.3477518(123-130)Online publication date: 24-Aug-2021
  • (2021)Towards Exploiting CPU Elasticity via Efficient Thread OversubscriptionProceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing10.1145/3431379.3460641(215-226)Online publication date: 21-Jun-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media