Contention-Aware Scheduling on Multicore Systems

Published: 01 December 2010

Abstract

Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research effort. Previous solutions focused primarily on hardware techniques and software page coloring. Our goal is to investigate how, and to what extent, contention for shared resources can be mitigated via thread scheduling. Scheduling is an attractive tool because it requires no extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is finding a classification scheme for threads that determines how they affect each other when competing for shared resources. We analyze such classification schemes using a newly proposed methodology that lets us evaluate them separately from the scheduling algorithm itself and compare them to the optimal. As a result of this analysis, we discovered a classification scheme that addresses contention not only for cache space but also for other shared resources, such as the memory controller, memory bus, and prefetching hardware. To show the applicability of our analysis, we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques lies not in improving the performance of a workload as a whole but in improving quality of service or performance isolation for individual applications and in optimizing system energy consumption.
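The general idea of scheduling from a thread classification can be illustrated with a minimal sketch. It is not the algorithm from the article: the abstract does not name the classification metric or the placement policy, so the per-thread `intensity` score, the notion of a "domain" (a group of cores sharing a cache or memory path), and both function names are assumptions made for illustration only. The sketch sorts threads by the assumed score and deals them out in snake order, so the most intensive threads land in different domains.

```python
def spread_by_intensity(intensity, num_domains):
    """Place threads so that contention-heavy threads do not share a domain.

    intensity: dict mapping thread name -> hypothetical contention score
               (higher means the thread stresses shared resources more).
    num_domains: number of groups of cores that share resources.
    Returns one list of thread names per domain.
    """
    # Most intensive threads first.
    ordered = sorted(intensity, key=intensity.get, reverse=True)
    domains = [[] for _ in range(num_domains)]
    for i, thread in enumerate(ordered):
        # Snake order (0, 1, ..., n-1, n-1, ..., 1, 0, ...) pairs the
        # heaviest threads with the lightest ones.
        round_no, pos = divmod(i, num_domains)
        idx = pos if round_no % 2 == 0 else num_domains - 1 - pos
        domains[idx].append(thread)
    return domains


def max_domain_load(domains, intensity):
    """Summed intensity of the most loaded domain (lower is better)."""
    return max(sum(intensity[t] for t in d) for d in domains)


# Hypothetical scores: two intensive threads (A, B) and two light ones (C, D).
scores = {"A": 30, "B": 25, "C": 2, "D": 1}
placement = spread_by_intensity(scores, num_domains=2)
```

With these made-up scores, placing A and B in the same domain would give a worst-case load of 55, while the spread placement keeps the two intensive threads apart. Whether a summed score predicts real slowdown is exactly the kind of question the article's classification analysis addresses, so the load function here is only a stand-in.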

Information

Published In

ACM Transactions on Computer Systems, Volume 28, Issue 4
December 2010
100 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/1880018

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 2010
Accepted: 01 October 2010
Received: 01 May 2010
Published in TOCS Volume 28, Issue 4

Author Tags

  1. Multicore processors
  2. scheduling
  3. shared resource contention

Qualifiers

  • Research-article
  • Research
  • Refereed
