research-article

CRUISE: cache replacement and utility-aware scheduling

Authors:

Hashem H. Najaf-abadi,

Samantika Subramaniam,

Simon C. Steely,

Joel Emer

ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems

Pages 249 - 260

https://doi.org/10.1145/2150976.2151003

Published: 03 March 2012

Abstract

When several applications are co-scheduled on a system with multiple shared last-level caches (LLCs), there is an opportunity to improve system performance. This opportunity can be exploited by the hardware, by the software, or by both together. The software (an operating system or hypervisor) can improve system performance by co-scheduling jobs on LLCs so as to minimize shared-cache contention. The hardware can improve system throughput through better replacement policies that allocate more cache resources to applications that benefit from the cache and fewer to those that do not. This study presents a detailed analysis of the interactions between intelligent scheduling and smart cache replacement policies. We find that smart cache replacement reduces the burden on software to make intelligent scheduling decisions. However, even under smart cache replacement, there is still room to improve performance through better application co-scheduling. We also find that co-scheduling decisions are a function of the underlying LLC replacement policy. We propose Cache Replacement and Utility-aware Scheduling (CRUISE), a hardware/software co-designed approach to shared cache management. For 4-core and 8-core chip multiprocessors (CMPs), we find that CRUISE approaches the performance of an ideal job co-scheduling policy under different LLC replacement policies.
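The abstract measures CRUISE against an "ideal job co-scheduling policy". As a rough illustration of what such an oracle does, here is a minimal Python sketch that exhaustively tries every mapping of jobs onto shared LLCs and keeps the one with the highest total throughput. This is not CRUISE itself, whose mechanism the abstract does not describe. The `group_throughput` callable, the summed-throughput objective, and the pair values in `pair_ipc` are hypothetical stand-ins for measured co-run performance; because the abstract finds that co-scheduling decisions depend on the LLC replacement policy, such a table would have to be re-measured for each policy.

```python
from itertools import combinations

def partitions(jobs, k):
    """Yield every way to split `jobs` into unordered groups of size k
    (one group per shared LLC). Assumes unique job names and that
    len(jobs) is a multiple of k."""
    if not jobs:
        yield []
        return
    first, rest = jobs[0], jobs[1:]
    # Always place the first remaining job in the current group so the
    # same partition is never yielded twice in a different group order.
    for others in combinations(rest, k - 1):
        group = (first,) + others
        remaining = [j for j in rest if j not in others]
        for tail in partitions(remaining, k):
            yield [group] + tail

def ideal_coschedule(jobs, k, group_throughput):
    """Exhaustively pick the job-to-LLC mapping with the highest summed
    throughput. `group_throughput` (hypothetical) returns the throughput
    of one group of jobs sharing a single LLC."""
    best, best_score = None, float("-inf")
    for p in partitions(list(jobs), k):
        score = sum(group_throughput(g) for g in p)
        if score > best_score:
            best, best_score = p, score
    return best, best_score

# Hypothetical co-run throughputs for 4 jobs on two 2-core LLCs.
pair_ipc = {
    frozenset({"A", "B"}): 1.2,
    frozenset({"C", "D"}): 1.9,
    frozenset({"A", "C"}): 1.7,
    frozenset({"B", "D"}): 1.6,
    frozenset({"A", "D"}): 1.5,
    frozenset({"B", "C"}): 1.4,
}

best, score = ideal_coschedule(
    ["A", "B", "C", "D"], 2, lambda g: pair_ipc[frozenset(g)]
)
print(best, round(score, 2))
```

With these made-up numbers the search picks the A+C / B+D pairing. The design choice here is brute force: it is exact, but the number of candidate mappings grows quickly (already 105 distinct pairings for 8 jobs in groups of two), which is why an exhaustive oracle serves as an upper-bound reference rather than a practical scheduler.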

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Information

Published In

ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems

March 2012

476 pages

ISBN:9781450307598

DOI:10.1145/2150976

General Chair: Tim Harris, Microsoft Research

Program Chair: Michael L. Scott, University of Rochester

Also published in:

ACM SIGARCH Computer Architecture News, Volume 40, Issue 1 (ASPLOS '12), March 2012, 453 pages. ISSN: 0163-5964. DOI: 10.1145/2189750

ACM SIGPLAN Notices, Volume 47, Issue 4 (ASPLOS '12), April 2012, 453 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/2248487

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASPLOS'12: Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems

March 3-7, 2012

London, England, UK

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Total Citations: 72

Total Downloads: 1,243

Downloads (last 12 months): 26

Downloads (last 6 weeks): 2

Reflects downloads up to 27 Feb 2025
