Research article
DOI: 10.1145/2370816.2370868

The evicted-address filter: a unified mechanism to address both cache pollution and thrashing

Published: 19 September 2012

Abstract

Off-chip main memory has long been a bottleneck for system performance. With increasing memory pressure due to multiple on-chip cores, effective cache utilization is important. In a system with limited cache space, we would ideally like to prevent 1) cache pollution, i.e., blocks with low reuse evicting blocks with high reuse from the cache, and 2) cache thrashing, i.e., blocks with high reuse evicting each other from the cache.
In this paper, we propose a new, simple mechanism to predict the reuse behavior of missed cache blocks in a manner that mitigates both pollution and thrashing. Our mechanism tracks the addresses of recently evicted blocks in a structure called the Evicted-Address Filter (EAF). Missed blocks whose addresses are present in the EAF are predicted to have high reuse, and all other blocks are predicted to have low reuse. The key observation behind this prediction scheme is that if a block with high reuse is prematurely evicted from the cache, it will be accessed soon after eviction. We show that an EAF implementation using a Bloom filter, which is cleared periodically, naturally mitigates the thrashing problem by ensuring that only a portion of a thrashing working set is retained in the cache, while incurring low storage cost and implementation complexity.
We compare our EAF-based mechanism to five state-of-the-art mechanisms that address cache pollution or thrashing, and show that it provides significant performance improvements for a wide variety of workloads and system configurations.
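To make the mechanism concrete, the following is a minimal C++ sketch of an Evicted-Address Filter built from a Bloom filter that is cleared periodically, as the abstract describes. The class name, filter size, hash functions, and clearing interval are illustrative assumptions and are not taken from the paper; the full text specifies the actual design and parameters.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <functional>

// Sketch of the EAF idea: a Bloom filter remembers the addresses of recently
// evicted blocks. A missed block whose address hits in the filter is predicted
// to have high reuse; clearing the filter periodically keeps only a portion of
// a thrashing working set "remembered". All parameters below are illustrative.
class EvictedAddressFilter {
    static constexpr std::size_t kBits = 1 << 16;             // filter size (assumed)
    static constexpr int kHashes = 2;                          // hash functions (assumed)
    static constexpr std::uint64_t kClearInterval = 1 << 14;   // insertions before clearing (assumed)

    std::bitset<kBits> bits_;
    std::uint64_t inserted_ = 0;

    std::size_t hash(std::uint64_t block_addr, int i) const {
        // Simple salted hash for illustration; a hardware design would use
        // cheap XOR/folding hash functions instead.
        return std::hash<std::uint64_t>{}(block_addr ^ (0x9e3779b97f4a7c15ULL * (i + 1))) % kBits;
    }

public:
    // Called by the cache when it evicts a block: record the evicted address.
    void on_eviction(std::uint64_t block_addr) {
        for (int i = 0; i < kHashes; ++i) bits_.set(hash(block_addr, i));
        if (++inserted_ >= kClearInterval) {  // periodic clearing
            bits_.reset();
            inserted_ = 0;
        }
    }

    // Called on a cache miss: if the block was evicted recently, predict high
    // reuse. The cache would then insert it near the MRU position; otherwise
    // insert near LRU (or bypass) to limit pollution from low-reuse blocks.
    bool predicts_high_reuse(std::uint64_t block_addr) const {
        for (int i = 0; i < kHashes; ++i)
            if (!bits_.test(hash(block_addr, i))) return false;
        return true;  // Bloom filters allow false positives, never false negatives
    }
};
```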



Published In

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
September 2012, 512 pages
ISBN: 9781450311823
DOI: 10.1145/2370816

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. caching
    2. insertion policy
    3. memory
    4. pollution
    5. thrashing


Conference

PACT '12
Sponsors: IFIP WG 10.3, SIGARCH, IEEE CS TCPP, IEEE CS TCAA

    Acceptance Rates

Overall acceptance rate: 121 of 471 submissions (26%)
