Research article
DOI: 10.1145/2370816.2370868

The evicted-address filter: a unified mechanism to address both cache pollution and thrashing

Published: 19 September 2012

Abstract

Off-chip main memory has long been a bottleneck for system performance. With increasing memory pressure due to multiple on-chip cores, effective cache utilization is important. In a system with limited cache space, we would ideally like to prevent 1) cache pollution, i.e., blocks with low reuse evicting blocks with high reuse from the cache, and 2) cache thrashing, i.e., blocks with high reuse evicting each other from the cache.
In this paper, we propose a new, simple mechanism to predict the reuse behavior of missed cache blocks in a manner that mitigates both pollution and thrashing. Our mechanism tracks the addresses of recently evicted blocks in a structure called the Evicted-Address Filter (EAF). Missed blocks whose addresses are present in the EAF are predicted to have high reuse, and all other blocks are predicted to have low reuse. The key observation behind this prediction scheme is that if a block with high reuse is prematurely evicted from the cache, it will be accessed soon after eviction. We show that an EAF implementation using a Bloom filter, which is cleared periodically, naturally mitigates the thrashing problem by ensuring that only a portion of a thrashing working set is retained in the cache, while incurring low storage cost and implementation complexity.
We compare our EAF-based mechanism to five state-of-the-art mechanisms that address cache pollution or thrashing, and show that it provides significant performance improvements for a wide variety of workloads and system configurations.
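To make the mechanism concrete, the following is a minimal C++ sketch of an Evicted-Address Filter built from a Bloom filter that is cleared periodically, as the abstract describes. The class name, filter size, hash functions, and clearing interval are illustrative assumptions and are not taken from the paper; the full text specifies the actual design and parameters.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <functional>

// Sketch of the EAF idea: a Bloom filter remembers the addresses of recently
// evicted blocks. A missed block whose address hits in the filter is predicted
// to have high reuse; clearing the filter periodically keeps only a portion of
// a thrashing working set "remembered". All parameters below are illustrative.
class EvictedAddressFilter {
    static constexpr std::size_t kBits = 1 << 16;             // filter size (assumed)
    static constexpr int kHashes = 2;                          // hash functions (assumed)
    static constexpr std::uint64_t kClearInterval = 1 << 14;   // insertions before clearing (assumed)

    std::bitset<kBits> bits_;
    std::uint64_t inserted_ = 0;

    std::size_t hash(std::uint64_t block_addr, int i) const {
        // Simple salted hash for illustration; a hardware design would use
        // cheap XOR/folding hash functions instead.
        return std::hash<std::uint64_t>{}(block_addr ^ (0x9e3779b97f4a7c15ULL * (i + 1))) % kBits;
    }

public:
    // Called by the cache when it evicts a block: record the evicted address.
    void on_eviction(std::uint64_t block_addr) {
        for (int i = 0; i < kHashes; ++i) bits_.set(hash(block_addr, i));
        if (++inserted_ >= kClearInterval) {  // periodic clearing
            bits_.reset();
            inserted_ = 0;
        }
    }

    // Called on a cache miss: if the block was evicted recently, predict high
    // reuse. The cache would then insert it near the MRU position; otherwise
    // insert near LRU (or bypass) to limit pollution from low-reuse blocks.
    bool predicts_high_reuse(std::uint64_t block_addr) const {
        for (int i = 0; i < kHashes; ++i)
            if (!bits_.test(hash(block_addr, i))) return false;
        return true;  // Bloom filters allow false positives, never false negatives
    }
};
```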



Published In

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
September 2012, 512 pages
ISBN: 9781450311823
DOI: 10.1145/2370816

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. caching
    2. insertion policy
    3. memory
    4. pollution
    5. thrashing


Conference

PACT '12
Sponsors: IFIP WG 10.3, SIGARCH, IEEE CS TCPP, IEEE CS TCAA

    Acceptance Rates

Overall acceptance rate: 121 of 471 submissions (26%)
