DOI: 10.1145/1542275.1542290
Research article

Less reused filter: improving L2 cache performance via filtering less reused lines

Published: 08 June 2009

Abstract

The L2 cache is commonly managed using the LRU policy. For workloads whose working set is larger than the L2 cache, LRU behaves poorly, producing a large number of less reused lines, that is, lines that are never reused or reused only a few times. In this case, cache performance can be improved by retaining a portion of the working set in the cache for a sufficiently long period. Previous schemes approach this by bypassing never reused lines. However, because they are severely constrained by the number of never reused lines, they sometimes deliver no benefit at all.
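The thrashing behavior described above can be reproduced with a toy simulation (a hypothetical illustration, not code from the paper): a cyclic working set one line larger than an LRU cache misses on every access, while pinning a subset of the working set and bypassing the rest turns most accesses into hits.

```python
# Hypothetical illustration of the motivation: LRU thrashes on a cyclic
# working set of 5 lines with a 4-line cache, while retaining a fixed
# portion of the working set (and bypassing the rest) hits after warm-up.
from collections import OrderedDict

def lru_misses(trace, capacity):
    cache, misses = OrderedDict(), 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # hit: mark most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[line] = True
    return misses

def pinned_misses(trace, capacity, pinned):
    # Retain a fixed subset of the working set; bypass everything else.
    cache, misses = set(), 0
    for line in trace:
        if line in cache:
            continue
        misses += 1
        if line in pinned and len(cache) < capacity:
            cache.add(line)
    return misses

trace = list(range(5)) * 100               # cyclic working set of 5 lines
print(lru_misses(trace, 4))                # 500: LRU misses on every access
print(pinned_misses(trace, 4, {0, 1, 2})) # 203: pinned lines hit after first touch
```

With 500 accesses, LRU misses every time, while pinning three of the five lines cuts misses to 203, which is the gap the paper's filtering approach tries to exploit.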
This paper proposes a new filtering mechanism that filters out less reused lines rather than just never reused lines. This extended scope of bypassing provides more opportunities to fit the working set into the cache. The paper also proposes the Less Reused Filter (LRF), a separate structure placed before the L2 cache, to implement this mechanism. The LRF employs a reuse frequency predictor to accurately identify less reused lines among incoming lines. Meanwhile, based on our observation that most less reused lines have a short life span, the LRF places the filtered lines into a small filter buffer so they can still be fully utilized, avoiding extra misses.
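The mechanism can be sketched functionally as follows. This is an assumed minimal model, not the paper's design: the predictor here is a tag-indexed table of saturating counters trained on each line's observed reuse count at eviction time, and the threshold is a made-up parameter; the real LRF's predictor organization differs.

```python
# Minimal functional sketch of the LRF idea (assumed details, not the
# paper's implementation): on a miss, a reuse frequency predictor decides
# whether the incoming line goes into L2 or into a small filter buffer.
from collections import OrderedDict

class LessReusedFilter:
    def __init__(self, l2_lines, buf_lines, threshold=1):
        self.l2 = OrderedDict()    # tag -> reuse count, in LRU order
        self.buf = OrderedDict()   # small filter buffer for filtered lines
        self.pred = {}             # tag -> saturating reuse counter
        self.l2_lines, self.buf_lines = l2_lines, buf_lines
        self.threshold = threshold

    def _evict(self, store, limit):
        while len(store) > limit:
            tag, reuses = store.popitem(last=False)
            # Train the predictor with the reuse count seen this lifetime.
            self.pred[tag] = min(reuses, 3)

    def access(self, tag):
        for store in (self.l2, self.buf):
            if tag in store:
                store[tag] += 1
                store.move_to_end(tag)
                return True        # hit in L2 or in the filter buffer
        # Miss: lines predicted less-reused go to the filter buffer,
        # keeping them out of L2 so the retained working set survives.
        if self.pred.get(tag, 0) < self.threshold:
            target, limit = self.buf, self.buf_lines
        else:
            target, limit = self.l2, self.l2_lines
        target[tag] = 0
        self._evict(target, limit)
        return False
```

Because short-lived lines are still held briefly in the filter buffer, near-term re-references to them hit there instead of becoming the extra misses a pure bypass scheme would incur.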
Our evaluation of 24 SPEC 2000 benchmarks shows that augmenting a 512KB LRU-managed L2 cache with an LRF containing a 32KB filter buffer reduces the average MPKI by 27.5%, narrowing the gap between LRU and OPT by 74.4%.




    Published In

    ICS '09: Proceedings of the 23rd international conference on Supercomputing
    June 2009
    544 pages
    ISBN:9781605584980
    DOI:10.1145/1542275

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cache filtering
    2. less reused line

    Qualifiers

    • Research-article

    Conference

ICS '09: International Conference on Supercomputing
June 8-12, 2009
Yorktown Heights, NY, USA

    Acceptance Rates

Overall acceptance rate: 629 of 2,180 submissions, 29%


    Cited By

    • (2024) GMT: GPU Orchestrated Memory Tiering for the Big Data Era. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 464-478. DOI: 10.1145/3620666.3651353
    • (2024) Skyway: Accelerate Graph Applications with a Dual-Path Architecture and Fine-Grained Data Management. Journal of Computer Science and Technology, 39(4), 871-894. DOI: 10.1007/s11390-023-2939-x
    • (2023) ACIC: Admission-Controlled Instruction Cache. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 165-178. DOI: 10.1109/HPCA56546.2023.10071033
    • (2021) Dead Page and Dead Block Predictors: Cleaning TLBs and Caches Together. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 507-519. DOI: 10.1109/HPCA51647.2021.00050
    • (2019) Reducing Data Movement and Energy in Multilevel Cache Hierarchies without Losing Performance: Can You Have It All? Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 382-393. DOI: 10.1109/PACT.2019.00037
    • (2017) A DVFS-aware cache bypassing technique for multiple clock domain mobile SoCs. IEICE Electronics Express, 14(11). DOI: 10.1587/elex.14.20170324
    • (2016) A Survey of Cache Bypassing Techniques. Journal of Low Power Electronics and Applications, 6(2), 5. DOI: 10.3390/jlpea6020005
    • (2016) The IBP Replacement Algorithm Based on Process Binding. Software Engineering and Applications, 5(3), 181-189. DOI: 10.12677/SEA.2016.53020
    • (2015) Applying SVM to data bypass prediction in multi core last-level caches. IEICE Electronics Express, 12(22). DOI: 10.1587/elex.12.20150736
    • (2014) Retention Benefit Based Intelligent Cache Replacement. Journal of Computer Science and Technology, 29(6), 947-961. DOI: 10.1007/s11390-014-1481-2
