skip to main content
10.1145/1669112.1669164acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
research-article

Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

Published:12 December 2009Publication History

ABSTRACT

Cache blocks often exhibit a small number of uses during their life time in the last-level cache. Past research has exploited this property in two different ways. First, replacement policies have been designed to evict dead blocks early and retain the potentially live blocks. Second, dynamic insertion policies attempt to victimize single-use blocks (dead on fill) as early as possible, thereby leaving most of the working set undisturbed in the cache. However, we observe that as the last-level cache grows in capacity and associativity, the traditional dead block prediction-based replacement policy loses effectiveness because often the LRU block itself is dead leading to an LRU replacement decision. The benefit of dynamic insertion policies is also small in a large class of applications that exhibit a significant number of cache blocks with small, yet more than one, uses.

To address these drawbacks, we introduce pseudo-last-in-first-out (pseudo-LIFO), a fundamentally new family of replacement heuristics that manages each cache set as a fill stack (as opposed to the traditional access recency stack). We specify three members of this family, namely, dead block prediction LIFO, probabilistic escape LIFO, and probabilistic counter LIFO. The probabilistic escape LIFO (peLIFO) policy is the central contribution of this paper. It dynamically learns the use probabilities of cache blocks beyond each fill stack position to implement a new replacement policy. Our detailed simulation results show that peLIFO, while having less than 1% storage overhead, reduces the execution time by 10% on average compared to a baseline LRU replacement policy for a set of fourteen single-threaded applications on a 2 MB 16-way set associative L2 cache. It reduces the average CPI by 19% on average for a set of twelve multiprogrammed workloads while satisfying a strong fairness requirement on a four-core chip-multiprocessor with an 8 MB 16-way set associative shared L2 cache. Further, it reduces the parallel execution time by 17% on average for a set of six multi-threaded programs on an eight-core chip-multiprocessor with a 4 MB 16-way set associative shared L2 cache. For the architectures considered in this paper, the storage overhead of the peLIFO policy is one-fifth to half of that of a state-of-the-art dead block prediction-based replacement policy. However, the peLIFO policy delivers better average performance for the selected single-threaded and multiprogrammed workloads and similar average performance for the multi-threaded workloads compared to the dead block prediction-based replacement policy.

References

  1. A. Basu et al. Scavenger: A New Last Level Cache Architecture with Global Block Priority. In Proc. of the 40th Intl. Symp. on Microarchitecture, pages 421--432, December 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. HP Labs. CACTI 4.2. Available at http://www.hpl.hp.com/personal/Norman_Jouppi/cacti4.html.Google ScholarGoogle Scholar
  3. Z. Hu, S. Kaxiras, and M. Martonosi. Timekeeping in the Memory System: Predicting and Optimizing Memory Behavior. In Proc. of the 29th Intl. Symp. on Computer Architecture, pages 209--220, May 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Jaleel et al. Adaptive Insertion Policies for Managing Shared Caches. In Proc. of the 17th Intl. Conf. on Parallel Architecture and Compilation Techniques, pages 208--219, October 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. P. Jouppi. Improving Direct-mapped Cache Performance by the Addition of a Small Fully Associative Cache and Prefetch Buffers. In Proc. of the 17th Intl. Symp. on Computer Architecture, pages 364--373, June 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. E. Kessler and M. D. Hill. Page Placement Algorithms for Large Real-indexed Caches. In ACM Transactions on Computer Systems, 10(4): 338--359, November 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A-C. Lai, C. Fide, and B. Falsafi. Dead-block Prediction&Dead-block Correlating Prefetchers. In Proc. of the 28th Intl. Symp. on Computer Architecture, pages 144--154, June/July 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In Proc. of the 24th Intl. Symp. on Computer Architecture, pages 241--251, June 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. H. Liu et al. Cache Bursts: A New Approach for Eliminating Dead Blocks and Increasing Cache Efficiency. In Proc. of the 41st Intl. Symp. on Microarchitecture, pages 222--233, November 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. K. Qureshi and Y. N. Patt. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. In Proc. of the 39th Intl. Symp. on Microarchitecture, pages 423--432, December 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. K. Qureshi et al. Adaptive Insertion Policies for High Performance Caching. In Proc. of the 34th Intl. Symp. on Computer Architecture, pages 381--391, June 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Sherwood et al. Automatically Characterizing Large Scale Program Behavior. In Proc. of the 10th Intl. Conf. on Architectural Support on Programming Languages and Operating Systems, pages 45--57, October 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Srikantaiah, M. Kandemir, and M. J. Irwin. Adaptive Set Pinning: Managing Shared Caches in Chip Multiprocessors. In Proc. of the 13th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 135--144, March 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. K. Tam et al. RapidMRC: Approximating L2 Miss Rate Curves on Commodity Systems for Online Optimizations. In Proc. of the 14th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 121--132, March 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Y. Xie and G. H. Loh. PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-core Shared Caches. In Proc. of the 36th Intl. Symp. on Computer Architecture, pages 174--183, June 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Z. Zhang, Z. Zhu, and X. Zhang. A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality. In Proc. of the 33rd Intl. Symp. on Microarchitecture, pages 32--41, December 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
      December 2009
      601 pages
      ISBN:9781605587981
      DOI:10.1145/1669112

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 December 2009

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate484of2,242submissions,22%

      Upcoming Conference

      MICRO '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader