ABSTRACT
Cache blocks often exhibit a small number of uses during their lifetime in the last-level cache. Past research has exploited this property in two different ways. First, replacement policies have been designed to evict dead blocks early and retain the potentially live blocks. Second, dynamic insertion policies attempt to victimize single-use blocks (dead on fill) as early as possible, thereby leaving most of the working set undisturbed in the cache. However, we observe that as the last-level cache grows in capacity and associativity, the traditional dead block prediction-based replacement policy loses effectiveness because often the LRU block itself is dead, leading to an LRU replacement decision. The benefit of dynamic insertion policies is also small for a large class of applications in which a significant number of cache blocks see a small number of uses, yet more than one.
To address these drawbacks, we introduce pseudo-last-in-first-out (pseudo-LIFO), a fundamentally new family of replacement heuristics that manages each cache set as a fill stack (as opposed to the traditional access recency stack). We specify three members of this family, namely, dead block prediction LIFO, probabilistic escape LIFO, and probabilistic counter LIFO. The probabilistic escape LIFO (peLIFO) policy is the central contribution of this paper. It dynamically learns the use probabilities of cache blocks beyond each fill stack position to implement a new replacement policy. Our detailed simulation results show that peLIFO, while incurring less than 1% storage overhead, reduces execution time by 10% on average compared to a baseline LRU replacement policy for a set of fourteen single-threaded applications on a 2 MB 16-way set associative L2 cache. It reduces CPI by 19% on average for a set of twelve multiprogrammed workloads while satisfying a strong fairness requirement on a four-core chip-multiprocessor with an 8 MB 16-way set associative shared L2 cache. Further, it reduces parallel execution time by 17% on average for a set of six multi-threaded programs on an eight-core chip-multiprocessor with a 4 MB 16-way set associative shared L2 cache. For the architectures considered in this paper, the storage overhead of the peLIFO policy is one-fifth to half of that of a state-of-the-art dead block prediction-based replacement policy. However, the peLIFO policy delivers better average performance for the selected single-threaded and multiprogrammed workloads, and similar average performance for the multi-threaded workloads, compared to the dead block prediction-based replacement policy.
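The fill-stack idea described above can be illustrated with a minimal, hypothetical sketch of a single cache set. This is not the paper's exact algorithm: the class name, the per-position hit counters, and the escape-point threshold heuristic are all illustrative assumptions. It only shows the two defining behaviors of the pseudo-LIFO family: fill-stack order changes on fills (not on hits), and the victim is chosen near the top of the fill stack once a block's position is unlikely to see further use.

```python
class FillStackSet:
    """A hypothetical sketch of one set managed as a fill stack,
    in the spirit of probabilistic escape LIFO (peLIFO)."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []                 # index 0 = oldest fill, -1 = top
        self.hits_beyond = [0] * ways   # observed hits per fill position

    def access(self, tag):
        """Return True on hit, False on miss (which triggers a fill)."""
        if tag in self.stack:
            pos = self.stack.index(tag)
            self.hits_beyond[pos] += 1  # learn use probability per position
            return True                 # hits do NOT reorder the fill stack
        self._fill(tag)
        return False

    def _fill(self, tag):
        if len(self.stack) == self.ways:
            self.stack.pop(self._choose_victim())
        self.stack.append(tag)          # new fills land on top of the stack

    def _choose_victim(self):
        # Assumed heuristic: positions that rarely see hits are past the
        # "escape point"; evict the topmost such block. With no history,
        # this degenerates to pure LIFO (evict the most recent fill).
        threshold = max(self.hits_beyond) // 4
        for pos in range(len(self.stack) - 1, -1, -1):
            if self.hits_beyond[pos] <= threshold:
                return pos
        return len(self.stack) - 1
```

Note how this differs from LRU: after the set fills, a burst of new single-use blocks repeatedly replaces only the top of the fill stack, leaving the bottom of the stack (the resident working set) undisturbed.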
Index Terms
- Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches