Research article
DOI: 10.1145/2597652.2597653

Block value based insertion policy for high performance last-level caches

Published: 10 June 2014

Abstract

Last-level cache performance is crucial to overall system performance. Essentially, any cache management policy improves performance by preferentially retaining the blocks it believes to have higher value. Most cache management policies use the access time or reuse distance of a block as its value, aiming to minimize the total miss count. However, cache miss penalty varies in modern systems due to i) variable memory access latency and ii) the disparity in latency tolerance across different misses. Some recently proposed policies therefore take the miss penalty as the block value. However, considering only the miss penalty is not enough. In fact, the value of a block includes not only the penalty of its misses, but also the reduction of processor stall cycles on its hits, i.e., its hit benefit. Therefore, we propose a method to compute both miss penalty and hit benefit. The value of a block is then calculated by accumulating the miss penalties and hit benefits of all its requests. Using this notion of block value, we propose the Value based Insertion Policy (VIP), which aims to retain more high-value blocks in the cache. VIP keeps track of a small number of incoming and victim block pairs to learn the relationship between the value of the incoming block and that of the victim. On a miss, if the value of the incoming block was learned to be lower than that of the victim block in the past, VIP predicts that the incoming block is valueless and inserts it with a high eviction priority. The evaluation shows that VIP improves cache performance significantly in both single-core and multi-core environments while requiring low storage overhead.
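The mechanism the abstract describes can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the class and method names (`VIPSketch`, `record_hit`, `learn_pair`, `insertion_priority`) are hypothetical, and the real policy samples only a small number of sets and encodes priorities in replacement-state bits rather than strings.

```python
# Toy model of a value-based insertion policy (hypothetical names).
# A block's value accumulates the penalty of its misses plus the hit
# benefit (stall cycles saved) of its hits. On a miss, the policy uses
# past incoming/victim pairs to decide whether the incoming block should
# be inserted with high eviction priority.

from collections import defaultdict

class VIPSketch:
    def __init__(self):
        self.value = defaultdict(float)   # accumulated value per block address
        self.valueless = set()            # blocks predicted to be low-value

    def record_hit(self, addr, hit_benefit):
        # On a hit, credit the block with the stall cycles it saved.
        self.value[addr] += hit_benefit

    def record_miss(self, addr, miss_penalty):
        # On a miss, credit the block with the penalty the miss incurred.
        self.value[addr] += miss_penalty

    def learn_pair(self, incoming, victim):
        # Track a sampled incoming/victim pair: if the victim was worth
        # more, remember that the incoming block tends to be valueless.
        if self.value[incoming] < self.value[victim]:
            self.valueless.add(incoming)
        else:
            self.valueless.discard(incoming)

    def insertion_priority(self, incoming):
        # Predicted-valueless blocks are inserted with high eviction
        # priority; all others use the normal insertion position.
        return ("high-eviction-priority" if incoming in self.valueless
                else "normal")

# Example: block A had a costly miss, block B only a modest hit benefit,
# so after learning the pair, B is inserted with high eviction priority.
vip = VIPSketch()
vip.record_miss(0xA, miss_penalty=200)
vip.record_hit(0xB, hit_benefit=50)
vip.learn_pair(incoming=0xB, victim=0xA)
print(vip.insertion_priority(0xB))  # high-eviction-priority
```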


Cited By

  • (2021) Intelligent fitting global real-time task scheduling strategy for high-performance multi-core systems. CAAI Transactions on Intelligence Technology, 7(2):244-255. DOI: 10.1049/cit2.12063. Online publication date: 9 Sep 2021.
  • (2016) Allocation of last level cache partitions through thread classification with parallel universes. 2016 International Conference on High Performance Computing & Simulation (HPCS), 204-212. DOI: 10.1109/HPCSim.2016.7568336. Online publication date: Jul 2016.
  • (2015) A Pragmatic Delineation on Cache Bypass Algorithm in Last-Level Cache (LLC). Computational Intelligence in Data Mining, Volume 2, 37-45. DOI: 10.1007/978-81-322-2731-1_4. Online publication date: 10 Dec 2015.


    Published In

    ICS '14: Proceedings of the 28th ACM international conference on Supercomputing
    June 2014
    378 pages
    ISBN:9781450326421
    DOI:10.1145/2597652


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. hit benefit
    2. insertion
    3. last-level cache
    4. miss penalty
    5. value

    Qualifiers

    • Research-article

    Funding Sources

    • National Science and Technology Major Project of the Ministry of Science and Technology of China

    Conference

    ICS'14

    Acceptance Rates

    ICS '14 Paper Acceptance Rate 34 of 160 submissions, 21%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%

