Research article
DOI: 10.1145/2597652.2597653

Block value based insertion policy for high performance last-level caches

Published: 10 June 2014

Abstract

Last-level cache performance is crucial to overall system performance. Essentially, any cache management policy improves performance by preferentially retaining the blocks it believes to have higher value. Most cache management policies use the access time or reuse distance of a block as its value, aiming to minimize the total miss count. However, cache miss penalty varies in modern systems due to i) variable memory access latency and ii) the disparity in latency tolerance across different misses. Some recently proposed policies therefore take the miss penalty as the block value. However, considering only the miss penalty is not enough. In fact, the value of a block includes not only the penalty of its misses, but also the reduction of processor stall cycles on its hits, i.e., its hit benefit. Therefore, we propose a method to compute both miss penalty and hit benefit. The value of a block is then calculated by accumulating the miss penalties and hit benefits of all its requests. Using this notion of block value, we propose the Value based Insertion Policy (VIP), which aims to retain more high-value blocks in the cache. VIP keeps track of a small number of incoming and victim block pairs to learn the relationship between the value of the incoming block and that of the victim. On a miss, if the value of the incoming block was learned to be lower than that of the victim block in the past, VIP predicts that the incoming block is valueless and inserts it with a high eviction priority. The evaluation shows that VIP improves cache performance significantly in both single-core and multi-core environments while requiring low storage overhead.
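The mechanism the abstract describes can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the class and method names (`VIPSketch`, `record_hit`, `learn_pair`, `insertion_priority`) are hypothetical, and the real policy samples only a small number of sets and encodes priorities in replacement-state bits rather than strings.

```python
# Toy model of a value-based insertion policy (hypothetical names).
# A block's value accumulates the penalty of its misses plus the hit
# benefit (stall cycles saved) of its hits. On a miss, the policy uses
# past incoming/victim pairs to decide whether the incoming block should
# be inserted with high eviction priority.

from collections import defaultdict

class VIPSketch:
    def __init__(self):
        self.value = defaultdict(float)   # accumulated value per block address
        self.valueless = set()            # blocks predicted to be low-value

    def record_hit(self, addr, hit_benefit):
        # On a hit, credit the block with the stall cycles it saved.
        self.value[addr] += hit_benefit

    def record_miss(self, addr, miss_penalty):
        # On a miss, credit the block with the penalty the miss incurred.
        self.value[addr] += miss_penalty

    def learn_pair(self, incoming, victim):
        # Track a sampled incoming/victim pair: if the victim was worth
        # more, remember that the incoming block tends to be valueless.
        if self.value[incoming] < self.value[victim]:
            self.valueless.add(incoming)
        else:
            self.valueless.discard(incoming)

    def insertion_priority(self, incoming):
        # Predicted-valueless blocks are inserted with high eviction
        # priority; all others use the normal insertion position.
        return ("high-eviction-priority" if incoming in self.valueless
                else "normal")

# Example: block A had a costly miss, block B only a modest hit benefit,
# so after learning the pair, B is inserted with high eviction priority.
vip = VIPSketch()
vip.record_miss(0xA, miss_penalty=200)
vip.record_hit(0xB, hit_benefit=50)
vip.learn_pair(incoming=0xB, victim=0xA)
print(vip.insertion_priority(0xB))  # high-eviction-priority
```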


Cited By

  • (2021) Intelligent fitting global real-time task scheduling strategy for high-performance multi-core systems. CAAI Transactions on Intelligence Technology, 7(2):244-255. DOI: 10.1049/cit2.12063. Online publication date: 9 Sep 2021.
  • (2016) Allocation of last level cache partitions through thread classification with parallel universes. 2016 International Conference on High Performance Computing & Simulation (HPCS), 204-212. DOI: 10.1109/HPCSim.2016.7568336. Online publication date: Jul 2016.
  • (2015) A Pragmatic Delineation on Cache Bypass Algorithm in Last-Level Cache (LLC). Computational Intelligence in Data Mining, Volume 2, 37-45. DOI: 10.1007/978-81-322-2731-1_4. Online publication date: 10 Dec 2015.


    Published In

    ICS '14: Proceedings of the 28th ACM international conference on Supercomputing
    June 2014
    378 pages
    ISBN:9781450326421
    DOI:10.1145/2597652


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. hit benefit
    2. insertion
    3. last-level cache
    4. miss penalty
    5. value

    Qualifiers

    • Research-article

    Funding Sources

    • National Science and Technology Major Project of the Ministry of Science and Technology of China

    Conference

    ICS'14

    Acceptance Rates

    ICS '14 Paper Acceptance Rate 34 of 160 submissions, 21%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%

