Retention Benefit Based Intelligent Cache Replacement

Li, Ling-Da; Lu, Jun-Lin; Cheng, Xu

doi:10.1007/s11390-014-1481-2

Regular Paper
Published: 17 November 2014

Volume 29, pages 947–961, (2014)

Journal of Computer Science and Technology


Abstract

The performance loss caused by cache misses varies in modern systems for two reasons: 1) memory access latency is not uniform, and 2) the ability of processor cores to tolerate latency differs from miss to miss. Isolated fetch and load misses are more costly than parallel misses and store misses. This variation in miss penalty suggests that cache replacement policies should take it into account. To this end, we first propose the notion of retention benefit. Retention benefit captures not only the increase in processor stall cycles caused by cache misses but also the reduction in processor stall cycles resulting from cache hits. We then propose Retention Benefit Based Replacement (RBR), which aims to maximize the aggregate retention benefit of the blocks retained in the cache. RBR tracks the total retention benefit of each block in the cache and, on replacement, preferentially evicts the block with the minimum total retention benefit. Our evaluation shows that RBR significantly improves cache performance in both single-core and multi-core environments with low storage overhead, and that it outperforms other state-of-the-art techniques.
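The abstract describes RBR only at the level of "track each block's total retention benefit and evict the block with the minimum." The Python sketch below illustrates that idea for a single cache set. It is not the authors' implementation: the penalty model in `miss_cost` (store misses cost nothing, parallel misses split the latency among the outstanding misses, isolated fetch/load misses pay the full latency), the names `Block`, `RBRSet`, and `miss_cost`, and the choice to start newly filled blocks at zero benefit are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    benefit: float  # accumulated retention benefit (estimated stall cycles saved)


def miss_cost(kind: str, latency: float, outstanding: int) -> float:
    """Hypothetical penalty model: stall cycles a miss of this kind would cost.

    Illustrative assumptions only, not the paper's estimator:
      - store misses are hidden (e.g. by a store buffer) -> cost 0
      - parallel misses share the latency with the other outstanding misses
      - isolated fetch/load misses (outstanding == 1) pay the full latency
    """
    if kind == "store":
        return 0.0
    return latency / max(1, outstanding)


class RBRSet:
    """One set of a set-associative cache managed by a retention-benefit sketch."""

    def __init__(self, ways: int):
        self.ways = ways
        self.blocks: list[Block] = []

    def access(self, tag: int, kind: str, latency: float, outstanding: int = 1) -> bool:
        """Return True on a hit; on a miss, evict the minimum-benefit block if full."""
        # Stall cycles this access would cost if it missed (assumed model above).
        cost = miss_cost(kind, latency, outstanding)
        for blk in self.blocks:
            if blk.tag == tag:
                # A hit avoids the stall a miss would have caused: credit the block.
                blk.benefit += cost
                return True
        if len(self.blocks) == self.ways:
            # min() returns the first minimal block, so ties fall back to fill order.
            victim = min(self.blocks, key=lambda b: b.benefit)
            self.blocks.remove(victim)
        # Assumption: a newly filled block starts with zero accumulated benefit.
        self.blocks.append(Block(tag, 0.0))
        return False


if __name__ == "__main__":
    s = RBRSet(ways=2)
    s.access(0xA, "load", latency=200)   # miss: fill A
    s.access(0xB, "store", latency=200)  # miss: fill B
    s.access(0xA, "load", latency=200)   # hit: A gains 200
    s.access(0xB, "store", latency=200)  # hit: B gains 0
    s.access(0xC, "load", latency=200)   # miss: evicts B (benefit 0 < 200)
    print([hex(b.tag) for b in s.blocks])  # → ['0xa', '0xc']
```

In this demo, LRU would evict block A (it was last touched before B), whereas the sketch evicts B, whose hits were store hits that saved no stall cycles under the assumed model. As written, accumulated benefit never decays, so a block with many early hits could stay resident indefinitely; a practical policy would presumably need some form of aging, which the abstract does not describe.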



Author information

Authors and Affiliations

Microprocessor Research and Development Center, Peking University, Beijing, 100871, China
Ling-Da Li, Jun-Lin Lu & Xu Cheng
Engineering Research Center of Microprocessor and System, Ministry of Education, Beijing, 100871, China
Ling-Da Li, Jun-Lin Lu & Xu Cheng
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Ling-Da Li, Jun-Lin Lu & Xu Cheng


Corresponding author

Correspondence to Jun-Lin Lu.

Additional information

The work was supported in part by the National Science and Technology Major Project of the Ministry of Science and Technology of China under Grant No. 2009ZX01029-001-002-2.

Electronic supplementary material

ESM 1 (PDF, 79 KB)


About this article

Cite this article

Li, LD., Lu, JL. & Cheng, X. Retention Benefit Based Intelligent Cache Replacement. J. Comput. Sci. Technol. 29, 947–961 (2014). https://doi.org/10.1007/s11390-014-1481-2


Received: 19 December 2013
Revised: 06 May 2014
Published: 17 November 2014
Issue Date: November 2014
DOI: https://doi.org/10.1007/s11390-014-1481-2
