Skip to main content
Log in

Retention Benefit Based Intelligent Cache Replacement

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

The performance loss resulting from different cache misses is variable in modern systems for two reasons: 1) memory access latency is not uniform, and 2) the latency toleration ability of processor cores varies across different misses. Compared with parallel misses and store misses, isolated fetch and load misses are more costly. The variation of cache miss penalty suggests that the cache replacement policy should take it into account. To that end, first, we propose the notion of retention benefit. Retention benefits can evaluate not only the increment of processor stall cycles on cache misses, but also the reduction of processor stall cycles due to cache hits. Then, we propose Retention Benefit Based Replacement (RBR) which aims to maximize the aggregate retention benefits of blocks reserved in the cache. RBR keeps track of the total retention benefit for each block in the cache, and it preferentially evicts the block with the minimum total retention benefit on replacement. The evaluation shows that RBR can improve cache performance significantly in both single-core and multi-core environment while requiring a low storage overhead. It also outperforms other state-of-the-art techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Lai A C, Fide C, Falsafi B. Dead-block prediction & dead-block correlating prefetchers. In Proc. the 28th ISCA, Jun. 2001, pp.144–154.

  2. Qureshi M K, Patt Y N. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. the 39th MICRO, Dec. 2006, pp.423–432.

  3. Qureshi M K, Jaleel A, Patt Y N, Steely Jr S C, Emer J. Adaptive insertion policies for high performance caching. In Proc. the 34th ISCA, Jun. 2007, pp.381–391.

  4. Kharbutli M, Solihin D. Counter-based cache replacement and bypassing algorithms. IEEE Transactions on Computers, 2008, 57(4): 433–447.

    Article  MathSciNet  Google Scholar 

  5. Xie Y, Loh G H. PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches. In Proc. the 36th ISCA, Jun. 2009, pp.174–183.

  6. Jaleel A, Theobald K B, Steely Jr S C, Emer J. High performance cache replacement using re-reference interval prediction (RRIP). In Proc. the 37th ISCA, Jun. 2010, pp.60–71.

  7. Khan S M, Tian Y, Jimenez D A. Sampling dead block prediction for last-level caches. In Proc. the 43rd MICRO, Dec. 2010, pp.175–186.

  8. Wu C J, Jaleel A, Hasenplaugh W, Martonosi M, Steely Jr S C, Emer J. SHiP: Signature-based hit predictor for high performance caching. In Proc. the 44th MICRO, Dec. 2011, pp.430–441.

  9. Xie Y. Modeling, architecture, and applications for emerging memory technologies. IEEE Design & Test of Computers, 2011, 28(1): 44–51.

    Article  Google Scholar 

  10. Loh G H, Hill M D. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches. In Proc. the 44th MICRO, Dec. 2011, pp.454–464.

  11. Kroft D. Lockup-free instruction fetch/prefetch cache organization. In Proc. the 8th ISCA, May 1981, pp.81–87.

  12. Vanderwiel S P, Lilja D J. Data prefetch mechanisms. ACM Comput. Surv., 2000, 32(2): 174–199.

    Article  Google Scholar 

  13. Jeong J, Dubois M. Optimal replacements in caches with two miss costs. In Proc. the 11th SPAA, Jun. 1999, pp.155–164.

  14. Jeong J, Stenström P, Dubois M. Simple penalty-sensitive replacement policies for caches. In Proc. the 3rd CF, May 2006, pp.341–352.

  15. Ju R D C, Lebeck A R, Wilkerson C. Locality vs. criticality. In Proc. the 28th ISCA, Jun. 2001, pp.132–143.

  16. Qureshi M K, Lynch D N, Mutlu O, Patt Y N. A case for MLP-aware cache replacement. In Proc. the 33rd ISCA, Jun. 2006, pp.167–178.

  17. Sheikh R, Kharbutli M. Improving cache performance by combining cost-sensitivity and locality principles in cache replacement algorithms. In Proc. the 28th ICCD, Oct. 2010, pp.76–83.

  18. Kharbutli M, Sheikh R. LACS: A locality-aware cost-sensitive cache replacement algorithm. IEEE Transactions on Computers, 2013, 63(8): 1975–1987.

    Article  MathSciNet  Google Scholar 

  19. 19] Chaudhuri M. Pseudo-LIFO: The foundation of a new family of replacement policies for last-level caches. In Proc. the 42nd MICRO, Dec. 2009, pp.401–412.

  20. Keramidas G, Petoumenos P, Kaxiras S. Cache replacement based on reuse-distance prediction. In Proc. the 25th ICCD, Oct. 2007, pp.245–250.

  21. Wu C J, Jaleel A, Martonosi M, Steely Jr S C, Emer J. PAC-Man: Prefetch-aware cache management for high performance caching. In Proc. the 44th MICRO, Dec. 2011, pp.442–453.

  22. Duong N, Zhao D, Kim T, Cammarota R, Valero M, Veidenbaum A V. Improving cache management policies using dynamic reuse distances. In Proc. the 45th MICRO, Dec. 2012, pp.389–400.

  23. Hu Z, Kaxiras S, Martonosi M. Timekeeping in the memory system: Predicting and optimizing memory behavior. In Proc. the 29th ISCA, May 2002, pp.209–220.

  24. Liu H, Ferdman M, Huh J, Burger D. Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency. In Proc. the 41st MICRO, Nov. 2008, pp.222–233.

  25. Jalminger J, Stenstrom P. A novel approach to cache block reuse predictions. In Proc. the 2003 ICPP, Oct. 2003, pp.294–302.

  26. Johnson T L, Connors D A, Merten M C, Hwu W M W. Run-time cache bypassing. IEEE Transactions on Computers, 1999, 48(12): 1338–1354.

    Article  Google Scholar 

  27. Rivers J A, Davidson E S. Reducing conflicts in direct-mapped caches with a temporality-based design. In Proc. the 1996 ICPP, Aug. 1996, Vol. 1, pp.154–163.

  28. Rivers J A, Tam E S, Tyson G S, Davidson E S, Farrens M. Utilizing reuse information in data cache management. In Proc. the 12th ICS, Jul. 1998, pp.449–456.

  29. John L K, Subramanian A. Design and performance evaluation of a cache assist to implement selective caching. In Proc. the 1997 ICCD, Oct. 1997, pp.510–518.

  30. Walsh S J, Board J A. Pollution control caching. In Proc. the 1995 ICCD, Oct. 1995, pp.300–306.

  31. Chi C H, Dietz H. Improving cache performance by selective cache bypass. In Proc. the 22nd HICSS, Jan. 1989, Vol. 1, pp.277–285.

  32. González, A, Aliagas C, Valero M. A data cache with multiple caching strategies tuned to different types of locality. In Proc. the 9th ICS, Jul. 1995, pp.338–347.

  33. Tyson G, Farrens M, Matthews J, Pleszkun A R. A modified approach to data cache management. In Proc. the 28th MICRO, Dec. 1995, pp.93–103.

  34. Xiang L, Chen T, Shi Q, Hu W. Less reused filter: Improving L2 cache performance via filtering less reused lines. In Proc. the 23rd ICS, Jun. 2009, pp.68–79.

  35. Gao H, Wilkerson C. A dueling segmented LRU replacement algorithm with adaptive bypassing. In Proc. the 1st JWAC, Jun. 2010.

  36. Gaur J, Chaudhuri M, Subramoney S. Bypass and insertion algorithms for exclusive last-level caches. In Proc. the 38th ISCA, Jun. 2011, pp.81–92.

  37. Li L, Tong D, Xie Z, Lu J, Cheng X. Optimal bypass monitor for high performance last-level caches. In Proc. the 21st PACT, Sept. 2012, pp.315–324.

  38. Jaleel A, Hasenplaugh W, Qureshi M, Sebot J, Steely Jr S, Emer J. Adaptive insertion policies for managing shared caches. In Proc. the 17th PACT, Oct. 2008, pp.208–219.

  39. Manikantan R, Rajan K, Govindarajan R. NUcache: An efficient multicore cache organization based on next-use distance. In Proc. the 17th HPCA, Feb. 2011, pp.243–253.

  40. Sanchez D, Kozyrakis C. Vantage: Scalable and efficient fine-grain cache partitioning. In Proc. the 38th ISCA, Jun. 2011, pp.57–68.

  41. Manikantan R, Rajan K, Govindarajan R. Probabilistic shared cache management (PriSM). In Proc. the 39th ISCA, Jun. 2012, pp.428–439.

  42. Hsu L R, Reinhardt S K, Iyer R, Makineni S. Communist, utilitarian, and capitalist cache policies on CMPs: Caches as a shared resource. In Proc. the 15th PACT, Sept. 2006, pp.13–22.

  43. Iyer R. CQoS: A framework for enabling QoS in shared caches of CMP platforms. In Proc. the 18th ICS, Jun. 2004, pp.257–266.

  44. Kim S, Chandra D, Solihin Y. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proc. the 13th PACT, Sept. 2004, pp.111–122.

  45. Jeong J, Dubois M. Cache replacement algorithms with nonuniform miss costs. IEEE Transactions on Computers, 2006, 55(4): 353–365.

    Article  Google Scholar 

  46. Jeong J, Dubois M. Cost-sensitive cache replacement algorithms. In Proc. the 9th HPCA, Feb. 2003, pp.327–337.

  47. Moreto M, Cazorla F, Ramirez A, Valero M. MLP-aware dynamic cache partitioning. In Proc. the 3rd HiPEAC, Jan. 2008, pp.337–352.

  48. Kaseridis D, Iqbal M, John L. Cache friendliness-aware management of shared last-level caches for high performance multi-core systems. IEEE Transactions on Computers, 2014, 63(4): 874–887.

    Article  MathSciNet  Google Scholar 

  49. Lee H H S, Tyson G S, Farrens M K. Eager writeback — A technique for improving bandwidth utilization. In Proc. the 33rd MICRO, Dec. 2000, pp.11–21.

  50. Lee C J, Narasiman V, Ebrahimi E, Mutlu O, Patt Y N. DRAM-aware last-level cache writeback: Reducing write-caused interference in memory systems. Technical Report, TR-HPS-2010-002, High Performance Systems Group, Department of Electrical and Computer Engineering, The University of Texas at Austin & Department of Electrical andComputer Engineering, Carnegie Mellon University, April 2010.

  51. Stuecheli J, Kaseridis D, Daly D, Hunter H C, John L K. The virtual write queue: Coordinating DRAM and last-level cache policies. In Proc. the 37th ISCA, Jun. 2010, pp.72–82.

  52. Wang Z, Khan S M, Jiménez D A. Improving writeback efficiency with decoupled last-write prediction. In Proc. the 39th ISCA, Jun. 2012, pp.309–320.

  53. Lee C J, Ebrahimi E, Narasiman V, Mutlu O , Patt Y N. DRAM-aware last-level cache replacement. Technical Report, TR-HPS-2010-007, High Performance Systems Group, Department of Electrical and Computer Engineering, The University of Texas at Austin & Department of Electrical and Computer Engineering, Carnegie Mellon University, Dec. 2010.

  54. HP. Inside the Intel® Itanium® 2 processor. HP Technical White Paper, July 2002. http://www.dig64.org/about/Itanium2_white_paper_public.pdf, Oct. 2014.

  55. Oracle. UltraSPARC T2 supplement to the UltraSPARC architecture 2007. Draft D1.4.3, Sept. 2007. http://www.oracle.com/technetwork/systems/opensparc/t2-14-ust2-uasuppl-draft-hp-ext-1537761.html, Oct. 2014.

  56. Jaleel A, Borch E, Bhandaru M, Steely Jr S C, Emer J. Achieving non-inclusive cache performance with inclusive caches: Temporal locality aware (TLA) cache management policies. In Proc. the 43rd MICRO, Dec. 2010, pp.151–162.

  57. Martin M M K, Hill M D, Sorin D J. Why on-chip cache coherence is here to stay. Commun. ACM, 2012, 55(7): 78–89.

    Article  Google Scholar 

  58. Albericio J, Ibáñez P, Vi~nals V, Llabería J M. Exploiting reuse locality on inclusive shared last-level caches. ACM Trans. Archit. Code Optim., 2013, 9(4): Article No. 38.

  59. Binkert N, Beckmann B, Black G, Reinhardt S K, Saidi A, Basu A, Hestness J, Hower D R, Krishna T, Sardashti S, SenR, Sewell K, Shoaib M, Vaish N, Hill M D, Wood D A. The gem5 simulator. SIGARCH Comput. Archit. News, 2011, 39(2): 1–7.

  60. Henning J L. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 2006, 34(4): 1–17.

    Article  MathSciNet  Google Scholar 

  61. Perelman E, Hamerly G, Biesbrouck M V, Sherwood T, Calder B. Using SimPoint for accurate and efficient simulation. SIGMETRICS Perform. Eval. Rev., 2003, 31(1): 318–319.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jun-Lin Lu.

Additional information

The work was supported in part by the National Science and Technology Major Project of the Ministry of Science and Technology of China under Grant No. 2009ZX01029-001-002-2.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(PDF 79 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Li, LD., Lu, JL. & Cheng, X. Retention Benefit Based Intelligent Cache Replacement. J. Comput. Sci. Technol. 29, 947–961 (2014). https://doi.org/10.1007/s11390-014-1481-2

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-014-1481-2

Keywords

Navigation