DOI: 10.1145/2597652.2597653
research-article

Block value based insertion policy for high performance last-level caches

Published: 10 June 2014

ABSTRACT

Last-level cache performance has been shown to be crucial to system performance. In essence, any cache management policy improves performance by preferentially retaining the blocks it believes to have higher value. Most cache management policies use the access time or reuse distance of a block as its value, with the aim of minimizing the total miss count. However, cache miss penalty varies in modern systems because of i) variable memory access latency and ii) differences in how well the latency of different misses can be tolerated. Some recently proposed policies therefore use the miss penalty as the block value. However, considering only the miss penalty is not enough. The value of a block includes not only the penalty of its misses but also the reduction in processor stall cycles on its hits, i.e., the hit benefit. We therefore propose a method to compute both miss penalty and hit benefit. The value of a block is then calculated by accumulating the miss penalties and hit benefits of all its requests. Using this notion of block value, we propose the Value based Insertion Policy (VIP), which aims to retain more high-value blocks in the cache. VIP keeps track of a small number of incoming and victim block pairs to learn the relationship between the value of an incoming block and that of its victim. On a miss, if the incoming block has been learned to have a lower value than the victim block in the past, VIP predicts that the incoming block is valueless and inserts it with a high eviction priority. The evaluation shows that VIP can improve cache performance significantly in both single-core and multi-core environments while requiring low storage overhead.
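The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how blocks are identified, how the learned relationship is stored, or how large the tracking structure is. The signature index, the saturating counters, the table size, and all class and method names below are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlockValue:
    """Value of one block: accumulated miss penalties plus hit benefits (in cycles)."""
    miss_penalty: int = 0
    hit_benefit: int = 0

    def record_miss(self, stall_cycles: int) -> None:
        # Stall cycles caused by a miss on this block.
        self.miss_penalty += stall_cycles

    def record_hit(self, saved_cycles: int) -> None:
        # Stall cycles avoided because this block hit in the cache.
        self.hit_benefit += saved_cycles

    @property
    def total(self) -> int:
        return self.miss_penalty + self.hit_benefit


class InsertionPredictor:
    """Hypothetical table of saturating counters, indexed by a block signature."""

    def __init__(self, entries: int = 256, max_count: int = 3) -> None:
        self.entries = entries
        self.max_count = max_count
        # Start saturated high so untrained signatures get the default insertion.
        self.counters = [max_count] * entries

    def _index(self, signature: int) -> int:
        return signature % self.entries

    def train(self, signature: int, incoming_value: int, victim_value: int) -> None:
        """Learn from one tracked (incoming, victim) pair once both values are known."""
        i = self._index(signature)
        if incoming_value < victim_value:
            self.counters[i] = max(0, self.counters[i] - 1)
        else:
            self.counters[i] = min(self.max_count, self.counters[i] + 1)

    def insert_with_high_eviction_priority(self, signature: int) -> bool:
        """True if incoming blocks with this signature were learned to be low-value."""
        return self.counters[self._index(signature)] == 0


# Usage: an incoming block that repeatedly turns out less valuable than its victim.
incoming = BlockValue()
incoming.record_miss(40)
victim = BlockValue()
victim.record_miss(200)
victim.record_hit(150)

predictor = InsertionPredictor()
for _ in range(3):
    predictor.train(0x4A, incoming.total, victim.total)
low_value = predictor.insert_with_high_eviction_priority(0x4A)
```

In this sketch, a signature that has never been trained keeps the default insertion, so only blocks with a learned history of low value are inserted with a high eviction priority.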


    • Published in

      ICS '14: Proceedings of the 28th ACM international conference on Supercomputing
      June 2014
      378 pages
      ISBN: 9781450326421
      DOI: 10.1145/2597652

      Copyright © 2014 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      ICS '14 paper acceptance rate: 34 of 160 submissions, 21%. Overall acceptance rate: 584 of 2,055 submissions, 28%.
