ABSTRACT
Last-level cache performance is crucial to overall system performance. Essentially, any cache management policy improves performance by preferentially retaining the blocks it believes to be most valuable. Most policies use a block's access recency or reuse distance as its value, aiming to minimize total miss count. However, cache miss penalty is variable in modern systems due to i) variable memory access latency and ii) the varying ability of the processor to tolerate latency across different misses. Some recently proposed policies therefore use miss penalty as the block value. However, considering miss penalty alone is not enough: the value of a block includes not only the penalty incurred on its misses, but also the processor stall cycles avoided on its hits, i.e., its hit benefit. We therefore propose a method to compute both miss penalty and hit benefit, and calculate a block's value by accumulating the miss penalties and hit benefits of all its requests. Using this notion of block value, we propose the Value based Insertion Policy (VIP), which aims to retain more high-value blocks in the cache. VIP tracks a small number of incoming and victim block pairs to learn the relationship between the value of the incoming block and that of the victim. On a miss, if incoming blocks like this one were previously observed to have lower value than the victims they replaced, VIP predicts that the incoming block is valueless and inserts it with a high eviction priority. Our evaluation shows that VIP improves cache performance significantly in both single-core and multi-core environments while requiring low storage overhead.
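The learning-and-insertion mechanism described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the per-signature vote counter, the signature itself, and the two-level eviction priority are all assumptions made for clarity.

```python
class VIPSampler:
    """Sketch of VIP's learning component (hypothetical structure).

    For a small sample of sets, it observes (incoming, victim) value
    pairs and keeps one vote counter per block signature: the counter
    rises when the incoming block's value matched or beat the victim's,
    and falls when it did not.
    """

    def __init__(self):
        # signature -> vote count; negative means incoming blocks with
        # this signature were usually less valuable than their victims.
        self.votes = {}

    def observe(self, signature, incoming_value, victim_value):
        # A block's value is the accumulated miss penalty plus hit
        # benefit of its requests, computed elsewhere.
        delta = 1 if incoming_value >= victim_value else -1
        self.votes[signature] = self.votes.get(signature, 0) + delta

    def predict_valueless(self, signature):
        return self.votes.get(signature, 0) < 0


def insertion_priority(sampler, signature):
    """On a miss, pick the insertion position for the incoming block.

    Blocks predicted valueless are inserted with high eviction
    priority (evicted soon); others with low eviction priority.
    """
    if sampler.predict_valueless(signature):
        return "high-eviction"
    return "low-eviction"
```

For example, after the sampler repeatedly sees incoming blocks from one signature losing to their victims, later misses with that signature are inserted at high eviction priority, effectively limiting the cache space wasted on them.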
Index Terms
- Block value based insertion policy for high performance last-level caches