Abstract
In current multi-core devices, the individual processors do not need to operate at the highest possible frequencies. Instead, there is a need to reduce the power, complexity, and area of individual processor components such as caches. In this paper we propose a low-area, high-performance cache replacement policy for embedded processors called Hierarchical Non-Most-Recently-Used (H-NMRU). H-NMRU is a parameterizable policy in which performance can be traded off against area. We extended the Dinero cache simulator with the H-NMRU policy and performed architectural exploration with a set of cellular and multimedia benchmarks. On a 16-way cache, a two-level H-NMRU policy whose first and second levels have 8 and 2 branches, respectively, performs as well as the Pseudo-LRU policy while saving 27% of the storage area. Compared to true LRU, H-NMRU on a 16-way cache saves a large amount of area (82%) with a marginal increase in cache misses (3%). Similar results were also observed on other cache-like structures such as branch target buffers. Therefore, the two-level H-NMRU cache replacement policy (with associativity/2 and 2 branches on the two levels) is a very attractive option for caches on embedded processors with associativities greater than 4. We present a case study in which it is used on the L2 cache, with substantial gains in performance and area for single- and dual-core platforms.
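The 27% storage saving quoted above can be reproduced with a short calculation. The sketch below is a reading of the abstract, not the paper's own accounting. It assumes that each node of the H-NMRU tree stores only a pointer to its most recently used branch, and that tree Pseudo-LRU keeps one bit per internal node of a binary tree. The function names `hnmru_bits` and `plru_bits` are hypothetical.

```python
from math import ceil, log2, prod


def hnmru_bits(branches):
    """Replacement-state bits per cache set for a hierarchical NMRU tree.

    Assumption (not stated in the abstract): every node stores only a
    pointer to its most recently used branch, i.e. ceil(log2(b)) bits
    for a node with b branches.
    """
    bits = 0
    nodes = 1  # a single root node at the first level
    for b in branches:
        bits += nodes * ceil(log2(b))
        nodes *= b  # each branch leads to one node at the next level
    return bits


def plru_bits(ways):
    """Tree pseudo-LRU: one bit per internal node of a binary tree."""
    return ways - 1


branches = (8, 2)        # first level 8 branches, second level 2 branches
ways = prod(branches)    # 16-way set-associative cache
h = hnmru_bits(branches)
p = plru_bits(ways)
saving_percent = round(100 * (1 - h / p))
print(ways, h, p, saving_percent)  # prints: 16 11 15 27
```

Under these assumptions the (8, 2) configuration needs 11 bits per set against 15 for Pseudo-LRU, which matches the reported 27% saving. Other branch tuples can be passed to `hnmru_bits` to explore the area side of the trade-off that the abstract calls parameterizable.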
Cite this article
Roy, S. H-NMRU: An Efficient Cache Replacement Policy with Low Area. Int J Parallel Prog 38, 277–287 (2010). https://doi.org/10.1007/s10766-010-0130-9
DOI: https://doi.org/10.1007/s10766-010-0130-9