Abstract
We propose an energy-efficient data cache organization for embedded processors with virtual memory. Application knowledge about memory references is used to eliminate most tag translations. A novel tagging scheme lets virtual and physical tags coexist in the same cache. Physical tags, together with special handling of the superset index bits, are used only for references to shared regions, where they are needed to avoid cache inconsistency. Because most cache accesses then require no address translation, power consumption is significantly reduced. We outline an efficient hardware architecture in which the application information is captured in a reprogrammable way and the cache requires only minimal modification.
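The lookup path described in the abstract can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions, not the authors' hardware design: the cache geometry, the `HeterogeneousCache` class, the `shared` flag passed by the caller, and the choice to index shared references with physical address bits are all hypothetical simplifications. It also models a single address space, so address-space identifiers for virtual tags are left out.

```python
from dataclasses import dataclass

# Assumed geometry (not from the paper): 4 KiB pages, 32-byte lines,
# 512 direct-mapped sets. Index bits above the page offset are the
# "superset" index bits.
PAGE_BITS = 12
OFFSET_BITS = 5
INDEX_BITS = 9


@dataclass
class Line:
    valid: bool = False
    physical: bool = False  # kind of tag stored in this line
    tag: int = 0


class HeterogeneousCache:
    """Hypothetical model of a cache holding both virtual and physical tags."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.lines = [Line() for _ in range(1 << INDEX_BITS)]
        self.translations = 0  # number of address translations performed

    def _translate(self, vaddr):
        # Stand-in for a TLB lookup; each call represents translation energy.
        self.translations += 1
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        return (self.page_table[vpn] << PAGE_BITS) | offset

    def access(self, vaddr, shared):
        # Assumption: shared references use the physical address for both
        # index and tag, so synonyms map to one line. Private references
        # use the virtual address directly and skip translation.
        addr = self._translate(vaddr) if shared else vaddr
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        line = self.lines[index]
        # The tag kind must match too, since a virtual and a physical tag
        # can have the same bit pattern.
        hit = line.valid and line.physical == shared and line.tag == tag
        if not hit:
            self.lines[index] = Line(True, shared, tag)
        return hit
```

In this model, repeated private accesses hit without ever incrementing `translations`, while two virtual pages mapped to the same physical page (synonyms) resolve to the same line when accessed with `shared=True`, at the cost of one translation per access.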
Index Terms
- Heterogeneously tagged caches for low-power embedded systems with virtual memory support