This paper presents a low-power tag organization for physically tagged caches in embedded processors with virtual memory support. For each application hot-spot, a very small subset of tag bits is identified, and only these bits are used for cache access; because they provide complete address resolution, performance is not sacrificed. This minimal subset of physical tag bits is updated dynamically to follow changes in the application's physical address space. Operating system support maintains the reduced tags during program execution: efficient algorithms incorporated within the memory allocator and the dynamic linker perform the dynamic update. The only hardware support needed within the I/D-caches is the ability to disable bitlines of the tag arrays. An extensive set of experimental results demonstrates the efficacy of the proposed approach.
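To make the idea of a reduced tag concrete, the following is a minimal sketch, not the algorithm used in the paper (the abstract does not describe the allocator or linker algorithms). It assumes the set of physical tags touched by a hot-spot is known, and uses a simple greedy heuristic to pick bit positions that still keep every tag distinct. The function names `reduced_tag_bits` and `project`, and the greedy strategy itself, are illustrative assumptions.

```python
def reduced_tag_bits(tags, width):
    """Greedily choose tag bit positions that keep all `tags` distinct.

    Hypothetical helper: `tags` are the physical tag values of a hot-spot,
    `width` is the full tag width in bits. Returns a sorted list of bit
    positions; greedy, so the result is small but not guaranteed minimal.
    """
    groups = [sorted(set(tags))]  # tags not yet told apart share a group
    chosen = []
    while any(len(g) > 1 for g in groups):
        best_bit, best_groups = None, None
        for bit in range(width):
            if bit in chosen:
                continue
            # Split every group by the value of this bit.
            split = []
            for g in groups:
                zeros = [t for t in g if not (t >> bit) & 1]
                ones = [t for t in g if (t >> bit) & 1]
                split.extend(part for part in (zeros, ones) if part)
            # Keep the bit that separates the tags into the most groups.
            if best_groups is None or len(split) > len(best_groups):
                best_bit, best_groups = bit, split
        if best_groups is None or len(best_groups) == len(groups):
            raise ValueError("tags are not distinguishable within `width` bits")
        chosen.append(best_bit)
        groups = best_groups
    return sorted(chosen)


def project(tag, bits):
    """Return the reduced tag: the values of `tag` at the chosen positions."""
    return tuple((tag >> b) & 1 for b in bits)


if __name__ == "__main__":
    hot_spot_tags = [0x12340, 0x12341, 0x12342, 0x12343]
    bits = reduced_tag_bits(hot_spot_tags, width=20)
    print(bits, [project(t, bits) for t in hot_spot_tags])
```

In this example the four tags differ only in their two lowest bits, so two of the twenty tag bits are enough to resolve them; in a cache, the bitlines for the remaining positions are the ones that could be disabled. When the set of tags changes, for instance after a new page is allocated, the subset would have to be recomputed, which corresponds to the dynamic update the abstract attributes to the operating system.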
Cite this article
Petrov, P., Orailoglu, A. Dynamic Tag Reduction for Low-Power Caches in Embedded Systems with Virtual Memory. Int J Parallel Prog 35, 157–177 (2007). https://doi.org/10.1007/s10766-006-0030-1
DOI: https://doi.org/10.1007/s10766-006-0030-1