Abstract
Loop caches provide an effective method for decreasing memory hierarchy energy consumption by storing frequently executed code (critical regions) in a more energy-efficient structure than the level one cache. However, due to code structure restrictions or costly design-time pre-analysis efforts, previous loop cache designs are not suitable for all applications and system scenarios. We present an adaptive loop cache that is amenable to a wider range of system scenarios and can provide an additional 20% average instruction cache energy savings (with individual benchmark energy savings as high as 69%) compared to the next best loop cache, the preloaded loop cache.