ABSTRACT
A highly efficient fetch unit is essential not only for good performance but also for energy efficiency. However, existing designs are inflexible and, depending on program behavior, can be either insufficient or overkill. We introduce a phase-based adaptive fetch mechanism that is adjusted dynamically using feedback about program behavior. The design adds very little hardware complexity and relegates complex tasks to software components. It is also very effective: it saves 26.8% and 34.1% of fetch energy on average compared with a conventional and a trace cache-based fetch unit, respectively. At the same time, it improves performance by 5.7% and 0.6%, respectively.
Index Terms
- Energy-aware fetch mechanism: trace cache and BTB customization