CASA: A New IFU Architecture for Power-Efficient Instruction Cache and TLB Designs

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, the fetch address must be sent to the instruction cache and TLB arrays as soon as it is generated. Because the power-saving logic can do only limited work between fetch address generation and instruction fetch, previous power-saving approaches are unnecessarily restricted by traditional IFU architectures. In this paper, we present CASA, a new power-aware IFU architecture that effectively reduces these restrictions and provides sufficient time and information for the power-saving logic of both the instruction cache and the TLB. By analyzing, recording, and utilizing key information about the dynamic instruction flow early in the front-end pipeline, CASA creates the opportunity to maximize power efficiency and minimize performance overhead. Compared with the baseline configuration, the leakage and dynamic power of the instruction cache are reduced by 89.7% and 64.1%, respectively, and the dynamic power of the instruction TLB is reduced by 90.2%, while the worst-case performance degradation is only 0.63%. Compared with previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead, and achieves better scalability. CASA shows promise for stimulating further work on architectural solutions for power-efficient IFU designs.
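
As a reading aid, the reported reductions can be restated as the fraction of baseline power that remains. The short sketch below does only that arithmetic; the dictionary keys and the function name are illustrative and do not come from the paper. The abstract does not give the split between leakage and dynamic power, so no combined IFU figure is computed.

```python
# Reductions reported in the abstract, as fractions of the baseline value.
# The key names are hypothetical labels chosen for this sketch.
REPORTED_REDUCTIONS = {
    "icache_leakage": 0.897,
    "icache_dynamic": 0.641,
    "itlb_dynamic": 0.902,
}


def residual_fraction(reduction: float) -> float:
    """Return the fraction of baseline power left after a given reduction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie in [0, 1]")
    return 1.0 - reduction


residuals = {name: residual_fraction(r) for name, r in REPORTED_REDUCTIONS.items()}

for name, frac in residuals.items():
    print(f"{name}: {frac:.1%} of baseline remains")
```

Under these figures, roughly 10.3% of the baseline instruction cache leakage power, 35.9% of its dynamic power, and 9.8% of the instruction TLB dynamic power remain.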

Author information

Corresponding author

Correspondence to Han-Xin Sun.

Additional information

Supported by the National High Technology Development 863 Program of China under Grant No. 2004AA1Z1010.

About this article

Cite this article

Sun, HX., Yang, KP., Zhao, YL. et al. CASA: A New IFU Architecture for Power-Efficient Instruction Cache and TLB Designs. J. Comput. Sci. Technol. 23, 141–153 (2008). https://doi.org/10.1007/s11390-008-9117-z

  • DOI: https://doi.org/10.1007/s11390-008-9117-z
