Energy-Effective Instruction Fetch Unit for Wide Issue Processors

  • Conference paper
Advances in Computer Systems Architecture (ACSAC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3740)

Abstract

Continuing advances in semiconductor technology and demand for higher performance will lead to more powerful, superpipelined and wider issue processors. Instruction caches in such processors will consume a significant fraction of the on-chip energy due to very wide fetch on each cycle. This paper proposes a new energy-effective design of the fetch unit that exploits the fact that, because of taken branches, not all instructions in a given I-cache fetch line are used. A Fetch Mask Determination unit is proposed to detect which instructions in an I-cache access will actually be used, so that none of the other instructions are fetched. The solution is evaluated for 4-, 8- and 16-wide issue processors in 100 nm technology. Results show an average improvement in the I-cache Energy-Delay product of 20% for the 8-wide issue processor and 33% for the 16-wide issue processor on the SPEC2000 benchmarks, with no negative impact on performance.
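The idea of a fetch mask can be illustrated with a minimal sketch. The abstract does not say how the Fetch Mask Determination unit computes its mask, so everything below is an assumption made for illustration: the function names `fetch_mask` and `used_fraction`, the parameters `start_slot` and `taken_branch_slot`, and the simplified model in which a line is used from the fetch address up to and including the first predicted-taken branch.

```python
from typing import List, Optional


def fetch_mask(line_width: int, start_slot: int,
               taken_branch_slot: Optional[int]) -> List[bool]:
    """Return one enable bit per instruction slot in an I-cache fetch line.

    Hypothetical model (not taken from the paper): slots before the fetch
    address are skipped, and slots after the first predicted-taken branch
    are not used, so neither group needs to be read.
    """
    if not 0 <= start_slot < line_width:
        raise ValueError("start_slot must lie inside the fetch line")
    # Without a predicted-taken branch, fetch runs to the end of the line.
    last_slot = line_width - 1 if taken_branch_slot is None else taken_branch_slot
    if not start_slot <= last_slot < line_width:
        raise ValueError("taken_branch_slot must lie at or after start_slot")
    return [start_slot <= slot <= last_slot for slot in range(line_width)]


def used_fraction(mask: List[bool]) -> float:
    """Fraction of the line's slots that are actually read."""
    return sum(mask) / len(mask)


# 8-wide line, fetch starts at slot 2, predicted-taken branch in slot 5.
mask = fetch_mask(line_width=8, start_slot=2, taken_branch_slot=5)
print(mask)
print(used_fraction(mask))
```

In this example only slots 2 through 5 are enabled, so half of the 8-wide line would be read. Since the Energy-Delay product is energy multiplied by delay, any saving of this kind improves the product directly when the delay is unchanged, which is consistent with the abstract's claim of no negative impact on performance.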

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aragón, J.L., Veidenbaum, A.V. (2005). Energy-Effective Instruction Fetch Unit for Wide Issue Processors. In: Srikanthan, T., Xue, J., Chang, CH. (eds) Advances in Computer Systems Architecture. ACSAC 2005. Lecture Notes in Computer Science, vol 3740. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11572961_3

  • DOI: https://doi.org/10.1007/11572961_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29643-0

  • Online ISBN: 978-3-540-32108-8

  • eBook Packages: Computer Science, Computer Science (R0)
