An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors

International Journal of Parallel Programming

Abstract

The performance of superscalar processors depends on many parameters with correlated effects. This paper explores the relations between some of these parameters, in particular the instruction fetch bandwidth requirement. We introduce new enhancements that increase the bandwidth of conventional instruction fetch engines. However, experiments show that performance does not increase in proportion to the fetch bandwidth: once the measured number of instructions per cycle (IPC) reaches half the instruction fetch bandwidth, increasing the fetch bandwidth further brings very little improvement. To better understand this behavior, we develop a model from the empirical observation that the available instruction parallelism grows as the square root of the instruction window size. From the model, we derive that the fetch bandwidth requirement grows as the square root of the distance between mispredicted branches. We also verify experimentally that, to double the IPC, one should both double the fetch bandwidth and decrease the number of mispredicted branches fourfold.
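The two square-root relations stated in the abstract can be illustrated with a minimal Python sketch. This is not the paper's model, which is not reproduced in this preview: the function names and the constants `k` and `c` are hypothetical, chosen only to show the scaling behavior.

```python
import math


def available_parallelism(window_size, c=1.0):
    """Toy scaling law: parallelism grows as sqrt(instruction window size).

    `c` is a hypothetical proportionality constant, not a value from the paper.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return c * math.sqrt(window_size)


def fetch_bandwidth_requirement(mispredict_distance, k=1.0):
    """Toy scaling law: requirement grows as sqrt(distance between mispredictions).

    `mispredict_distance` is the average number of instructions between two
    mispredicted branches; `k` is a hypothetical proportionality constant.
    """
    if mispredict_distance <= 0:
        raise ValueError("mispredict_distance must be positive")
    return k * math.sqrt(mispredict_distance)


# Cutting mispredictions fourfold multiplies the distance between them by four,
# which doubles the fetch bandwidth requirement under this scaling law.
base = fetch_bandwidth_requirement(100)
improved = fetch_bandwidth_requirement(400)
ratio = improved / base
```

Under these assumptions `ratio` is 2, which matches the abstract's closing claim that doubling the IPC calls for both a doubled fetch bandwidth and four times fewer mispredicted branches.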

Author information

Corresponding author

Correspondence to Pierre Michaud.

About this article

Cite this article

Michaud, P., Seznec, A. & Jourdan, S. An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors. International Journal of Parallel Programming 29, 35–58 (2001). https://doi.org/10.1023/A:1026431920605

  • DOI: https://doi.org/10.1023/A:1026431920605