Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures

International Journal of Parallel Programming

Abstract

To exploit larger amounts of instruction-level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured instruction set architectures (ISAs) provide an effective means of increasing the instruction fetch rate. We define an optimization, called block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. We have constructed a compiler that generates block-structured ISA code, and a simulator that models the execution of that code on a block-structured ISA processor. We show that for the SPECint95 benchmarks, the block-structured ISA improves the performance of an aggressive wide-issue, dynamically scheduled processor by 15% while using simpler microarchitectural mechanisms to support wide issue and dynamic scheduling.
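The abstract names block enlargement without describing its mechanics. As a rough illustration only, the Python sketch below assumes a toy model: an atomic block is a tuple of operations plus its possible successor blocks, and enlarging a block appends one successor's operations after a placeholder check operation that asserts control actually flowed to that successor. One enlarged variant is produced per successor that fits under a size limit, so a single fetch would deliver more operations. The `Block` class, the `enlarge` function, the `fault_unless_taken` operation name, and the `max_ops` limit are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    ops: tuple[str, ...]    # straight-line operations in the atomic block
    succs: tuple[str, ...]  # possible successor blocks (branch targets)


def enlarge(blocks: dict[str, Block], name: str, max_ops: int) -> list[Block]:
    """Return one enlarged block per successor of `name` that fits in max_ops.

    Hypothetical model: the branch selecting the successor is replaced by a
    placeholder check op; if the check failed at run time, the whole enlarged
    block would be discarded and fetch redirected (not modelled here).
    The size limit counts the check op. If no variant fits, the original
    block is returned unchanged.
    """
    base = blocks[name]
    result = []
    for s in base.succs:
        succ = blocks[s]
        ops = base.ops + (f"fault_unless_taken({name}->{s})",) + succ.ops
        if len(ops) <= max_ops:
            # The enlarged block inherits the successor's exits.
            result.append(Block(f"{name}+{s}", ops, succ.succs))
    return result or [base]


# Toy control-flow graph: A branches to B or C, both fall through to D.
blocks = {
    "A": Block("A", ("add r1,r2,r3", "ld r4,0(r1)"), ("B", "C")),
    "B": Block("B", ("sub r5,r4,r1",), ("D",)),
    "C": Block("C", ("mul r6,r4,r4", "st r6,0(r1)", "add r1,r1,4"), ("D",)),
    "D": Block("D", ("mov r0,r5",), ()),
}

for eb in enlarge(blocks, "A", max_ops=5):
    print(eb.name, len(eb.ops))
```

With `max_ops=5`, only `A+B` (2 + 1 + 1 = 4 operations) fits, so the loop prints `A+B 4`; `A+C` would need 6 operations. Producing one variant per successor, rather than a single merged block, mirrors the idea that a predictor would choose which enlarged block to fetch, but that selection step is outside this sketch.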

Cite this article

Hao, E., Chang, PY., Evers, M. et al. Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures. International Journal of Parallel Programming 26, 449–478 (1998). https://doi.org/10.1023/A:1018702632204

  • DOI: https://doi.org/10.1023/A:1018702632204