Abstract
To exploit larger amounts of instruction-level parallelism, processors are being built with wider issue widths and larger numbers of functional units. The instruction fetch rate must also be increased to exploit the performance potential of such processors effectively. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, called block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. We have constructed a compiler that generates block-structured ISA code and a simulator that models the execution of that code on a block-structured ISA processor. We show that, for the SPECint95 benchmarks, the block-structured ISA improves the performance of an aggressive wide-issue, dynamically scheduled processor by 15% while using simpler microarchitectural mechanisms to support wide issue and dynamic scheduling.
Cite this article
Hao, E., Chang, PY., Evers, M. et al. Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures. International Journal of Parallel Programming 26, 449–478 (1998). https://doi.org/10.1023/A:1018702632204
DOI: https://doi.org/10.1023/A:1018702632204