Enhancing instruction scheduling with a block-structured ISA

International Journal of Parallel Programming

Abstract

It is now generally recognized that the small basic blocks of most general-purpose programs do not contain enough parallelism to satisfy high-performance processors. A wide variety of techniques have therefore been developed to exploit instruction-level parallelism across basic block boundaries. In this paper we discuss some of these previous techniques along with their hardware and software requirements. We then propose a new paradigm for an instruction set architecture (ISA): block-structuring. We present this paradigm, discuss its hardware and software requirements, and report the results of a simulation study. We show that a block-structured ISA utilizes both dynamic and compile-time mechanisms for exploiting instruction-level parallelism and has significant performance advantages over a conventional ISA.

Cite this article

Melvin, S., Patt, Y. Enhancing instruction scheduling with a block-structured ISA. Int J Parallel Prog 23, 221–243 (1995). https://doi.org/10.1007/BF02577867

  • DOI: https://doi.org/10.1007/BF02577867
