Enhancing instruction scheduling with a block-structured ISA

Melvin, Stephen; Patt, Yale

doi:10.1007/BF02577867

Published: June 1995

Volume 23, pages 221–243 (1995)

International Journal of Parallel Programming

Stephen Melvin¹ &
Yale Patt²

Abstract

It is now generally recognized that not enough parallelism exists within the small basic blocks of most general-purpose programs to satisfy high-performance processors. Thus, a wide variety of techniques have been developed to exploit instruction-level parallelism across basic block boundaries. In this paper we discuss some previous techniques along with their hardware and software requirements. Then we propose a new paradigm for an instruction set architecture (ISA): block-structuring. This new paradigm is presented, its hardware and software requirements are discussed, and the results from a simulation study are presented. We show that a block-structured ISA utilizes both dynamic and compile-time mechanisms for exploiting instruction-level parallelism and has significant performance advantages over a conventional ISA.

Author information

Authors and Affiliations

Stephen Melvin: P.O. Box 2400, Berkeley, California 94702-0400
Yale Patt: Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109-2122

About this article

Cite this article

Melvin, S., Patt, Y. Enhancing instruction scheduling with a block-structured ISA. Int J Parallel Prog 23, 221–243 (1995). https://doi.org/10.1007/BF02577867

Received: 08 July 1993
Issue Date: June 1995
DOI: https://doi.org/10.1007/BF02577867
