Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution

Chang, Po-Yung; Hao, Eric; Patt, Yale N.; Chang, Pohua P.

doi:10.1007/BF03356749

Published: 26 May 2016

Volume 24, pages 209–234 (1996)

International Journal of Parallel Programming

Abstract

Conditional branches incur a severe performance penalty in wide-issue, deeply pipelined processors. Speculative execution^(1, 2) and predicated execution^(3–9) are two mechanisms that have been proposed to reduce this penalty. Speculative execution can completely eliminate the penalty associated with a particular branch, but it requires accurate branch prediction to be effective. Predicated execution does not require accurate branch prediction to eliminate the branch penalty, but it is not applicable to all branches and can increase latencies within the program. This paper examines the performance benefit of using both mechanisms together to reduce the branch execution penalty: predicated execution handles the hard-to-predict branches, and speculative execution handles the remaining branches. The hard-to-predict branches within the program are identified by profiling. We show that this approach can significantly reduce the branch execution penalty suffered by wide-issue processors.
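The profile-driven split described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the per-branch 2-bit saturating counter stands in for whatever predictor the profiling run is meant to model, and the 10% misprediction threshold, the function names, and the example trace are all hypothetical. The sketch also ignores the abstract's caveat that predication is not applicable to every branch. The last two functions contrast a branchy computation with a predicated (if-converted) version that executes both paths and selects the result with the predicate, turning a control dependence into a data dependence.

```python
from collections import defaultdict


def profile_mispredict_rates(trace):
    """Profile each static branch with its own 2-bit saturating counter.

    trace: iterable of (branch_pc, taken) pairs from a profiling run.
    The 2-bit counter is an assumed stand-in predictor (states 0-3,
    a state >= 2 predicts "taken").
    """
    counters = defaultdict(lambda: 2)  # start in "weakly taken" (assumption)
    executed = defaultdict(int)
    mispredicted = defaultdict(int)
    for pc, taken in trace:
        predicted_taken = counters[pc] >= 2
        executed[pc] += 1
        if predicted_taken != taken:
            mispredicted[pc] += 1
        # Move the saturating counter toward the actual outcome.
        if taken:
            counters[pc] = min(counters[pc] + 1, 3)
        else:
            counters[pc] = max(counters[pc] - 1, 0)
    return {pc: mispredicted[pc] / executed[pc] for pc in executed}


def classify_branches(rates, threshold=0.10):
    """Split branches into predication candidates (hard to predict) and
    branches left to speculative execution. The threshold is hypothetical."""
    predicate = {pc for pc, rate in rates.items() if rate > threshold}
    speculate = set(rates) - predicate
    return predicate, speculate


def abs_diff_branchy(a, b):
    if a > b:          # conditional branch: a control dependence
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    p = int(a > b)     # compute the predicate as 0 or 1
    t = a - b          # both paths execute unconditionally
    f = b - a
    return p * t + (1 - p) * f  # select by predicate: now a data dependence


# Example: an always-taken loop branch and a data-dependent branch that
# alternates taken/not-taken.
trace = [(0x10, True)] * 100 + [(0x20, i % 2 == 0) for i in range(100)]
rates = profile_mispredict_rates(trace)
predicate, speculate = classify_branches(rates)
print(rates)      # {16: 0.0, 32: 0.5}
print(predicate)  # {32}
```

Here the always-taken branch at 0x10 is predicted perfectly and stays with speculative execution, while the alternating branch at 0x20 is mispredicted half the time and becomes a predication candidate. The predicated version always executes both subtractions, which is the extra work and latency the abstract warns about.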

References

1. Y. N. Patt, S. W. Melvin, W. Hwu, and M. C. Shebanow, Critical issues regarding HPS, a high performance microarchitecture, Proc. of the 18th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 109–116 (1985).
2. S. Melvin and Y. N. Patt, Exploiting fine-grained parallelism through a combination of hardware and software techniques, Proc. of the 18th Ann. Int’l. Symp. on Computer Architecture, pp. 287–297 (1991).
3. P. Hsu and E. Davidson, Highly concurrent scalar processing, Proc. of the 13th Ann. Int’l. Symp. on Computer Architecture (1986).
4. B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 departmental supercomputer, IEEE Computer, 22:12–35 (January 1989).
5. J. C. Dehnert, P. Y. T. Hsu, and J. P. Bratt, Overlapped loop support in the Cydra 5, Proc. of the 16th Ann. Int’l. Symp. on Computer Architecture, pp. 26–38 (1989).
6. S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proc. of the 25th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 45–54 (1992).
7. D. N. Pnevmatikatos and G. S. Sohi, Guarded execution and dynamic branch prediction in dynamic ILP processors, Proc. of the 21st Ann. Int’l. Symp. on Computer Architecture, pp. 120–129 (1994).
8. G. S. Tyson, The effects of predication on branch prediction, Proc. of the 27th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 196–206 (1994).
9. S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu, Characterizing the impact of predicated execution on branch prediction, Proc. of the 27th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 217–227 (1994).
10. E. M. Riseman and C. C. Foster, The inhibition of potential parallelism by conditional jumps, IEEE Trans. on Computers, C-21(12):1405–1411 (1972).
11. J. K. F. Lee and A. J. Smith, Branch prediction strategies and branch target buffer design, IEEE Computer, pp. 6–22 (January 1984).
12. J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren, Conversion of control dependence to data dependence, 10th Ann. ACM Symp. on Principles of Programming Languages, pp. 177–189 (1983).
13. T.-Y. Yeh and Y. N. Patt, Two-level adaptive branch prediction, Proc. of the 24th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 51–61 (1991).
14. T.-Y. Yeh and Y. N. Patt, Alternative implementations of two-level adaptive branch prediction, Proc. of the 19th Ann. Int’l. Symp. on Computer Architecture, pp. 124–134 (1992).
15. P.-Y. Chang, E. Hao, T.-Y. Yeh, and Y. N. Patt, Branch classification: A new mechanism for improving branch predictor performance, Proc. of the 27th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 22–31 (1994).
16. P. Tirumalai, M. Lee, and M. Schlansker, Parallelization of loops with exits on pipelined architectures, Proc. Supercomputing ’90 (1990).
17. M. G. Butler, Aggressive execution engines for surpassing single basic block execution, Ph.D. thesis, University of Michigan (1993).
18. R. M. Tomasulo, An efficient algorithm for exploiting multiple arithmetic units, IBM Journal of Res. and Development, 11:25–33 (January 1967).
19. Y. Patt, W. Hwu, and M. Shebanow, HPS, a new microarchitecture: Rationale and introduction, Proc. of the 18th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 103–107 (1985).
20. E. Sprangle and Y. Patt, Facilitating superscalar processing via a combined static/dynamic register renaming scheme, Proc. of the 27th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 143–147 (1994).
21. S. McFarling, Combining branch predictors, Technical Report TN-36, Digital Western Research Laboratory (June 1993).
22. M. D. Smith, M. S. Lam, and M. A. Horowitz, Boosting beyond static scheduling in a superscalar processor, Proc. of the 17th Ann. Int’l. Symp. on Computer Architecture, pp. 344–354 (1990).
23. P. P. Chang, S. A. Mahlke, W. Y. Chen, N. J. Warter, and W. W. Hwu, IMPACT: An architectural framework for multiple-instruction-issue processors, Proc. of the 18th Ann. Int’l. Symp. on Computer Architecture, pp. 266–275 (1991).
24. L. Gwennap, Intel’s P6 uses decoupled superscalar design, Microprocessor Report, Vol. 9 (February 1995).
25. L. Gwennap, PA-8000 combines complexity and speed, Microprocessor Report, Vol. 8, No. 15 (November 1994).
26. T. Granlund and R. Kenner, Eliminating branches using a superoptimizer and the GNU C compiler, Proc. of the ACM SIGPLAN ’92 Conf. on Programming Language Design and Implementation, pp. 341–352 (1992).

Author information

Authors and Affiliations

The University of Michigan, Ann Arbor, Michigan, USA
Po-Yung Chang, Eric Hao & Yale N. Patt
Intel Architecture Laboratory, Santa Clara, California, USA
Pohua P. Chang

About this article

Cite this article

Chang, PY., Hao, E., Patt, Y.N. et al. Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution. Int J Parallel Prog 24, 209–234 (1996). https://doi.org/10.1007/BF03356749

Issue Date: June 1996