Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution

International Journal of Parallel Programming

Abstract

Conditional branches incur a severe performance penalty in wide-issue, deeply pipelined processors. Speculative execution (1, 2) and predicated execution (3–9) are two mechanisms that have been proposed for reducing this penalty. Speculative execution can completely eliminate the penalty associated with a particular branch, but requires accurate branch prediction to be effective. Predicated execution does not require accurate branch prediction to eliminate the branch penalty, but is not applicable to all branches and can increase the latencies within the program. This paper examines the performance benefit of using both mechanisms to reduce the branch execution penalty. Predicated execution is used to handle the hard-to-predict branches and speculative execution is used to handle the remaining branches. The hard-to-predict branches within the program are determined by profiling. We show that this approach can significantly reduce the branch execution penalty suffered by wide-issue processors.
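
The profile-driven split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The per-branch 2-bit saturating counter, the 20% misprediction threshold, and the names `misprediction_rates` and `choose_mechanism` are all assumptions made for illustration. The sketch also ignores the abstract's caveat that predication is not applicable to every branch.

```python
from collections import defaultdict


def misprediction_rates(trace):
    """Profile a branch trace with a per-branch 2-bit saturating counter.

    trace: iterable of (branch_id, taken) pairs in execution order.
    Returns {branch_id: fraction of executions that were mispredicted}.
    The predictor is an assumption for this sketch, not the paper's.
    """
    counters = defaultdict(lambda: 2)  # states 0..3; >= 2 predicts taken
    executed = defaultdict(int)
    missed = defaultdict(int)
    for branch_id, taken in trace:
        predicted_taken = counters[branch_id] >= 2
        executed[branch_id] += 1
        if predicted_taken != taken:
            missed[branch_id] += 1
        # Saturating update toward the actual outcome.
        if taken:
            counters[branch_id] = min(3, counters[branch_id] + 1)
        else:
            counters[branch_id] = max(0, counters[branch_id] - 1)
    return {b: missed[b] / executed[b] for b in executed}


def choose_mechanism(rates, threshold=0.2):
    """Mark hard-to-predict branches for predication, the rest for speculation.

    threshold is a hypothetical cutoff chosen only for this example.
    """
    return {b: "predicate" if r > threshold else "speculate"
            for b, r in rates.items()}


# Hypothetical profile: a loop back-edge taken 9 times out of 10, and a
# data-dependent branch whose outcome alternates on every execution.
trace = []
for i in range(100):
    trace.append(("loop_back", i % 10 != 9))
    trace.append(("data_dep", i % 2 == 0))

rates = misprediction_rates(trace)
plan = choose_mechanism(rates)
print(plan)
```

In this synthetic profile the loop back-edge is mispredicted only on loop exit, so it stays with speculative execution, while the alternating branch defeats the counter half the time and becomes a candidate for predication.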

References

  1. Y. N. Patt, S. W. Melvin, W. Hwu, and M. C. Shebanow, Critical issues regarding HPS, a high performance microarchitecture, Proc. of the 18th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 109–116 (1985).

  2. S. Melvin and Y. N. Patt, Exploiting fine-grained parallelism through a combination of hardware and software techniques, Proc. of the 18th Ann. Int’l. Symp. on Computer Architecture, pp. 287–297 (1991).

  3. P. Hsu and E. Davidson, Highly concurrent scalar processing, Proc. of the 13th Ann. Int’l. Symp. on Computer Architecture (1986).

  4. B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 departmental supercomputer, IEEE Computer, 22:12–35 (January 1989).

  5. J. C. Dehnert, P. Y. T. Hsu, and J. P. Bratt, Overlapped loop support in the Cydra 5, Proc. of the 16th Ann. Int’l. Symp. on Computer Architecture, pp. 26–38 (1989).

  6. S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proc. of the 25th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 45–54 (1992).

  7. D. N. Pnevmatikatos and G. S. Sohi, Guarded execution and dynamic branch prediction in dynamic ILP processors, Proc. of the 21st Ann. Int’l. Symp. on Computer Architecture, pp. 120–129 (1994).

  8. G. S. Tyson, The effects of predication on branch prediction, Proc. of the 27th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 196–206 (1994).

  9. S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu, Characterizing the impact of predicated execution on branch prediction, Proc. of the 27th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 217–227 (1994).

  10. E. M. Riseman and C. C. Foster, The inhibition of potential parallelism by conditional jumps, IEEE Trans. on Computers, C-21(12): 1405–1411 (1972).

  11. J. K. F. Lee and A. J. Smith, Branch prediction strategies and branch target buffer design, IEEE Computer, pp. 6–22 (January 1984).

  12. J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren, Conversion of control dependence to data dependence, 10th Ann. ACM Symp. on Principles of Programming Languages, pp. 177–189 (1983).

  13. T.-Y. Yeh and Y. N. Patt, Two-level adaptive branch prediction, Proc. of the 24th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 51–61 (1991).

  14. T.-Y. Yeh and Y. N. Patt, Alternative implementations of two-level adaptive branch prediction, Proc. of the 19th Ann. Int’l. Symp. on Computer Architecture, pp. 124–134 (1992).

  15. P.-Y. Chang, E. Hao, T.-Y. Yeh, and Y. N. Patt, Branch classification: A new mechanism for improving branch predictor performance, Proc. of the 27th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 22–31 (1994).

  16. P. Tirumalai, M. Lee, and M. Schlansker, Parallelization of loops with exits on pipelined architectures, Proc. Supercomputing ’90 (1990).

  17. M. G. Butler, Aggressive execution engines for surpassing single basic block execution, Ph.D. thesis, University of Michigan, 1993.

  18. R. M. Tomasulo, An efficient algorithm for exploiting multiple arithmetic units, IBM Journal of Res. and Development, 11:25–33 (January 1967).

  19. Y. Patt, W. Hwu, and M. Shebanow, HPS, a new microarchitecture: Rationale and introduction, Proc. of the 18th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 103–107 (1985).

  20. E. Sprangle and Y. Patt, Facilitating superscalar processing via a combined static/dynamic register renaming scheme, Proc. of the 27th Ann. ACM/IEEE Int’l. Symp. on Microarchitecture, pp. 143–147 (1994).

  21. S. McFarling, Combining branch predictors, Technical Report TN-36, Digital Western Research Laboratory (June 1993).

  22. M. D. Smith, M. S. Lam, and M. A. Horowitz, Boosting beyond static scheduling in a superscalar processor, Proc. of the 17th Ann. Int’l. Symp. on Computer Architecture, pp. 344–354 (1990).

  23. P. P. Chang, S. A. Mahlke, W. Y. Chen, N. J. Warter, and W. W. Hwu, IMPACT: An architectural framework for multiple-instruction-issue processors, Proc. of the 18th Ann. Int’l. Symp. on Computer Architecture, pp. 266–275 (1991).

  24. L. Gwennap, Intel’s P6 uses decoupled superscalar design, Microprocessor Report, Vol. 9 (February 1995).

  25. L. Gwennap, PA-8000 combines complexity and speed, Microprocessor Report, Vol. 8, No. 15 (November 1994).

  26. T. Granlund and R. Kenner, Eliminating branches using a superoptimizer and the GNU C compiler, Proc. of the ACM SIGPLAN ’92 Conf. on Programming Language Design and Implementation, pp. 341–352 (1992).

Cite this article

Chang, PY., Hao, E., Patt, Y.N. et al. Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution. Int J Parallel Prog 24, 209–234 (1996). https://doi.org/10.1007/BF03356749

  • DOI: https://doi.org/10.1007/BF03356749
