Abstract
Increases in instruction-level parallelism are needed to exploit the potential parallelism available in future wide-issue architectures. Predicated execution is an architectural mechanism that increases instruction-level parallelism by removing branches and allowing multiple paths of control to execute simultaneously, committing only the instructions from the correct path. For the compiler to expose and use such parallelism, traditional data-flow and path analysis must be extended to predicated code. In this paper, we motivate the need for renaming and for predicates that reflect path information. We present Predicated Static Single Assignment (PSSA), which uses renaming and introduces Full-Path Predicates to remove false dependences and enable aggressive predicated optimization and instruction scheduling. We demonstrate the usefulness of PSSA for Predicated Speculation and Control Height Reduction, two predicated code optimizations that, applied during instruction scheduling, reduce the dependence length of the critical paths through a predicated region. Our results show that using PSSA to enable speculation and control height reduction reduces execution time by 12% to 68%.
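The toy model below is not the paper's PSSA construction. It is a minimal Python sketch, assuming a hypothetical instruction representation (`Instr`, `holds`, `run_predicated`, and the predicates `P1` to `P3` are all illustrative names), of two ideas the abstract relies on. First, each if-converted instruction is guarded by a full-path predicate, here modeled as the conjunction of every branch outcome on the path from region entry, so every instruction is issued but only those on the correct path commit. Second, renaming each definition to a fresh name removes the false (output) dependence between the arms, which is what makes it safe to speculate them with a weaker guard. The sequential loop in `run_predicated` merely stands in for simultaneous issue.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy IR (not the paper's representation). A guard is a
# full-path predicate: the conjunction of every branch outcome on the
# path from region entry, e.g. (("c1", True), ("c2", False)).
Pred = Tuple[Tuple[str, bool], ...]
ALWAYS: Pred = ()  # empty conjunction: the instruction always commits


@dataclass
class Instr:
    dest: str                            # name of the value written
    op: Callable[[Dict[str, int]], int]  # computes the value from the environment
    guard: Pred                          # full-path predicate


def holds(guard: Pred, env: Dict[str, int]) -> bool:
    """True only if every branch outcome on the path matches env."""
    return all(bool(env[cond]) == want for cond, want in guard)


def run_predicated(instrs: List[Instr], env: Dict[str, int]) -> Dict[str, int]:
    """Issue every instruction in order; commit only those whose guard holds."""
    env = dict(env)
    for ins in instrs:
        if holds(ins.guard, env):
            env[ins.dest] = ins.op(env)
    return env


def reference(a: int, c1: int, c2: int) -> int:
    """Original branchy code that the predicated region must match."""
    if c1:
        x = a + 1 if c2 else a - 1
    else:
        x = 0
    return x * 2


# Full-path predicates for the three paths; they are mutually exclusive
# and together cover every outcome of c1 and c2.
P1: Pred = (("c1", True), ("c2", True))
P2: Pred = (("c1", True), ("c2", False))
P3: Pred = (("c1", False),)

# Without renaming, speculating both inner arms (guard ALWAYS) into the
# same name is wrong: the second write clobbers the first.
naive = [
    Instr("x", lambda e: e["a"] + 1, ALWAYS),
    Instr("x", lambda e: e["a"] - 1, ALWAYS),
    Instr("x", lambda e: 0, P3),
    Instr("y", lambda e: e["x"] * 2, ALWAYS),
]

# With renaming, each definition gets a fresh name, so there is no output
# dependence between them and x1/x2 can be speculated safely.
renamed = [
    Instr("x1", lambda e: e["a"] + 1, ALWAYS),
    Instr("x2", lambda e: e["a"] - 1, ALWAYS),
    Instr("x3", lambda e: 0, P3),
    # Phi-like merge: exactly one of these copies commits.
    Instr("x", lambda e: e["x1"], P1),
    Instr("x", lambda e: e["x2"], P2),
    Instr("x", lambda e: e["x3"], P3),
    Instr("y", lambda e: e["x"] * 2, ALWAYS),
]

if __name__ == "__main__":
    for c1 in (0, 1):
        for c2 in (0, 1):
            env = {"a": 5, "c1": c1, "c2": c2}
            print(c1, c2, run_predicated(naive, env)["y"],
                  run_predicated(renamed, env)["y"], reference(5, c1, c2))
```

Running it shows the renamed region matching `reference` for all four combinations of `c1` and `c2`, while `naive` returns 8 instead of 12 when both conditions are true. The sketch models only correctness; it says nothing about schedule length, which is what Predicated Speculation and Control Height Reduction aim to shorten.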
Cite this article
Carter, L., Simon, B., Calder, B. et al. Path Analysis and Renaming for Predicated Instruction Scheduling. International Journal of Parallel Programming 28, 563–588 (2000). https://doi.org/10.1023/A:1007512717742