Abstract
Speculative execution is the execution of instructions before it is known whether they should be executed. In speculative execution for instruction-level parallelism (ILP) processors, shadow registers provide a hardware solution for protecting program semantics from pollution by boosted instructions that were incorrectly predicted. In a recent study, Chang and Lai proposed a special register file based on shadow registers, named the conjugate register file (CRF), to support multilevel boosting in speculative execution. They also proposed a scheduling heuristic, named frequency-driven scheduling, to work with the CRF. However, the ability to boost is still constrained, since the register-pair concept forces speculatively produced results to be stored in dedicated locations. Moreover, as advances in hardware techniques raise the parallelism potential into the tens, the heavy demand on register usage and the complexity of the register file may well become a serious bottleneck for the exploitation of ILP.
In this paper, the frequency-driven scheduling algorithm is modified by replacing the function of the hardware CRF with variable renaming during compilation. The new scheduling technique, named LESS, can exploit parallelism efficiently with a limited number of registers. Moreover, since the technique benefits ILP without any special hardware support, it can be incorporated into any other ILP architecture without changing its instruction set architecture (ISA).
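The general idea of using compile-time renaming in place of a shadow register can be sketched as follows. This is only an illustration of boosting one instruction above a branch, not the LESS algorithm itself, which the abstract does not spell out. The `Instr` type, the `boost_with_renaming` function, and the register names are all hypothetical, and the sketch assumes that the boosted value is dead at the end of the predicted path (otherwise a copy back to the original register would be needed) and that the source operands of the boosted instruction are not redefined before the branch.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: dest = op(srcs)."""
    dest: str
    op: str
    srcs: Tuple[str, ...]


def boost_with_renaming(
    instr: Instr,
    later: List[Instr],
    live_off_path: Set[str],
    free_regs: List[str],
) -> Tuple[Instr, List[Instr]]:
    """Hoist `instr` above the branch it used to follow.

    `later` holds the instructions that follow `instr` on the predicted path.
    `live_off_path` holds registers whose old values are still needed if the
    branch goes the other way. Returns the hoisted instruction and the
    rewritten predicted-path instructions.
    """
    if instr.dest not in live_off_path:
        # Writing the destination early cannot corrupt the other path.
        return instr, list(later)
    if not free_regs:
        raise ValueError("no free register available for renaming")

    # Rename the destination so the old value survives a misprediction.
    new_name = free_regs.pop()
    hoisted = Instr(new_name, instr.op, instr.srcs)

    rewritten = []
    renaming = True
    for i in later:
        # Uses of the old name now read the renamed register.
        srcs = tuple(
            new_name if (renaming and s == instr.dest) else s for s in i.srcs
        )
        rewritten.append(Instr(i.dest, i.op, srcs))
        if i.dest == instr.dest:
            # The old name is redefined here; later uses refer to that value.
            renaming = False
    return hoisted, rewritten


# Usage: r1 is live on the non-predicted path, so the boosted add writes r9.
boosted, path = boost_with_renaming(
    Instr("r1", "add", ("r2", "r3")),
    [Instr("r4", "mul", ("r1", "r5")), Instr("r1", "sub", ("r4", "r6"))],
    live_off_path={"r1"},
    free_regs=["r9"],
)
```

If the branch is mispredicted, the value in `r9` is simply never read, so no hardware recovery of `r1` is required in this sketch; the cost is one extra register for the lifetime of the boosted value.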
Simulation results show that LESS achieves better performance than other existing methods. For example, under the ILP model with an issue rate of 8, speculative execution can achieve a 34% increase in parallelism, compared with 18% under the CRF scheme.
References
M. C. Chang and F. P. Lai. Efficient exploitation of instruction-level parallelism for superscalar processors by the conjugate register file scheme. IEEE Trans. Comput., 45(3):278-293, 1996.
B. R. Rau and J. A. Fisher. Instruction-level parallel processing: history, overview, and perspective. J. Supercomput., 7(1/2):9-50, 1993.
M. S. Schlansker and B. R. Rau. EPIC: Explicitly parallel instruction computing. IEEE Comput., 37-45, 2000.
D. I. August, D. A. Connors, S. A. Mahlke, J. W. Sias, K. M. Crozier, B. C. Cheng, P.R. Eaton, Q. B. Olaniran, and W. W. Hwu. Integrated predicated and speculative execution in the IMPACT EPIC architecture. In Proceedings of the 25th Annual International Symposium on Computer Architecture, pp.138-149, 1998.
M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pp.46-57, 1992.
S. A. Mahlke et al. Sentinel scheduling for VLIW and superscalar processors. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp.238-247, 1992.
H. Ando et al. Unconstrained speculative execution with predicated state buffering. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pp.138-149, 1995.
M. D. Smith, M. S. Lam, and M. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pp.344-354, 1990.
S. A. Mahlke, D. C. Lin, and W. Y. Chen et al. Effective compiler support for predicated execution using the hyperblock. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pp.45-54, 1992.
S. A. Mahlke, R. E. Hank, J. E. McCormick, D. I. August, and W. W. Hwu. A comparison of full and partial predicated execution support for ILP processors. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pp.138-149, 1995.
J. A. Fisher. Trace scheduling: a technique for global microcode compaction. IEEE Trans. Comput., C-30:478-490, 1981.
W. W. Hwu et al. The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput., 7(1/2):229-248, 1993.
V. E. Kotov. Automatic Construction of Parallel Programs, Algorithms, Software and Hardware for Parallel Computers. Springer, Berlin, 1984.
R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. Zadeck. Efficiently computing static single assignment and the control dependence graph. ACM Trans. Programm. Languages Syst., 13(4):451-490, 1991.
P. P. Pineo. An efficient algorithm for the creation of single assignment forms. In Proceedings of the 29th Annual Hawaii International Conference on System Sciences, pp. 213-222, 1996.
V. H. Allan et al. Software pipelining. ACM Comput. Surveys, 27(3):367-432, 1995.
D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach, 2nd ed. Morgan Kaufmann, San Mateo, CA, 1996.
L. Wang and T. C. Yang. Compiler/hardware co-design for instruction boosting in ILP processors. IEE Proc. Comput. Digit. Technol., 146(6):269-274, 1999.
C. M. Chen, C. T. King, and Y. Y. Chen. Branch merging for scheduling concurrent executions of branch operations. IEE Proc. Comput. Digit. Technol., 143(6):278-293, 1996.
M. Srinivas and A. Nicolau. Analyzing the individual/combined effects of speculative and guarded execution on a superscalar architecture. In Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing, pp.199-208, 1998.
J. Huck, D. Morris, J. Ross, A. Knies, H. Mulder, and R. Zahir. Introducing the IA-64 architecture. IEEE Micro, 20(5):12-23, 2000.
Cite this article
Wang, L., Yang, T.C. On the Boosting of Instruction Scheduling by Renaming. The Journal of Supercomputing 19, 173–197 (2001). https://doi.org/10.1023/A:1011141304485