Abstract
Modulo scheduling is an efficient technique for exploiting instruction-level parallelism in a variety of loops, resulting in high-performance code but increased register requirements. We present an approach that schedules the loop operations for minimum register requirements, given a modulo reservation table (MRT). Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. Measurements on a benchmark suite of 1327 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels show that the register requirements decrease by 24.8% on average when applying the optimal stage scheduler to the MRT-schedules of a register-insensitive modulo scheduler.
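As a rough illustration of why the choice of stage matters, the sketch below counts how many values are live at once in the steady state of a modulo schedule. It is not the method of the paper: the function name `max_live`, the half-open lifetime convention, and the example lifetimes are all assumptions made for illustration, and the peak live count is used here only as a simple stand-in for the register requirement.

```python
def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in the steady state.

    lifetimes: list of (start, end) cycles; a value is assumed live in
               the half-open interval [start, end).
    ii:        initiation interval (a new iteration starts every ii cycles).
    """
    live = [0] * ii
    for start, end in lifetimes:
        # Fold each lifetime onto the ii rows of the steady-state kernel.
        for cycle in range(start, end):
            live[cycle % ii] += 1
    return max(live)


II = 2

# Hypothetical schedule: the first value is consumed at cycle 5.
before = [(0, 5), (1, 4)]

# Same MRT row for the consumer (5 % II == 3 % II), one stage earlier.
# This assumes the dependence latency still allows the earlier placement.
after = [(0, 3), (1, 4)]

print(max_live(before, II), max_live(after, II))
```

In this toy case, moving the consumer by a whole multiple of the initiation interval leaves its row in the reservation table unchanged, so resource usage is untouched, while the shorter lifetime lowers the peak live count from 4 to 3.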
Cite this article
Eichenberger, A.E., Davidson, E.S. & Abraham, S.G. Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling. Int J Parallel Prog 24, 103–132 (1996). https://doi.org/10.1007/BF03356744
DOI: https://doi.org/10.1007/BF03356744