Loop parallelization is the most critical aspect of any parallelizing compiler, since most of a program's execution time is spent in loops. In particular, the innermost loops of a program (i.e., the loops most deeply nested in the structure of the program) are typically the most heavily executed. It is therefore critical for any compiler that targets instruction-level parallelism (ILP) to expose and exploit as much parallelism as possible from these loops.
Historically, loop unrolling, a standard non-local transformation, has been used to expose parallelism beyond iteration boundaries. When a loop is unrolled, the loop body is replicated to create a new loop. Compacting the unrolled loop body helps exploit parallelism across iterations, since its operations come from previously separate iterations. However, in general, loops cannot be fully unrolled, both because of space considerations and because loop bounds are...
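As a minimal sketch of the unrolling transformation described above, the following Python fragment replicates a loop body four times. The function names, the unroll factor of 4, and the `a * x[i] + y[i]` body are illustrative assumptions, not taken from the entry. A real ILP compiler would apply this to its intermediate representation and then compact the replicated body; Python is used here only to make the shape of the transformation visible.

```python
def saxpy_rolled(a, x, y):
    """Original loop: one element per iteration."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def saxpy_unrolled_by_4(a, x, y):
    """Same computation with the loop body replicated four times.

    The four statements in the main loop come from four formerly separate
    iterations and do not depend on one another, so a scheduler would be
    free to overlap them.
    """
    n = len(x)
    out = [0.0] * n
    main_end = n - (n % 4)  # largest multiple of 4 not exceeding n

    # Unrolled main loop: handles four original iterations per trip.
    for i in range(0, main_end, 4):
        out[i] = a * x[i] + y[i]
        out[i + 1] = a * x[i + 1] + y[i + 1]
        out[i + 2] = a * x[i + 2] + y[i + 2]
        out[i + 3] = a * x[i + 3] + y[i + 3]

    # Cleanup loop: handles the leftover iterations when n is not a
    # multiple of the unroll factor.
    for i in range(main_end, n):
        out[i] = a * x[i] + y[i]
    return out
```

In this sketch the unrolled body is four times larger than the original, which illustrates the space cost of unrolling, and parallelism is exposed only among the four iterations inside one trip of the new loop, not across its back edge.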
© 2011 Springer Science+Business Media, LLC
Cite this entry
Kejariwal, A., Nicolau, A. (2011). Modulo Scheduling and Loop Pipelining. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_65
DOI: https://doi.org/10.1007/978-0-387-09766-4_65
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09765-7
Online ISBN: 978-0-387-09766-4
eBook Packages: Computer Science; Reference Module Computer Science and Engineering