
Modulo Scheduling and Loop Pipelining

  • Reference work entry
Encyclopedia of Parallel Computing

Loop parallelization is the most critical aspect of any parallelizing compiler, since most of the time spent in executing a given program is spent in executing loops. In particular, the innermost loops of a program (i.e., the loops most deeply nested in the structure of the program) are typically the most heavily executed. It is therefore critical for any parallelizing (Instruction Level Parallelism (ILP)) compiler to try to expose and exploit as much parallelism as possible from these loops.

Historically, loop unrolling, a standard non-local transformation, has been used to expose parallelism beyond iteration boundaries. When a loop is unrolled, the loop body is replicated to create a new loop. Compacting the unrolled loop body helps exploit parallelism across iterations, since the operations in the unrolled body come from previously separate iterations. However, in general, loops cannot be fully unrolled, both because of space considerations and because loop bounds are...
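As an illustration of the unrolling transformation described above (a sketch, not taken from the entry itself), consider a simple saxpy-style loop unrolled by a factor of four. The function names and the assumption that the trip count is a multiple of the unroll factor are for illustration only; a real compiler would also emit a remainder (epilogue) loop for leftover iterations.

    /* Original loop: one element per iteration. */
    void saxpy(int n, float a, float *x, float *y) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    /* Unrolled by 4: the body is replicated, so each iteration now carries
     * four independent multiply-adds that the scheduler can overlap.
     * Assumes n % 4 == 0 (illustrative simplification); otherwise a short
     * remainder loop handles the final n % 4 elements. */
    void saxpy_unrolled(int n, float a, float *x, float *y) {
        for (int i = 0; i < n; i += 4) {
            y[i]     = a * x[i]     + y[i];
            y[i + 1] = a * x[i + 1] + y[i + 1];
            y[i + 2] = a * x[i + 2] + y[i + 2];
            y[i + 3] = a * x[i + 3] + y[i + 3];
        }
    }

Compaction of such an unrolled body across the replicated copies is what exposes cross-iteration parallelism to the instruction scheduler.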



Copyright information

© 2011 Springer Science+Business Media, LLC


Cite this entry

Kejariwal, A., Nicolau, A. (2011). Modulo Scheduling and Loop Pipelining. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_65
