
Modulo Scheduling and Loop Pipelining

  • Reference work entry
Encyclopedia of Parallel Computing

Loop parallelization is the most critical aspect of any parallelizing compiler, since most of the time spent in executing a given program is spent in executing loops. In particular, the innermost loops of a program (i.e., the loops most deeply nested in the structure of the program) are typically the most heavily executed. It is therefore critical for any parallelizing (Instruction Level Parallelism (ILP)) compiler to try to expose and exploit as much parallelism as possible from these loops.

Historically, loop unrolling, a standard non-local transformation, has been used to expose parallelism beyond iteration boundaries. When a loop is unrolled, the loop body is replicated to create a new loop. Compacting the unrolled loop body helps exploit parallelism across iterations, since the operations in the unrolled body come from previously separate iterations. However, in general, loops cannot be fully unrolled, both because of space considerations and because loop bounds are...
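As an illustration of the unrolling transformation described above (a sketch, not taken from the entry itself), consider a simple saxpy-style loop unrolled by a factor of four. The function names and the assumption that the trip count is a multiple of the unroll factor are for illustration only; a real compiler would also emit a remainder (epilogue) loop for leftover iterations.

    /* Original loop: one element per iteration. */
    void saxpy(int n, float a, float *x, float *y) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    /* Unrolled by 4: the body is replicated, so each iteration now carries
     * four independent multiply-adds that the scheduler can overlap.
     * Assumes n % 4 == 0 (illustrative simplification); otherwise a short
     * remainder loop handles the final n % 4 elements. */
    void saxpy_unrolled(int n, float a, float *x, float *y) {
        for (int i = 0; i < n; i += 4) {
            y[i]     = a * x[i]     + y[i];
            y[i + 1] = a * x[i + 1] + y[i + 1];
            y[i + 2] = a * x[i + 2] + y[i + 2];
            y[i + 3] = a * x[i + 3] + y[i + 3];
        }
    }

Compaction of such an unrolled body across the replicated copies is what exposes cross-iteration parallelism to the instruction scheduler.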



Copyright information

© 2011 Springer Science+Business Media, LLC


Cite this entry

Kejariwal, A., Nicolau, A. (2011). Modulo Scheduling and Loop Pipelining. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_65
