Software Pipelining in Nested Loops with Prolog-Epilog Merging

  • Conference paper
High Performance Embedded Architectures and Compilers (HiPEAC 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5409)

Abstract

Software pipelining (or modulo scheduling) is a powerful back-end optimization to exploit instruction and vector parallelism. Software pipelining is particularly popular for embedded devices as it improves the computation throughput without increasing the size of the inner loop kernel (unlike loop unrolling), a desirable property to minimize the amount of code in local memories or caches. Unfortunately, common media and signal processing codes exhibit series of low-trip-count inner loops. In this situation, software pipelining is often not an option: it incurs severe fill/drain time overheads and code size expansion due to nested prologs and epilogs.
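As a rough illustration of the fill/drain structure mentioned above, the following sketch models a two-stage software pipeline over a single loop. It is not taken from the paper: the function names and the two toy stages (scale, then offset) are assumptions chosen only to make the prolog, kernel, and epilog visible.

```python
def plain_loop(a):
    """Reference loop: each iteration runs stage 1, then stage 2."""
    out = []
    for x in a:
        t = x * 2          # stage 1 (hypothetical): scale
        out.append(t + 1)  # stage 2 (hypothetical): offset and store
    return out


def pipelined_loop(a):
    """Two-stage software pipeline of plain_loop.

    The kernel overlaps stage 2 of iteration j-1 with stage 1 of
    iteration j. The prolog fills the pipeline and the epilog drains it.
    """
    out = []
    if not a:
        return out
    t = a[0] * 2                 # prolog: stage 1 of iteration 0
    for j in range(1, len(a)):   # kernel: len(a) - 1 iterations
        out.append(t + 1)        # stage 2 of iteration j-1
        t = a[j] * 2             # stage 1 of iteration j
    out.append(t + 1)            # epilog: stage 2 of the last iteration
    return out
```

With a trip count of 3, the kernel runs only twice while the prolog and epilog each run once, which hints at why fill and drain overheads weigh heavily on low-trip-count loops.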

We propose a method to pipeline series of inner loops without increasing the size of the loop nest, apart from an outermost prolog and epilog. Our method achieves significant code size savings and allows pipelining of low-trip-count loops. These benefits come at the cost of additional scheduling constraints, leading to a linear optimization problem to trade memory usage for pipelining opportunities.
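The general idea of keeping only an outermost prolog and epilog can be sketched on a toy loop nest. This is a simplified model under stated assumptions, not the authors' algorithm: it uses one inner loop repeated by an outer loop rather than a series of inner loops, it ignores the scheduling constraints and the linear optimization problem the abstract refers to, and all names are hypothetical.

```python
def nested_reference(rows):
    """Reference loop nest: two toy stages applied to every element."""
    return [[x * 2 + 1 for x in row] for row in rows]


def pipelined_per_row(rows):
    """Pipeline each inner loop separately.

    Returns the result and the number of prolog executions (one per
    non-empty outer iteration).
    """
    result, fills = [], 0
    for row in rows:
        out = []
        if row:
            t = row[0] * 2           # nested prolog
            fills += 1
            for j in range(1, len(row)):
                out.append(t + 1)    # stage 2 of iteration j-1
                t = row[j] * 2       # stage 1 of iteration j
            out.append(t + 1)        # nested epilog
        result.append(out)
    return result, fills


def pipelined_merged(rows):
    """Overlap the drain of row i with the fill of row i+1.

    Only one outermost prolog and one outermost epilog remain.
    """
    result = [[] for _ in rows]
    positions = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    if not positions:
        return result, 0
    i0, j0 = positions[0]
    t, prev_i = rows[i0][j0] * 2, i0   # outermost prolog
    for i, j in positions[1:]:
        result[prev_i].append(t + 1)   # stage 2 of the previous element
        t, prev_i = rows[i][j] * 2, i  # stage 1 of the current element
    result[prev_i].append(t + 1)       # outermost epilog
    return result, 1
```

For three rows of two elements each, `pipelined_per_row` fills the pipeline three times, whereas `pipelined_merged` fills it once and produces the same values.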



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fellahi, M., Cohen, A. (2009). Software Pipelining in Nested Loops with Prolog-Epilog Merging. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_8

  • DOI: https://doi.org/10.1007/978-3-540-92990-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92989-5

  • Online ISBN: 978-3-540-92990-1

  • eBook Packages: Computer Science (R0)
