Abstract
Software pipelining (or modulo scheduling) is a powerful back-end optimization to exploit instruction and vector parallelism. Software pipelining is particularly popular for embedded devices as it improves the computation throughput without increasing the size of the inner loop kernel (unlike loop unrolling), a desirable property to minimize the amount of code in local memories or caches. Unfortunately, common media and signal processing codes exhibit series of low-trip-count inner loops. In this situation, software pipelining is often not an option: it incurs severe fill/drain time overheads and code size expansion due to nested prologs and epilogs.
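To give a rough sense of why low trip counts hurt, the fill/drain overhead can be estimated with the usual textbook model of a pipelined loop. The symbols below are assumptions introduced for illustration and do not come from the paper: N is the inner-loop trip count, S the number of pipeline stages, and II the initiation interval in cycles.

```latex
% Cycles for one software-pipelined inner loop, including fill and drain:
T_{\mathrm{pipe}}(N) = (N + S - 1)\,\mathit{II}

% Fraction of those cycles that the kernel runs at its steady-state rate:
\eta(N) = \frac{N\,\mathit{II}}{(N + S - 1)\,\mathit{II}} = \frac{N}{N + S - 1}

% Example: N = 4, S = 3 gives \eta = 4/6 \approx 0.67,
% whereas N = 1000, S = 3 gives \eta = 1000/1002 \approx 0.998.
```

Under this model the overhead is paid again on every execution of the inner loop, so a nest that runs many short inner loops never approaches the steady-state rate.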
We propose a method to pipeline series of inner loops without increasing the size of the loop nest, apart from an outermost prolog and epilog. Our method achieves significant code size savings and allows pipelining of low-trip-count loops. These benefits come at the cost of additional scheduling constraints, leading to a linear optimization problem to trade memory usage for pipelining opportunities.
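The idea of overlapping the drain phase of one inner loop with the fill phase of the next can be pictured with a small hand-written sketch. This is not the authors' algorithm or code generator: the two-stage loop body, the function names (`stage_a`, `stage_b`, `pipelined_per_row`, `pipelined_merged`) and the rectangular, dependence-free loop nest are all assumptions made for illustration.

```python
def stage_a(x):
    """Hypothetical first pipeline stage (stands in for, e.g., a load)."""
    return 2 * x


def stage_b(t):
    """Hypothetical second pipeline stage; consumes stage_a's result."""
    return t + 1


def reference(a):
    """Unpipelined loop nest: both stages run back to back per element."""
    return [[stage_b(stage_a(x)) for x in row] for row in a]


def pipelined_per_row(a):
    """Conventional pipelining: every inner loop has its own prolog/epilog.

    Returns the result and the number of prolog/epilog executions.
    """
    out = [[None] * len(row) for row in a]
    overhead = 0
    for i, row in enumerate(a):
        n = len(row)
        if n == 0:
            continue
        t = stage_a(row[0])              # prolog: fill the pipeline
        overhead += 1
        for j in range(1, n):            # kernel: A of j overlaps B of j-1
            t_next = stage_a(row[j])
            out[i][j - 1] = stage_b(t)
            t = t_next
        out[i][n - 1] = stage_b(t)       # epilog: drain the pipeline
        overhead += 1
    return out, overhead


def pipelined_merged(a):
    """Merged variant: the drain of row i overlaps the fill of row i+1.

    Assumes a rectangular nest whose rows are independent, so the kernel
    may run across row boundaries. Only one outermost prolog and one
    outermost epilog remain.
    """
    out = [[None] * len(row) for row in a]
    m = len(a)
    n = len(a[0]) if m else 0
    if m == 0 or n == 0:
        return out, 0
    t = stage_a(a[0][0])                 # outermost prolog
    pi, pj = 0, 0                        # element whose stage B is pending
    for i in range(m):
        for j in range(1 if i == 0 else 0, n):
            t_next = stage_a(a[i][j])    # stage A of the current element
            out[pi][pj] = stage_b(t)     # stage B of the previous element
            t, pi, pj = t_next, i, j
    out[pi][pj] = stage_b(t)             # outermost epilog
    return out, 2


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    print(pipelined_per_row(data)[1], "vs", pipelined_merged(data)[1])
```

For a nest of four rows, the conventional version executes eight prolog/epilog instances while the merged version executes two, and both produce the same result as `reference`. The merged kernel is only legal here because no row depends on a previous row's results, which is one simple instance of the kind of additional scheduling constraint the abstract mentions.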
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fellahi, M., Cohen, A. (2009). Software Pipelining in Nested Loops with Prolog-Epilog Merging. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_8
DOI: https://doi.org/10.1007/978-3-540-92990-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92989-5
Online ISBN: 978-3-540-92990-1
eBook Packages: Computer Science (R0)