Software Pipelining in Nested Loops with Prolog-Epilog Merging

Fellahi, Mohammed; Cohen, Albert

doi:10.1007/978-3-540-92990-1_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5409)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

Abstract

Software pipelining (or modulo scheduling) is a powerful back-end optimization to exploit instruction and vector parallelism. Software pipelining is particularly popular for embedded devices as it improves the computation throughput without increasing the size of the inner loop kernel (unlike loop unrolling), a desirable property to minimize the amount of code in local memories or caches. Unfortunately, common media and signal processing codes exhibit series of low-trip-count inner loops. In this situation, software pipelining is often not an option: it incurs severe fill/drain time overheads and code size expansion due to nested prologs and epilogs.

We propose a method to pipeline series of inner loops without increasing the size of the loop nest, apart from an outermost prolog and epilog. Our method achieves significant code size savings and allows pipelining of low-trip-count loops. These benefits come at the cost of additional scheduling constraints, leading to a linear optimization problem to trade memory usage for pipelining opportunities.
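The fill/drain overhead mentioned above can be made concrete with a small sketch. It is not the authors' method: it only shows conventional two-stage software pipelining of an inner loop, where the prolog and epilog are repeated on every outer iteration. The function names (`reference`, `pipelined`, `fill_drain_fraction`) and the toy computation are assumptions chosen for illustration, and the overhead formula assumes an initiation interval of 1.

```python
def reference(rows):
    """Plain loop nest: each element passes through stage A, then stage B."""
    out = []
    for row in rows:
        acc = 0
        for x in row:
            t = x * 2        # stage A (stands in for a load/multiply)
            acc += t + 1     # stage B (stands in for an accumulate)
        out.append(acc)
    return out


def pipelined(rows):
    """Same computation with the inner loop software-pipelined in two stages.

    The prolog and epilog sit inside the outer loop, so they are executed
    (and, in generated code, emitted) once per inner-loop instance.
    """
    out = []
    for row in rows:
        n = len(row)
        acc = 0
        if n == 0:
            out.append(acc)
            continue
        t = row[0] * 2             # prolog: stage A of iteration 0
        for j in range(1, n):      # kernel: stage A of j overlaps stage B of j-1
            t_next = row[j] * 2
            acc += t + 1
            t = t_next
        acc += t + 1               # epilog: stage B of the last iteration
        out.append(acc)
    return out


def fill_drain_fraction(trip_count, stages):
    """Share of pipeline slots spent filling or draining (initiation interval 1).

    A loop with `trip_count` iterations and `stages` stages occupies
    trip_count + stages - 1 slots, of which stages - 1 are fill/drain.
    """
    return (stages - 1) / (trip_count + stages - 1)
```

Under this simple model, a 3-stage pipeline over a 4-iteration inner loop spends 2 of its 6 slots filling or draining, whereas over 1000 iterations the same 2 slots are negligible. This is the sense in which low-trip-count inner loops make conventional pipelining unattractive.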

Author information

Authors and Affiliations

Alchemy Group, INRIA Saclay, France, and HiPEAC Network, France
Mohammed Fellahi & Albert Cohen

Editor information

Editors and Affiliations

IRISA, Campus de Beaulieu, 35042, Rennes Cedex, France
André Seznec
Intel Corporation, Massachusetts Microprocessor Design Center, 77 Reed Road, MA 01749, Hudson, USA
Joel Emer
School of Informatics, Institute for Computing Systems Architecture, King’s Buildings, EH9 3JZ, Edinburgh, United Kingdom
Michael O’Boyle
Department of Electrical Engineering, Princeton University, 34 Olden Street, NJ 08544-5263, Princeton, USA
Margaret Martonosi
Department of Computer Science, University of Augsburg, 86135, Augsburg, Germany
Theo Ungerer

Cite this paper

Fellahi, M., Cohen, A. (2009). Software Pipelining in Nested Loops with Prolog-Epilog Merging. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_8

DOI: https://doi.org/10.1007/978-3-540-92990-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92989-5
Online ISBN: 978-3-540-92990-1
eBook Packages: Computer Science, Computer Science (R0)