Removing Impediments to Loop Fusion Through Code Transformations

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2002)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 2481)

Abstract

Loop fusion is a common optimization technique that takes several loops and combines them into a single large loop. Most of the existing work on loop fusion concentrates on the heuristics required to optimize an objective function, such as data reuse or creation of instruction level parallelism opportunities. Often, however, the code provided to a compiler has only small sets of loops that are control flow equivalent, normalized, have the same iteration count, are adjacent, and have no fusion-preventing dependences. This paper focuses on code transformations that create more opportunities for loop fusion in the IBM® XL compiler suite that generates code for the IBM family of PowerPC® processors. In this compiler an objective function is used at the loop distributor to decide which portions of a loop should remain in the same loop nest and which portions should be redistributed. Our algorithm focuses on eliminating conditions that prevent loop fusion. By generating maximal fusion our algorithm increases the scope of later transformations. We tested our improved code generator on an IBM pSeries™ 690 machine equipped with a POWER4™ processor using the SPEC CPU2000 benchmark suite. Our improvements to loop fusion resulted in three times as many loops fused in a subset of CFP2000 benchmarks, and four times as many for a subset of CINT2000 benchmarks.
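
As a rough illustration of the basic transformation the abstract describes, the following C sketch shows two loops that already satisfy the listed conditions (adjacent, normalized, same iteration count, control flow equivalent, no fusion-preventing dependence) and one possible fused form. The function names `unfused` and `fused` are hypothetical and are not taken from the paper or from the XL compiler; the sketch also assumes the three arrays do not overlap in memory.

```c
#include <stddef.h>

/* Original form: two adjacent loops over the same range 0..n-1.
 * The second loop reads b[i], which the first loop wrote at the
 * same index i. (Hypothetical example, not from the paper.) */
void unfused(const double *a, double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] * 2.0;
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + 1.0;
}

/* Fused form: one loop with both statements in their original order.
 * This is legal here because c[i] only needs b[i], which is produced
 * earlier in the same iteration. If the second statement read b[i + 1]
 * instead, the fused loop would read a value that has not been written
 * yet, which is an example of a fusion-preventing dependence.
 * Assumes a, b and c do not alias. */
void fused(const double *a, double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        b[i] = a[i] * 2.0;
        c[i] = b[i] + 1.0;
    }
}
```

Both functions leave `b` and `c` with the same contents for any input; the fused version traverses the index range once, so each `b[i]` can be reused while it is still in a register or in cache.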

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blainey, B., Barton, C., Amaral, J.N. (2005). Removing Impediments to Loop Fusion Through Code Transformations. In: Pugh, B., Tseng, C.-W. (eds) Languages and Compilers for Parallel Computing. LCPC 2002. Lecture Notes in Computer Science, vol 2481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596110_21

  • DOI: https://doi.org/10.1007/11596110_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30781-5

  • Online ISBN: 978-3-540-31612-1

  • eBook Packages: Computer Science, Computer Science (R0)
