Removing Impediments to Loop Fusion Through Code Transformations

Blainey, Bob; Barton, Christopher; Amaral, José Nelson

doi:10.1007/11596110_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 2481)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Loop fusion is a common optimization technique that takes several loops and combines them into a single large loop. Most of the existing work on loop fusion concentrates on the heuristics required to optimize an objective function, such as data reuse or creation of instruction-level parallelism opportunities. Often, however, the code provided to a compiler has only small sets of loops that are control-flow equivalent, normalized, have the same iteration count, are adjacent, and have no fusion-preventing dependences. This paper focuses on code transformations that create more opportunities for loop fusion in the IBM® XL compiler suite, which generates code for the IBM family of PowerPC® processors. In this compiler, an objective function is used at the loop distributor to decide which portions of a loop should remain in the same loop nest and which portions should be redistributed. Our algorithm focuses on eliminating conditions that prevent loop fusion. By generating maximal fusion, our algorithm increases the scope of later transformations. We tested our improved code generator on an IBM pSeries™ 690 machine equipped with a POWER4™ processor using the SPEC CPU2000 benchmark suite. Our improvements to loop fusion resulted in three times as many loops fused in a subset of CFP2000 benchmarks, and four times as many for a subset of CINT2000 benchmarks.
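
To make the fusion conditions listed in the abstract concrete, the following is a minimal illustrative sketch in C. It is not taken from the paper or from the XL compiler: the function names are hypothetical, the arrays are assumed not to alias, and the use of peeling to equalize iteration counts is a commonly described technique chosen here for illustration, since the abstract does not say which transformations the authors apply. The first pair of functions shows two adjacent loops with equal iteration counts and their fused form. The second pair shows loops whose iteration counts differ by one, where peeling the extra iteration makes the counts match so the loops can be fused.

```c
#include <assert.h>
#include <stddef.h>

/* Two adjacent loops with the same iteration count (fusion candidates).
 * Assumption: a, b and c point to distinct, non-overlapping arrays. */
void scale_then_add_unfused(size_t n, const double *a, double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + a[i];
}

/* Fused form: b[i] is produced and consumed in the same iteration,
 * so the dependence between the two statements is preserved. */
void scale_then_add_fused(size_t n, const double *a, double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        b[i] = 2.0 * a[i];
        c[i] = b[i] + a[i];
    }
}

/* Impediment: the second loop runs one more iteration than the first.
 * Here a and c hold n + 1 elements and b holds n elements. */
void mismatched_unfused(size_t n, const double *a, double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (size_t i = 0; i < n + 1; i++)
        c[i] = a[i] + 1.0;
}

/* Peel the last iteration of the longer loop so both loops run n times,
 * fuse them, and execute the peeled iteration after the fused loop. */
void mismatched_peeled_fused(size_t n, const double *a, double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        b[i] = 2.0 * a[i];
        c[i] = a[i] + 1.0;
    }
    c[n] = a[n] + 1.0; /* peeled iteration */
}
```

In both cases the transformed function computes the same values as the original; the fused versions traverse the arrays once instead of twice, which is the kind of data reuse the abstract names as a typical objective.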

Author information

Authors and Affiliations

IBM Toronto Software Laboratory, Toronto, Canada
Bob Blainey
Department of Computing Science, University of Alberta, Edmonton, Canada
Christopher Barton & José Nelson Amaral

Editor information

Editors and Affiliations

Department of Computer Science, University of Maryland, 4135 A.V. Williams Bldg., College Park, 20742, MD, USA
Bill Pugh
Dept. of Computer Science, Univ. of Maryland at College Park
Chau-Wen Tseng

About this paper

Cite this paper

Blainey, B., Barton, C., Amaral, J.N. (2005). Removing Impediments to Loop Fusion Through Code Transformations. In: Pugh, B., Tseng, CW. (eds) Languages and Compilers for Parallel Computing. LCPC 2002. Lecture Notes in Computer Science, vol 2481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596110_21

DOI: https://doi.org/10.1007/11596110_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30781-5
Online ISBN: 978-3-540-31612-1
eBook Packages: Computer Science, Computer Science (R0)
