Cross-loop reuse analysis and its application to cache optimizations

Cooper, Keith; Kennedy, Ken; McIntosh, Nathaniel

doi:10.1007/BFb0017242

Automatic Data Distribution and Locality Enhancement
Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Abstract

In this paper we describe the design of a data-flow framework for detecting cross-loop reuse. Cross-loop reuse takes place when a set of data items or cache lines is accessed in a given loop nest and then accessed again within some subsequent portion of the program, usually another outer loop nest. In contrast to intra-loop reuse, which occurs during the execution of a single loop nest, cross-loop reuse is hard to analyze using traditional dependence-based techniques. The framework we have constructed is based on a combination of array section analysis (to capture array access patterns at a high level) and data-flow analysis (to deal with intra-procedural control flow). The framework is designed to account for cache size when gathering reuse information, and when used in an interprocedural setting, the framework also provides a mechanism for summarizing the effects of procedure calls.
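To make the idea concrete, here is a minimal sketch of how cross-loop reuse between two loop nests might be detected from array sections. It is not the authors' framework: the `Section` class, the `cross_loop_reuse` function, and the restriction to contiguous one-dimensional sections are assumptions made for illustration. So is the cache test, which simply asks whether everything the first loop touches fits in a cache of `cache_elems` elements.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Section:
    """Contiguous 1-D array section [lo, hi], inclusive (hypothetical simplification)."""
    array: str
    lo: int
    hi: int

    def size(self) -> int:
        return self.hi - self.lo + 1


def intersect(a: Section, b: Section) -> Optional[Section]:
    """Return the overlap of two sections of the same array, or None."""
    if a.array != b.array:
        return None
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    return Section(a.array, lo, hi) if lo <= hi else None


def cross_loop_reuse(first: List[Section], second: List[Section],
                     cache_elems: int) -> Tuple[List[Section], bool]:
    """Find sections touched by both loops and test a crude cache-size bound.

    Assumption: the sections of `first` are disjoint, so their sizes add up
    to the first loop's footprint. If that footprint fits in the cache, the
    reused data may still be resident when the second loop starts.
    """
    reused = []
    for a in first:
        for b in second:
            overlap = intersect(a, b)
            if overlap is not None:
                reused.append(overlap)
    footprint = sum(s.size() for s in first)
    return reused, footprint <= cache_elems


# Loop 1 touches A[0..999] and B[0..999]; loop 2 touches A[500..1499].
loop1 = [Section("A", 0, 999), Section("B", 0, 999)]
loop2 = [Section("A", 500, 1499)]

reused_big, fits_big = cross_loop_reuse(loop1, loop2, cache_elems=4096)
reused_small, fits_small = cross_loop_reuse(loop1, loop2, cache_elems=1024)
```

With these inputs the reused section is A[500..999] in both cases, but only the larger cache can hold the first loop's 2000-element footprint, so only there is the reuse likely to be served from cache.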

Cross-loop reuse information can be used to drive a number of transformations that enhance locality and improve cache utilization, including loop fusion and loop reversal. Although these transformations are not new, their impact on cache behavior has not always been possible to predict, making them difficult to apply. As part of this paper we report the results of a comprehensive experimental study in which we apply our techniques to a set of ten programs from the SPEC95 floating-point benchmark suite. We were able to obtain modest performance gains overall for several of the programs, based mostly on improvements in cache utilization.
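The effect of loop reversal and loop fusion on cross-loop reuse can be illustrated with a toy cache model. The sketch below is not taken from the paper and does not reproduce its experiments. It assumes a fully associative LRU cache holding `CAPACITY` lines, one array element per line, and two loops that each read the same `N`-element array; `count_misses` and the trace names are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def count_misses(trace: Iterable[int], capacity: int) -> int:
    """Count misses for an access trace on a fully associative LRU cache."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses


N = 8         # array elements, one per cache line (assumption)
CAPACITY = 4  # cache lines; smaller than the array

# Original: both loops traverse the array forward.
original = list(range(N)) + list(range(N))

# Loop reversal: the second loop runs backward, so it starts with the
# elements the first loop touched last, which are still in the cache.
reversed_second = list(range(N)) + list(reversed(range(N)))

# Loop fusion: both loop bodies run in one iteration, so each element
# is reused immediately after its first access.
fused = [i for i in range(N) for _ in range(2)]

misses_original = count_misses(original, CAPACITY)         # 16
misses_reversed = count_misses(reversed_second, CAPACITY)  # 12
misses_fused = count_misses(fused, CAPACITY)               # 8
```

In this model the forward-forward version misses on every access, reversal recovers the `CAPACITY` lines left behind by the first loop, and fusion turns every second access into a hit. Real caches have set conflicts and multi-element lines, so actual gains depend on the access pattern and the hardware.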

This work was supported in part by ARPA (Army Contract DABT63-95-C-0115).

Author information

Authors and Affiliations

Department of Computer Science, Rice University, Houston, Texas, USA
Keith Cooper, Ken Kennedy & Nathaniel McIntosh

Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

About this paper

Cite this paper

Cooper, K., Kennedy, K., McIntosh, N. (1997). Cross-loop reuse analysis and its application to cache optimizations. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017242

DOI: https://doi.org/10.1007/BFb0017242
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63091-3
Online ISBN: 978-3-540-69128-0
eBook Packages: Springer Book Archive
