On estimating and enhancing cache effectiveness

Ferrante, J.; Sarkar, V.; Thrash, W.

doi:10.1007/BFb0038674

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 589)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

In this paper, we consider automatic analysis of a program's cache usage to achieve greater cache effectiveness. We show how to estimate efficiently the number of distinct cache lines used by a given loop in a nest of loops. Given this estimate of the number of cache lines needed, we can estimate the number of cache misses for a nest of loops. Our estimates can be used to guide program transformations such as loop interchange to achieve greater cache effectiveness. We present simulation results that show our estimates are reasonable for simple cases such as matrix multiply. We analyze the array sizes for which our estimates differ from our simulation results, and provide recommendations on how to handle such arrays in practice.
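To make the idea of counting distinct cache lines per loop concrete, here is a minimal sketch that is not taken from the paper. It counts, by brute force, the cache lines touched by one execution of the innermost loop of matrix multiply, C[i][j] += A[i][k] * B[k][j], for a given loop order. The row-major layout, 8-byte elements, 64-byte lines, contiguous placement of the three arrays, and all function names are assumptions made for illustration. The closed-form helper `estimated_lines` is likewise a simple stand-in and should not be read as the authors' estimate.

```python
import math

ELEM_SIZE = 8    # bytes per array element (assumed: double precision)
LINE_SIZE = 64   # bytes per cache line (assumed)


def line_of(row, col, n, base):
    """Cache line index of element (row, col) in an n x n row-major array."""
    return (base + (row * n + col) * ELEM_SIZE) // LINE_SIZE


def distinct_lines_inner(order, n):
    """Brute-force count of distinct cache lines touched by one execution of
    the innermost loop of C[i][j] += A[i][k] * B[k][j].

    `order` names the loops from outermost to innermost, e.g. "ikj".
    The two outer indices are held at 0.
    """
    array_bytes = n * n * ELEM_SIZE
    # Hypothetical memory layout: A, B, C stored back to back from address 0.
    bases = {"A": 0, "B": array_bytes, "C": 2 * array_bytes}
    inner = order[-1]
    lines = set()
    for v in range(n):
        idx = {"i": 0, "j": 0, "k": 0}
        idx[inner] = v
        i, j, k = idx["i"], idx["j"], idx["k"]
        lines.add(line_of(i, k, n, bases["A"]))
        lines.add(line_of(k, j, n, bases["B"]))
        lines.add(line_of(i, j, n, bases["C"]))
    return len(lines)


def estimated_lines(stride_elems, n):
    """Illustrative closed form for one reference with a constant stride
    (in elements) over n iterations, assuming a line-aligned first access."""
    if stride_elems == 0:
        return 1                      # same element every iteration
    stride_bytes = stride_elems * ELEM_SIZE
    if stride_bytes >= LINE_SIZE:
        return n                      # every access lands on a new line
    span = (n - 1) * stride_bytes + ELEM_SIZE
    return math.ceil(span / LINE_SIZE)


if __name__ == "__main__":
    for order in ("ijk", "ikj", "jki"):
        print(order, distinct_lines_inner(order, 64))
```

Under these assumptions and with n = 64, the innermost loop touches 73 lines in `ijk` order, 17 in `ikj` order, and 129 in `jki` order. A gap of this kind between loop orders is what a compiler could use to choose a loop interchange.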

This work was partly supported by IBM and an NSF Graduate Fellowship.

Author information

Authors and Affiliations

J. Ferrante: IBM Research Division, T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598
V. Sarkar: IBM Palo Alto Scientific Center, 1530 Page Mill Road, Palo Alto, CA 94304
W. Thrash: Department of Computer Science and Engineering, FR-35, University of Washington, Seattle, WA 98195

Editor information

Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

About this paper

Cite this paper

Ferrante, J., Sarkar, V., Thrash, W. (1992). On estimating and enhancing cache effectiveness. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1991. Lecture Notes in Computer Science, vol 589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0038674

DOI: https://doi.org/10.1007/BFb0038674
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55422-6
Online ISBN: 978-3-540-47063-2
eBook Packages: Springer Book Archive
