Abstract
In this paper, we consider automatic analysis of a program's cache usage to achieve greater cache effectiveness. We show how to efficiently estimate the number of distinct cache lines used by a given loop in a nest of loops. From this estimate of the number of cache lines needed, we can estimate the number of cache misses for the loop nest. Our estimates can be used to guide program transformations such as loop interchange to achieve greater cache effectiveness. We present simulation results showing that our estimates are reasonable for simple cases such as matrix multiply. We analyze the array sizes for which our estimates differ from our simulation results, and provide recommendations on how to handle such arrays in practice.
This work was partly supported by IBM and an NSF Graduate Fellowship.
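The quantity the abstract refers to, the number of distinct cache lines used by a loop, can be illustrated with a small brute-force count. The sketch below is not the authors' estimation method, which the abstract does not describe; it simply enumerates addresses to show why loop order matters. The function names, the row-major layout, the 8-byte element size, the 64-byte line size, and the line-aligned base address are all assumptions made for illustration.

```python
def distinct_cache_lines(addresses, line_size=64):
    """Count the distinct cache lines covered by an iterable of byte addresses."""
    return len({addr // line_size for addr in addresses})


def inner_loop_addresses(n, fixed, vary, elem_size=8, base=0):
    """Byte addresses touched by one full inner loop over a row-major n x n array.

    vary="col" models A[fixed][j] for j in range(n).
    vary="row" models A[i][fixed] for i in range(n).
    The layout, element size, and base address are illustrative assumptions.
    """
    if vary == "col":
        return [base + (fixed * n + j) * elem_size for j in range(n)]
    if vary == "row":
        return [base + (i * n + fixed) * elem_size for i in range(n)]
    raise ValueError("vary must be 'col' or 'row'")


n = 64
# Inner loop walks along a row: consecutive elements share cache lines.
lines_col_inner = distinct_cache_lines(inner_loop_addresses(n, 0, "col"))
# Inner loop walks down a column: each access lands on a different line.
lines_row_inner = distinct_cache_lines(inner_loop_addresses(n, 0, "row"))
print(lines_col_inner, lines_row_inner)
```

Under these assumed parameters, the inner loop touches 8 lines when it varies the column index and 64 lines when it varies the row index. A gap of this kind is what would make loop interchange attractive; an analytic estimate aims to predict such counts without enumerating every address.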
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ferrante, J., Sarkar, V., Thrash, W. (1992). On estimating and enhancing cache effectiveness. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1991. Lecture Notes in Computer Science, vol 589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0038674
DOI: https://doi.org/10.1007/BFb0038674
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55422-6
Online ISBN: 978-3-540-47063-2
eBook Packages: Springer Book Archive