On estimating and enhancing cache effectiveness

  • VIII. Cache Memory Issues
  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1991)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 589)

Abstract

In this paper, we consider automatic analysis of a program's cache usage to achieve greater cache effectiveness. We show how to estimate efficiently the number of distinct cache lines used by a given loop in a nest of loops. Given this estimate of the number of cache lines needed, we can estimate the number of cache misses for a nest of loops. Our estimates can be used to guide program transformations such as loop interchange to achieve greater cache effectiveness. We present simulation results that show our estimates are reasonable for simple cases such as matrix multiply. We analyze the array sizes for which our estimates differ from our simulation results, and provide recommendations on how to handle such arrays in practice.
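
As a rough illustration of the quantity the abstract refers to, the sketch below counts, by brute-force enumeration, the distinct cache lines touched by one execution of the innermost loop of a matrix multiply, for each choice of innermost loop. It is not the authors' estimator, which the abstract describes as efficient rather than enumerative. The function name, the row-major layout, line-aligned arrays, and the line size of 8 elements are all assumptions made for this example.

```python
def lines_touched_by_inner_loop(n, line_elems, inner):
    """Count distinct cache lines touched by one run of the innermost loop
    of C[i][j] += A[i][k] * B[k][j] on n x n arrays.

    Assumptions (not from the paper): row-major storage, each array
    starts on a cache-line boundary, outer loop indices are fixed at 0,
    and a cache line holds `line_elems` array elements.
    """
    lines = set()
    for t in range(n):
        idx = {"i": 0, "j": 0, "k": 0}
        idx[inner] = t  # only the innermost index varies
        i, j, k = idx["i"], idx["j"], idx["k"]
        for name, row, col in (("A", i, k), ("B", k, j), ("C", i, j)):
            offset = row * n + col  # element offset in row-major order
            lines.add((name, offset // line_elems))
    return len(lines)


n, line_elems = 64, 8
counts = {v: lines_touched_by_inner_loop(n, line_elems, v) for v in "ijk"}
# The loop with the fewest distinct lines is the cheapest innermost loop
# under this simplified model.
best_inner = min(counts, key=counts.get)
print(counts, best_inner)
```

Under these assumptions, making `j` innermost touches the fewest lines, because both `B[k][j]` and `C[i][j]` are then traversed along a row. A count like this is the kind of per-loop figure that could guide a loop interchange decision.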

This work was partly supported by IBM and an NSF Graduate Fellowship.

Editor information

Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferrante, J., Sarkar, V., Thrash, W. (1992). On estimating and enhancing cache effectiveness. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1991. Lecture Notes in Computer Science, vol 589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0038674

  • DOI: https://doi.org/10.1007/BFb0038674

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-55422-6

  • Online ISBN: 978-3-540-47063-2

  • eBook Packages: Springer Book Archive
