
The influence of caches on the performance of heaps

Published: 01 January 1996

Abstract

As memory access times grow larger relative to processor cycle times, the cache performance of algorithms has an increasingly large impact on overall performance. Unfortunately, most commonly used algorithms were not designed with cache performance in mind. This paper investigates the cache performance of implicit heaps. We present optimizations that significantly reduce the cache misses that heaps incur and improve their overall performance. We present an analytical model, called collective analysis, that allows cache performance to be predicted as a function of both cache configuration and algorithm configuration. As part of our investigation, we use this model to perform an approximate analysis of the cache performance of both traditional heaps and our improved heaps. In addition, we give empirical data for five architectures to show the impact our optimizations have on overall performance. We also revisit a priority queue study originally performed by Jones [25]. Because cache miss penalties have increased, the relative performance results we obtain on today's machines differ greatly from those obtained on the machines of only ten years ago. We compare the performance of implicit heaps, skew heaps, and splay trees and discuss the differences between our results and Jones's.
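The abstract does not spell out which heap optimizations the paper uses. As a hedged illustration of the kind of layout change that can reduce cache misses in an implicit (array-based, pointer-free) heap, the following Python sketch implements a d-ary min-heap whose array begins with d - 1 unused padding slots, so that the d children of every node start at an index that is a multiple of d. The class name AlignedDHeap, the heap_sort helper, the default d = 4, and the padding scheme are illustrative assumptions, not the authors' code. Python gives no control over memory alignment, so the sketch only demonstrates the index arithmetic: in a language such as C, with a block-aligned array and exactly d elements per cache block, each sibling group would occupy exactly one cache block.

```python
import random


class AlignedDHeap:
    """Implicit (array-based) min-heap with fanout d.

    Illustrative sketch: the list starts with d - 1 unused padding slots, so
    the root sits at index d - 1 and every group of d siblings starts at a
    multiple of d. Alignment to real cache blocks is assumed, not enforced.
    """

    def __init__(self, d=4):
        if d < 2:
            raise ValueError("fanout d must be at least 2")
        self.d = d
        self.pad = d - 1            # unused slots in front of the root
        self.a = [None] * self.pad  # physical storage, padding included
        self.size = 0

    def __len__(self):
        return self.size

    def _children_start(self, i):
        # Logical node k = i - pad has children d*k + 1 .. d*k + d, which map
        # to physical indices d*(k + 1) .. d*(k + 1) + d - 1.
        return self.d * (i - self.pad + 1)

    def _parent(self, i):
        # Inverse of the mapping above, for any non-root physical index i.
        return (i - self.pad - 1) // self.d + self.pad

    def push(self, x):
        self.a.append(x)
        self.size += 1
        i = len(self.a) - 1
        # Sift up: shift larger parents down until x's slot is found.
        while i > self.pad:
            p = self._parent(i)
            if self.a[p] <= x:
                break
            self.a[i] = self.a[p]
            i = p
        self.a[i] = x

    def pop_min(self):
        if self.size == 0:
            raise IndexError("pop from empty heap")
        top = self.a[self.pad]
        last = self.a.pop()
        self.size -= 1
        if self.size == 0:
            return top
        # Sift down: the d children of a node are contiguous, so finding the
        # smallest child scans one sibling group (one block, if aligned).
        i, n = self.pad, len(self.a)
        while True:
            c = self._children_start(i)
            if c >= n:
                break
            m = min(range(c, min(c + self.d, n)), key=self.a.__getitem__)
            if self.a[m] >= last:
                break
            self.a[i] = self.a[m]
            i = m
        self.a[i] = last
        return top


def heap_sort(values, d=4):
    """Sort by pushing everything into an AlignedDHeap and popping it back out."""
    h = AlignedDHeap(d)
    for v in values:
        h.push(v)
    return [h.pop_min() for _ in range(len(h))]


if __name__ == "__main__":
    print(heap_sort([5, 3, 9, 1, 7], d=4))  # [1, 3, 5, 7, 9]
    data = [random.randrange(1000) for _ in range(1000)]
    assert heap_sort(data, d=8) == sorted(data)
```

With d = 2 the padding is a single slot and the root sits at index 1, which is the familiar 1-based binary heap layout. Larger d trades more comparisons per level for fewer levels, and therefore, under the alignment assumption, fewer cache blocks touched per sift-down.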


References

  1. A. Agarwal, M. Horowitz, and J. Hennessy. An analytical cache model. ACM Transactions on Computer Systems, 7:2:184-215, 1989.
  2. R. Agarwal, F. Gustavson, and M. Zubair. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM Journal of Research and Development, 38:5:563-576, Sep 1994.
  3. A. Aggarwal, K. Chandra, and M. Snir. A model for hierarchical memory. In 19th Annual ACM Symposium on Theory of Computing, pages 305-314, 1987.
  4. A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Massachusetts, 1974.
  5. B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy model of computation. Algorithmica, 12:2-3:72-109, 1994.
  6. J. Anderson and M. Lam. Global optimizations for parallelism and locality on scalable parallel machines. In Proceedings of the 1993 ACM Symposium on Programming Language Design and Implementation, pages 112-125. ACM, 1993.
  7. S. Carlsson. An optimal algorithm for deleting the root of a heap. Information Processing Letters, 37:2:117-120, 1991.
  8. S. Carr, K. McKinley, and C. W. Tseng. Compiler optimizations for improving data locality. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 252-262, 1994.
  9. M. Cierniak and W. Li. Unifying data and control transformations for distributed shared-memory machines. In Proceedings of the 1995 ACM Symposium on Programming Language Design and Implementation, pages 205-217. ACM, 1995.
  10. D. Clark. Cache performance of the VAX-11/780. ACM Transactions on Computer Systems, 1:1:24-37, 1983.
  11. E. Coffman and P. Denning. Operating Systems Theory. Prentice-Hall, Englewood Cliffs, NJ, 1973.
  12. T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, MA, 1990.
  13. J. De Graffe and W. Kosters. Expected heights in heaps. BIT, 32:4:570-579, 1992.
  14. E. Doberkat. Inserting a new element into a heap. BIT, 21:225-269, 1981.
  15. E. Doberkat. Deleting the root of a heap. Acta Informatica, 17:245-265, 1982.
  16. J. Dongarra, O. Brewer, J. Kohl, and S. Fineberg. A tool to aid in the design, implementation, and understanding of matrix algorithms for parallel processors. Journal of Parallel and Distributed Computing, 9:2:185-202, June 1990.
  17. M. Farrens, G. Tyson, and A. Pleszkun. A study of single-chip processor/cache organizations for large numbers of transistors. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 338-347, 1994.
  18. D. Fenwick, D. Foley, W. Gist, S. VanDoren, and D. Wissell. The AlphaServer 8000 series: High-end server platform development. Digital Technical Journal, 7:1:43-65, 1995.
  19. R. W. Floyd. Treesort 3. Communications of the ACM, 7:12:701, 1964.
  20. D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformation. Journal of Parallel and Distributed Computing, 5:5:587-616, Oct 1988.
  21. G. Gonnet and J. Munro. Heaps on heaps. SIAM Journal on Computing, 15:4:964-971, 1986.
  22. D. Grunwald, B. Zorn, and R. Henderson. Improving the cache locality of memory allocation. In Proceedings of the 1993 ACM Symposium on Programming Language Design and Implementation, pages 177-186. ACM, 1993.
  23. J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1990.
  24. D. B. Johnson. Priority queues with update and finding minimum spanning trees. Information Processing Letters, 4, 1975.
  25. D. Jones. An empirical comparison of priority-queue and event-set implementations. Communications of the ACM, 29:4:300-311, 1986.
  26. K. Kennedy and K. McKinley. Optimizing for parallelism and data locality. In Proceedings of the 1992 International Conference on Supercomputing, pages 323-334, 1992.
  27. D. E. Knuth. The Art of Computer Programming, Vol. III: Sorting and Searching. Addison-Wesley, Reading, MA, 1973.
  28. A. LaMarca. Caches and algorithms. Ph.D. Dissertation, University of Washington, May 1996.
  29. A. LaMarca and R. E. Ladner. The influence of caches on the performance of sorting. Technical Report 96-10-01, University of Washington, Department of Computer Science and Engineering, 1992. Also appears in the Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997.
  30. A. Lebeck and D. Wood. Cache profiling and the SPEC benchmarks: a case study. Computer, 27:10:15-26, Oct 1994.
  31. M. Martonosi, A. Gupta, and T. Anderson. MemSpy: analyzing memory system bottlenecks in programs. In Proceedings of the 1992 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 1-12, 1992.
  32. D. Naor, C. Martel, and N. Matloff. Performance of priority queue structures in a virtual memory environment. Computer Journal, 34:5:428-437, Oct 1991.
  33. G. Rao. Performance analysis of cache memories. Journal of the ACM, 25:3:378-395, 1978.
  34. R. Sedgewick. Algorithms. Addison-Wesley, Reading, MA, 1988.
  35. J. P. Singh, H. S. Stone, and D. F. Thiebaut. A model of workloads and its use in miss-rate prediction for fully associative caches. IEEE Transactions on Computers, 41:7:811-825, 1992.
  36. D. Sleator and R. Tarjan. Self-adjusting binary search trees. Journal of the ACM, 32:3:652-686, 1985.
  37. A. Srivastava and A. Eustace. ATOM: A system for building customized program analysis tools. In Proceedings of the 1994 ACM Symposium on Programming Language Design and Implementation, pages 196-205. ACM, 1994.
  38. O. Temam, C. Fricker, and W. Jalby. Cache interference phenomena. In Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 261-271, 1994.
  39. R. Uhlig, D. Nagle, T. Stanley, T. Mudge, S. Sechrest, and R. Brown. Design tradeoffs for software-managed TLBs. ACM Transactions on Computer Systems, 12:3:175-205, 1994.
  40. M. Weiss. Data structures and algorithm analysis. Benjamin/Cummings Pub. Co., Redwood City, CA, 1995.
  41. H. Wen and J. L. Baer. Efficient trace-driven simulation methods for cache performance analysis. ACM Transactions on Computer Systems, 9:3:222-241, 1991.
  42. J. W. Williams. Heapsort. Communications of the ACM, 7:6:347-348, 1964.
  43. M. Wolf and M. Lam. A data locality optimizing algorithm. In Proceedings of the 1991 ACM Symposium on Programming Language Design and Implementation, pages 30-44. ACM, 1991.


Published in
ACM Journal of Experimental Algorithmics, Volume 1, Issue
1996
104 pages
ISSN: 1084-6654
EISSN: 1084-6654
DOI: 10.1145/235141

Copyright © 1996 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

