DOI: 10.1145/3558481.3591085 (ACM SPAA conference proceedings)
research-article

Increment-and-Freeze: Every Cache, Everywhere, All of the Time

Published: 17 June 2023

ABSTRACT

One of the most basic algorithmic problems concerning caches is to compute the LRU hit-rate curve on a given trace. Unfortunately, the known algorithms exhibit poor data locality and fail to scale to large caches. It is widely believed that the LRU hit-rate curve cannot be computed efficiently enough to be used in online production settings. This has led to a large literature on heuristics that aim to approximate the curve efficiently.
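To make the problem concrete, here is a minimal sketch of the classical stack-distance approach to computing an exact LRU hit-rate curve. It is not Increment-and-Freeze; it is the naive list-based baseline, and the function name `lru_hit_rate_curve` is hypothetical. A request hits in an LRU cache of size c exactly when its depth in the LRU stack (its stack distance) is at most c, so one pass over the trace yields the hit rate for every cache size at once.

```python
from collections import Counter


def lru_hit_rate_curve(trace):
    """Return a list where entry c is the LRU hit rate for a cache of size c.

    Naive baseline: each request scans the LRU stack, so the running time is
    O(n * m) for n requests over m distinct items.
    """
    stack = []                    # LRU stack, most recently used item first
    distance_counts = Counter()   # stack distance -> number of requests

    for item in trace:
        if item in stack:
            depth = stack.index(item) + 1   # 1-based stack distance
            distance_counts[depth] += 1
            stack.remove(item)
        stack.insert(0, item)               # item becomes most recently used

    n = len(trace)
    max_size = len(stack)                   # number of distinct items
    curve = [0.0] * (max_size + 1)
    hits = 0
    for size in range(1, max_size + 1):
        # A cache of this size hits every request with distance <= size.
        hits += distance_counts[size]
        curve[size] = hits / n if n else 0.0
    return curve


curve = lru_hit_rate_curve(["a", "b", "a", "c", "b", "a"])
print(curve)
```

The linear scan of the stack on every request is what makes this baseline slow and unfriendly to caches on long traces; tree-based variants reduce the per-request cost but still access memory irregularly.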

In this paper, we show that the poor data locality of past algorithms can be avoided. We introduce a new algorithm, called Increment-and-Freeze, for computing exact LRU hit-rate curves. The algorithm achieves RAM-model complexity O(n log n), external-memory complexity O((n/B) log n), and parallelism Θ(log n). We also present two theoretical extensions of Increment-and-Freeze: one that achieves SORT complexity in the external-memory model, and one that achieves a parallel span of O(log² n), which gives near-linear parallelism, while maintaining work efficiency.
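The near-linear parallelism claim can be checked with a short calculation. This sketch assumes the usual definition of parallelism as work divided by span, and that "work efficiency" means the parallel extension performs Θ(n log n) total work, matching the RAM-model bound:

```latex
\[
\text{parallelism}
  \;=\; \frac{\text{work}}{\text{span}}
  \;=\; \frac{\Theta(n \log n)}{O(\log^{2} n)}
  \;=\; \Omega\!\left(\frac{n}{\log n}\right)
\]
```

Under those assumptions the parallelism is within a logarithmic factor of n, which is the sense in which it is near linear.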

We implement Increment-and-Freeze and obtain a speedup of up to 9x over the classical augmented-tree algorithm on a single processor. On 16 threads, the speedup becomes as large as 60x. In comparison to the previous state-of-the-art parallel algorithm, Increment-and-Freeze achieves a speedup of up to 10x when both algorithms use the same number of threads.
