
Increment-and-Freeze: Every Cache, Everywhere, All of the Time

Authors:
Michael A. Bender, Stony Brook University, Stony Brook, NY, USA (ORCID 0000-0001-7639-530X)
Daniel DeLayo, Stony Brook University, Stony Brook, NY, USA (ORCID 0000-0001-7636-0107)
Bradley C. Kuszmaul, Independent Researcher, Cambridge, MA, USA (ORCID 0000-0001-6305-4290)
William Kuszmaul, Massachusetts Institute of Technology, Cambridge, MA, USA (ORCID 0000-0002-3855-3036)
Evan West, Stony Brook University, Stony Brook, NY, USA (ORCID 0000-0002-5974-7745)

SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, June 2023, Pages 129–139, https://doi.org/10.1145/3558481.3591085

Published: 17 June 2023


ABSTRACT

One of the most basic algorithmic problems concerning caches is to compute the LRU hit-rate curve on a given trace. Unfortunately, the known algorithms exhibit poor data locality and fail to scale to large caches. It is widely believed that the LRU hit-rate curve cannot be computed efficiently enough to be used in online production settings. This has led to a large literature on heuristics that aim to approximate the curve efficiently.
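To make the problem statement concrete, the following is a minimal sketch of the textbook stack-distance way to compute an exact LRU hit-rate curve. It is a naive baseline, not Increment-and-Freeze, and it takes time proportional to the trace length times the number of distinct items. The function name `lru_hit_rate_curve` and the list-based LRU stack are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def lru_hit_rate_curve(trace):
    """Return a list `curve` where curve[k] is the fraction of requests in
    `trace` that hit in an LRU cache holding k items, for k = 0..u, where
    u is the number of distinct items in the trace.

    Naive baseline (hypothetical helper, not the paper's algorithm).
    """
    stack = []                   # LRU stack, most recently used item first
    distance_counts = Counter()  # distance_counts[d] = requests with stack distance d

    for item in trace:
        if item in stack:
            # Stack distance: 1-based depth of the item in the LRU stack.
            # The request hits in every LRU cache of size >= this depth.
            depth = stack.index(item) + 1
            distance_counts[depth] += 1
            stack.remove(item)
        # First-time requests miss at every cache size.
        stack.insert(0, item)

    n = len(trace)
    curve = [0.0]  # a cache of size 0 never hits
    hits = 0
    for k in range(1, len(stack) + 1):
        hits += distance_counts[k]
        curve.append(hits / n if n else 0.0)
    return curve


if __name__ == "__main__":
    print(lru_hit_rate_curve(["a", "b", "a", "c", "b", "a"]))
```

A single pass over the trace yields the hit rate for every cache size at once, which is the sense in which one computation covers every cache. The linear scans of `stack` are what make this version slow on large traces.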

In this paper, we show that the poor data locality of past algorithms can be avoided. We introduce a new algorithm, called Increment-and-Freeze, for computing exact LRU hit-rate curves. The algorithm achieves RAM-model complexity $O(n \log n)$, external-memory complexity $O((n/B) \log n)$, and parallelism $\Theta(\log n)$. We also present two theoretical extensions of Increment-and-Freeze: one that achieves SORT complexity in the external-memory model, and one that achieves a parallel span of $O(\log^2 n)$, which yields near-linear parallelism while maintaining work efficiency.
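As a reading aid, the bounds above can be related using the standard definitions of parallelism and of the external-memory sorting bound. This is a sketch under stated assumptions: the symbol $M$ (internal memory size) does not appear in the abstract, and the work of the parallel extension is taken to be $\Theta(n \log n)$, which is how "work efficiency" is interpreted here.

```latex
% Parallelism is work divided by span. Assuming work W(n) = \Theta(n \log n)
% and span S(n) = O(\log^2 n) for the parallel extension:
\[
  \text{parallelism} \;=\; \frac{W(n)}{S(n)}
  \;=\; \Omega\!\left(\frac{n \log n}{\log^2 n}\right)
  \;=\; \Omega\!\left(\frac{n}{\log n}\right),
\]
% which is within a logarithmic factor of linear in n.

% Standard external-memory sorting bound, with block size B and
% internal memory size M (M is an assumed symbol, not from the abstract):
\[
  \mathrm{SORT}(n) \;=\; \Theta\!\left(\frac{n}{B} \log_{M/B} \frac{n}{B}\right),
\]
% compared with the basic algorithm's O((n/B) \log n), whose logarithm is base 2.
```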

We implement Increment-and-Freeze and obtain a speedup of up to 9x over the classical augmented-tree algorithm on a single processor. On 16 threads, the speedup becomes as large as 60x. In comparison to the previous state-of-the-art parallel algorithm, Increment-and-Freeze achieves a speedup of up to 10x when both algorithms use the same number of threads.


Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Theory of computation
  1. Design and analysis of algorithms


Published in
SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures
June 2023, 504 pages
ISBN: 9781450395458
DOI: 10.1145/3558481
General Chair: Kunal Agrawal, Washington University in St. Louis, USA
Program Chair: Julian Shun, MIT, USA
Copyright © 2023 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
caching
divide-and-conquer
external-memory
hit-rate curves
io-optimal
lru
miss-ratio curves
parallelism
reuse distance
stack distance
success function
working set
Qualifiers
- research-article
