Optimal Data Placement for Heterogeneous Cache, Memory, and Storage Systems

Published: 27 May 2020

Abstract

New memory technologies are blurring the previously distinct performance characteristics of adjacent layers in the memory hierarchy. Such layers no longer differ by orders of magnitude in request latency or capacity. Beyond the traditional single-layer view of caching, we must now recast the problem as a data placement challenge: which data should be cached in faster memory if it could instead be served directly from slower memory? We present CHOPT, an offline algorithm for data placement across multiple tiers of memory with asymmetric read and write costs. We show that CHOPT is optimal and can therefore serve as an upper bound on the performance gain of any data placement algorithm. We also demonstrate an approximation of CHOPT that uses spatial sampling of requests to make its execution time practical for long traces, incurring a small 0.2% average error on representative workloads at a sampling ratio of 1%. Our evaluation of CHOPT on more than 30 production traces and benchmarks shows that optimal data placement decisions could improve average request latency by 8.2%-44.8% compared with the long-established gold standard: Belady and Mattson's offline, evict-farthest-in-the-future optimal algorithms. Our results identify substantial improvement opportunities for future online memory management research.
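
As a rough illustration of two ideas the abstract refers to, the following Python sketch shows the evict-farthest-in-the-future baseline and a hash-based form of spatial sampling. It is not CHOPT: the abstract does not describe CHOPT's algorithm, and this sketch models a single cache tier with uniform miss cost and no bypassing. The function names `belady_misses` and `spatial_sample`, the `modulus` parameter, and the choice of hash function are all assumptions made for the example.

```python
import hashlib


def belady_misses(trace, capacity):
    """Count misses for the evict-farthest-in-the-future policy.

    Single-tier baseline only (every miss inserts the key); this is
    NOT the paper's CHOPT algorithm.
    """
    inf = float("inf")
    # next_use[i] is the index of the next request to trace[i], or inf.
    next_use = [inf] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], inf)
        last_seen[trace[i]] = i

    cache = {}  # key -> index of that key's next request
    misses = 0
    for i, key in enumerate(trace):
        if key in cache:
            cache[key] = next_use[i]  # hit: refresh the next-use index
            continue
        misses += 1
        if capacity == 0:
            continue
        if len(cache) >= capacity:
            # Evict the resident key whose next request is farthest away.
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[key] = next_use[i]
    return misses


def spatial_sample(trace, rate, modulus=1 << 24):
    """Keep every request to a key iff hash(key) mod modulus < rate * modulus.

    Sampling is by key, not by request, so a key's reuse pattern is
    either fully kept or fully dropped. The hash and modulus are
    illustrative choices, not taken from the paper.
    """
    threshold = int(rate * modulus)

    def bucket(key):
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % modulus

    return [key for key in trace if bucket(key) < threshold]


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 4, 1, 2]
    print(belady_misses(trace, capacity=2))
    print(belady_misses(trace, capacity=3))
    print(len(spatial_sample(list(range(100000)), rate=0.01)))
```

When a sampled trace is simulated in this style, the cache capacity is usually scaled by the same rate so that the sampled run approximates the full one; whether CHOPT's approximation does exactly this is not stated in the abstract.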

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 4, Issue 1
SIGMETRICS
March 2020
467 pages
EISSN: 2476-1249
DOI: 10.1145/3402934
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2020
Online AM: 07 May 2020
Published in POMACS Volume 4, Issue 1

Author Tags

  1. cost-aware cache replacement
  2. data placement
  3. memory hierarchy
  4. non-volatile memory
  5. offline optimal analysis
  6. spatial sampling

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation
