DOI: 10.1145/3503222.3507731
research-article

TMO: transparent memory offloading in datacenters

Published: 22 February 2022

Abstract

The unrelenting growth of the memory needs of emerging datacenter applications, along with the ever-increasing cost and volatility of DRAM prices, has made DRAM a major infrastructure expense. Alternative technologies, such as NVMe SSDs and upcoming NVM devices, offer higher capacity than DRAM at a fraction of the cost and power. One promising approach is to transparently offload colder memory to cheaper memory technologies via kernel or hypervisor techniques. The key challenge, however, is to develop a datacenter-scale solution that is robust in dealing with diverse workloads and the large performance variance of different offload devices such as compressed memory, SSD, and NVM. This paper presents TMO, Meta’s transparent memory offloading solution for heterogeneous datacenter environments. TMO introduces a new Linux kernel mechanism that directly measures, in real time, the work lost due to resource shortage across CPU, memory, and I/O. Guided by this information and without any prior application knowledge, TMO automatically adjusts how much memory to offload to heterogeneous devices (e.g., compressed memory or SSD) according to the device’s performance characteristics and the application’s sensitivity to memory-access slowdown. TMO holistically identifies offloading opportunities not only in the application containers but also in the sidecar containers that provide infrastructure-level functions. To maximize memory savings, TMO targets both anonymous memory and file cache, and balances the swap-in rate of anonymous memory against the reload rate of file pages that were recently evicted from the file cache. TMO has been running in production for more than a year and has saved between 20% and 32% of the total memory across millions of servers in our large datacenter fleet. We have successfully upstreamed TMO into the Linux kernel.
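The feedback idea in the abstract (measure lost work, then adjust how much memory to offload) can be illustrated with a minimal sketch. It is not TMO's implementation. The input format mirrors the pressure-stall text that Linux exposes under /proc/pressure/memory, but the sample values are invented. The proportional rule, the function names `parse_pressure` and `reclaim_bytes`, and the `threshold_pct` and `ratio` parameters are all assumptions chosen for illustration.

```python
def parse_pressure(text):
    """Parse pressure-stall text into {"some": {"avg10": ...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Each field looks like "avg10=0.05"; values are percentages or microseconds.
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def reclaim_bytes(current_bytes, pressure_pct, threshold_pct=0.1, ratio=0.0005):
    """Hypothetical rule: offload less as pressure approaches the threshold.

    Returns 0 once measured pressure reaches or exceeds threshold_pct.
    """
    headroom = max(0.0, 1.0 - pressure_pct / threshold_pct)
    return int(current_bytes * ratio * headroom)


# Made-up sample in the style of /proc/pressure/memory (not real measurements).
SAMPLE = """some avg10=0.05 avg60=0.02 avg300=0.01 total=12345
full avg10=0.00 avg60=0.00 avg300=0.00 total=678"""

pressure = parse_pressure(SAMPLE)
container_bytes = 8 * 2**30  # assume an 8 GiB container
step = reclaim_bytes(container_bytes, pressure["some"]["avg10"])
print(f"offload about {step} bytes this interval")
```

In this sketch a workload that shows no stalls is trimmed at the full `ratio` per interval, while a workload whose stall percentage reaches the threshold is left alone, so the controller backs off without needing any knowledge of the application.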




Published In

ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
February 2022
1164 pages
ISBN:9781450392051
DOI:10.1145/3503222
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Datacenters
  2. Memory Management
  3. Non-volatile Memory
  4. Operating Systems

Qualifiers

  • Research-article

Conference

ASPLOS '22

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

