DOI: 10.1145/3593856.3595902
research-article
Public Access

Towards Increased Datacenter Efficiency with Soft Memory

Published: 22 June 2023

Abstract

Memory is the bottleneck resource in today's datacenters because it is inflexible: low-priority processes are routinely killed to free up resources during memory pressure. This wastes CPU cycles upon re-running killed jobs and incentivizes datacenter operators to run at low memory utilization for safety. This paper introduces soft memory, a software-level abstraction on top of standard primary storage that, under memory pressure, makes memory revocable for re-allocation elsewhere. We prototype soft memory with the Redis key-value store, and find that it has low overhead.
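
To make the idea of revocable memory concrete, here is a minimal sketch of an application-level cache whose entries can be reclaimed under memory pressure. It is an illustration under stated assumptions, not the paper's design or API: the class name `SoftCache`, the `reclaim` method, the `on_revoke` callback, and the oldest-first revocation order are all hypothetical choices made for this example.

```python
from collections import OrderedDict
from typing import Callable, Optional


class SoftCache:
    """Hypothetical key-value cache whose entries may be revoked at any time."""

    def __init__(self, on_revoke: Optional[Callable[[str, bytes], None]] = None):
        # Insertion order doubles as revocation order (oldest first); this
        # policy is an assumption of the sketch, not taken from the paper.
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._bytes_used = 0
        self._on_revoke = on_revoke

    @property
    def bytes_used(self) -> int:
        return self._bytes_used

    def put(self, key: str, value: bytes) -> None:
        # Replace any existing entry so the byte accounting stays exact.
        old = self._entries.pop(key, None)
        if old is not None:
            self._bytes_used -= len(old)
        self._entries[key] = value
        self._bytes_used += len(value)

    def get(self, key: str) -> Optional[bytes]:
        # None can mean "never stored" or "revoked"; the caller must be able
        # to recompute or refetch the value in either case.
        return self._entries.get(key)

    def reclaim(self, target_bytes: int) -> int:
        """Revoke entries until target_bytes are freed or the cache is empty.

        Returns the number of bytes actually freed.
        """
        freed = 0
        while freed < target_bytes and self._entries:
            key, value = self._entries.popitem(last=False)
            self._bytes_used -= len(value)
            freed += len(value)
            if self._on_revoke is not None:
                # Give the application a chance to react to the revocation.
                self._on_revoke(key, value)
        return freed


# Usage: simulate a pressure event that asks the cache to give back 120 bytes.
revoked_keys = []
cache = SoftCache(on_revoke=lambda key, value: revoked_keys.append(key))
cache.put("a", b"x" * 100)
cache.put("b", b"y" * 50)
cache.put("c", b"z" * 30)
freed = cache.reclaim(120)
```

In this sketch the process survives the pressure event and loses only cached data it can regenerate, which is the contrast with killing a low-priority process that the abstract draws. How a real system would signal pressure and choose which memory to revoke is left open here.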

Cited By

  • (2024) "Harvesting Idle Memory for Application-Managed Soft State with Midas". In: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pages 1247--1265. DOI: 10.5555/3691825.3691894. Online publication date: 16-Apr-2024.

Published In

HOTOS '23: Proceedings of the 19th Workshop on Hot Topics in Operating Systems
June 2023
247 pages
ISBN:9798400701955
DOI:10.1145/3593856

Publisher

Association for Computing Machinery

New York, NY, United States
