Memory disaggregation: why now and what are the challenges

Published: 28 June 2023

Abstract

Hardware disaggregation has emerged as one of the most fundamental shifts in how we build computer systems over the past decades. While disaggregation has been successful for several types of resources (storage, power, and others), memory disaggregation has yet to happen. We make the case that the time for memory disaggregation has arrived. We look at past successful disaggregation stories and learn that their success depended on two requirements: addressing a burning issue and being technically feasible. We examine memory disaggregation through this lens and find that both requirements are finally met. Once available, memory disaggregation will require software support to be used effectively. We discuss some of the challenges of designing an operating system that can utilize disaggregated memory for itself and its applications.

    Published in: ACM SIGOPS Operating Systems Review, Volume 57, Issue 1, June 2023 (53 pages)
    ISSN: 0163-5980
    DOI: 10.1145/3606557

    Copyright © 2023 held by the owner/author(s).

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher: Association for Computing Machinery, New York, NY, United States
