
Rethinking software runtimes for disaggregated memory

Authors:
Irina Calciu, VMware Research, USA
M. Talha Imran, Pennsylvania State University, USA
Ivan Puddu, ETH Zurich, Switzerland
Sanidhya Kashyap, EPFL, Switzerland
Hasan Al Maruf, University of Michigan, USA
Onur Mutlu, ETH Zurich, Switzerland
Aasheesh Kolli, Pennsylvania State University, USA / Google, USA

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, April 2021, pages 79–92. https://doi.org/10.1145/3445814.3446713

Published: 17 April 2021


ABSTRACT

Disaggregated memory can address resource provisioning inefficiencies in current datacenters. Multiple software runtimes for disaggregated memory have been proposed in an attempt to make disaggregated memory practical. These systems rely on the virtual memory subsystem to transparently offer disaggregated memory to applications using a local memory abstraction. Unfortunately, using virtual memory for disaggregation has multiple limitations, including high overhead from relying on page faults to identify which data to fetch and cache locally, and high dirty data amplification from tracking changes to the cached data at page granularity (4KB or larger).
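To make dirty data amplification concrete, here is a minimal sketch that compares how many bytes must be written back when modifications are tracked per 4KB page versus per 64-byte cache line. It assumes amplification is defined as bytes written back divided by bytes actually modified; the workload, constants, and function names are hypothetical illustrations, not taken from the paper's runtime.

```python
"""Toy model of dirty data amplification (assumed definition: bytes written
back to remote memory divided by distinct bytes the application modified)."""

PAGE_SIZE = 4096        # bytes; common x86-64 base page size
CACHE_LINE_SIZE = 64    # bytes; typical x86-64 cache line size


def units_touched(writes, unit):
    """Return the set of unit-aligned block indices touched by (addr, length) writes."""
    blocks = set()
    for addr, length in writes:
        first = addr // unit
        last = (addr + length - 1) // unit
        blocks.update(range(first, last + 1))
    return blocks


def bytes_modified(writes):
    """Count distinct modified bytes so overlapping writes are not double-counted."""
    touched = set()
    for addr, length in writes:
        touched.update(range(addr, addr + length))
    return len(touched)


def amplification(writes, unit):
    """Bytes written back at `unit` granularity per byte actually modified."""
    written_back = len(units_touched(writes, unit)) * unit
    return written_back / bytes_modified(writes)


if __name__ == "__main__":
    # Hypothetical workload: three 8-byte stores spread across two pages.
    writes = [(0, 8), (128, 8), (4096 + 512, 8)]
    print("page granularity:      ", amplification(writes, PAGE_SIZE))
    print("cache-line granularity:", amplification(writes, CACHE_LINE_SIZE))
```

For these three scattered 8-byte stores, page-granularity tracking writes back two full pages (8192 bytes, roughly 341 bytes per modified byte), while cache-line tracking writes back three lines (192 bytes, 8 bytes per modified byte). Real amplification depends entirely on the workload's write locality.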

In this paper, we propose a fundamentally new approach to designing software runtimes for disaggregated memory that addresses these limitations. Our main observation is that we can use cache coherence instead of virtual memory to track applications' memory accesses transparently, at cache-line granularity. This simple idea (1) eliminates page faults from the application's critical path when accessing remote data, and (2) decouples application memory-access tracking from the virtual memory page size, enabling cache-line-granularity dirty data tracking and eviction. Using this observation, we implemented a new software runtime for disaggregated memory that improves average memory access time by 1.7–5× and reduces dirty data amplification by 2–10× compared to state-of-the-art systems.
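The sketch below illustrates how decoupling access tracking from the page size could enable cache-line-granularity eviction: each 4KB page keeps a 64-bit mask of dirty 64-byte lines, and evicting the page writes back only the dirty lines, with adjacent lines coalesced into runs. It is a hypothetical pure-software model, not the paper's implementation. In a coherence-based design the dirty lines would be observed through cache coherence (for example, by a coherent device); here an explicit `record_write()` call stands in for that observation, and all names are assumptions.

```python
PAGE_SIZE = 4096
CACHE_LINE_SIZE = 64
LINES_PER_PAGE = PAGE_SIZE // CACHE_LINE_SIZE  # 64 lines, so one 64-bit mask per page


class DirtyLineTracker:
    """Hypothetical per-page dirty-line bitmaps driving cache-line-granularity eviction."""

    def __init__(self):
        self.dirty = {}  # page number -> bitmask of dirty cache lines

    def record_write(self, addr, length):
        # Stand-in for coherence-based tracking: mark every cache line
        # covered by [addr, addr + length) as dirty in its page's mask.
        first = addr // CACHE_LINE_SIZE
        last = (addr + length - 1) // CACHE_LINE_SIZE
        for line in range(first, last + 1):
            page, idx = divmod(line, LINES_PER_PAGE)
            self.dirty[page] = self.dirty.get(page, 0) | (1 << idx)

    def evict(self, page):
        """Clear the page's mask and return (address, length) runs to write back.

        Adjacent dirty lines are coalesced into one run to reduce the number
        of remote writes; clean lines are never sent.
        """
        mask = self.dirty.pop(page, 0)
        runs = []
        idx = 0
        while idx < LINES_PER_PAGE:
            if (mask >> idx) & 1:
                start = idx
                while idx < LINES_PER_PAGE and (mask >> idx) & 1:
                    idx += 1
                runs.append((page * PAGE_SIZE + start * CACHE_LINE_SIZE,
                             (idx - start) * CACHE_LINE_SIZE))
            else:
                idx += 1
        return runs


if __name__ == "__main__":
    tracker = DirtyLineTracker()
    tracker.record_write(0, 8)      # dirties line 0
    tracker.record_write(64, 100)   # dirties lines 1 and 2
    tracker.record_write(1024, 8)   # dirties line 16
    print(tracker.evict(0))         # two runs, 256 bytes total instead of 4096
```

Evicting page 0 here yields the runs `(0, 192)` and `(1024, 64)`, i.e., 256 bytes written back instead of the full 4096-byte page. Coalescing adjacent lines is one plausible design choice that trades a little extra scanning for fewer, larger remote writes.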

Index Terms

1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Distributed memory

Published in
ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN: 9781450383172
DOI: 10.1145/3445814
General Chair: Tim Sherwood, University of California at Santa Barbara, USA
Program Chairs: Emery Berger, University of Massachusetts at Amherst, USA; Christos Kozyrakis, Stanford University, USA
Copyright © 2021 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache coherence
disaggregated memory
remote memory

Acceptance Rates
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
Article Metrics
Total Citations: 43
Total Downloads: 2,488
Downloads (Last 12 months): 666
Downloads (Last 6 weeks): 70