ABSTRACT
As network latency approaches that of memory, it becomes increasingly attractive for applications to use remote memory: random-access memory at another computer that is accessed through the virtual memory subsystem. This is an old idea whose time has come in the age of fast networks. To work effectively, remote memory must address many technical challenges. In this paper, we enumerate these challenges, discuss their feasibility, explain how recent work addresses some of them, and indicate other promising ways to tackle them. Some challenges remain open problems, while others deserve more study. We hope to provide a broad research agenda around this topic by proposing more problems than solutions.