ABSTRACT
Remote Direct Memory Access (RDMA) enables a host to access the memory of a remote host without involving the remote CPU, improving the performance of distributed in-memory storage systems. Previous studies argued that RDMA suffers from scalability issues because the NIC's limited resources cannot simultaneously cache the state of all concurrent network streams. These concerns led to various software-based proposals that reduce the size of this state at the cost of performance.
We revisit these proposals and show that they no longer apply when using newer RDMA NICs in rack-scale environments. In particular, we find that one-sided remote memory primitives perform better than the previously proposed unreliable-datagram and kernel-based stacks. Based on this observation, we design and implement Storm, a transactional dataplane that uses one-sided reads and write-based RPC primitives. We show that Storm outperforms eRPC, FaRM, and LITE by 3.3x, 3.6x, and 17.1x, respectively, on an InfiniBand cluster with Mellanox ConnectX-4 NICs.
Index Terms
- Storm: a fast transactional dataplane for remote data structures