Storm: a fast transactional dataplane for remote data structures

Authors:
Stanko Novakovic, Microsoft Research
Yizhou Shan, Purdue University
Aasheesh Kolli, VMware and The Pennsylvania State University
Michael Cui, VMware
Yiying Zhang, Purdue University
Haggai Eran, Mellanox and Technion
Boris Pismenny, Mellanox
Liran Liss, Mellanox
Michael Wei, VMware
Dan Tsafrir, VMware and Technion
Marcos Aguilera, VMware

SYSTOR '19: Proceedings of the 12th ACM International Conference on Systems and Storage, May 2019, Pages 97–108. https://doi.org/10.1145/3319647.3325827

Published: 22 May 2019

ABSTRACT

RDMA technology enables a host to access the memory of a remote host without involving the remote CPU, improving the performance of distributed in-memory storage systems. Previous studies argued that RDMA suffers from scalability issues, because the NIC's limited resources are unable to simultaneously cache the state of all the concurrent network streams. These concerns led to various software-based proposals to reduce the size of this state by trading off performance.
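The scalability concern described above can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes the NIC keeps per-connection state in a small LRU cache and that requests pick a connection uniformly at random. The function name, cache size, and request count are all hypothetical.

```python
from collections import OrderedDict
import random


def nic_cache_hit_rate(num_connections, cache_entries, num_requests=10_000, seed=0):
    """Return the fraction of requests whose connection state is already cached.

    Toy model only: an LRU cache of `cache_entries` connection contexts stands
    in for the NIC's on-board memory (an assumption, not a measured NIC policy).
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # connection id -> placeholder for cached state
    hits = 0
    for _ in range(num_requests):
        conn = rng.randrange(num_connections)
        if conn in cache:
            hits += 1
            cache.move_to_end(conn)        # mark as most recently used
        else:
            cache[conn] = True             # fetch state into the cache
            if len(cache) > cache_entries:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / num_requests


if __name__ == "__main__":
    # Hit rate drops once the number of connections exceeds the cache size.
    for n in (64, 256, 1024, 4096):
        print(n, round(nic_cache_hit_rate(n, cache_entries=256), 3))
```

In this model the hit rate stays high while every connection fits in the cache and falls as the connection count grows past it, which is the effect the earlier studies worried about.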

We revisit these proposals and show that they no longer apply when using newer RDMA NICs in rack-scale environments. In particular, we find that one-sided remote memory primitives outperform the previously proposed unreliable datagram and kernel-based stacks. Based on this observation, we design and implement Storm, a transactional dataplane utilizing one-sided read and write-based RPC primitives. We show that Storm outperforms eRPC, FaRM, and LITE by 3.3x, 3.6x, and 17.1x, respectively, on an InfiniBand cluster with Mellanox ConnectX-4 NICs.
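To make the two primitives named in the abstract concrete, here is a minimal in-process sketch. It does not use real RDMA verbs and is not Storm's implementation: byte arrays stand in for server memory, and the `Server` and `Client` classes, the slot layout, and the reserved key 0 are all assumptions made for illustration. A lookup is served by a one-sided read that runs no server code, while an update is sent as a write-based RPC that the server CPU must process.

```python
import struct

SLOT = struct.Struct("<II")  # (key, value); key 0 marks an empty slot (assumption)
REQ = struct.Struct("<BII")  # (valid flag, key, value) request layout (assumption)


class Server:
    """Stand-in for a remote host whose memory clients access directly."""

    def __init__(self, num_slots=16):
        self.num_slots = num_slots
        self.table = bytearray(num_slots * SLOT.size)  # one-sided readable region
        self.request_buffer = bytearray(REQ.size)      # clients write requests here
        self.cpu_invocations = 0                       # how often server code ran

    def poll(self):
        """Server-side handler: consume a pending request, if any."""
        valid, key, value = REQ.unpack_from(self.request_buffer)
        if not valid:
            return
        self.cpu_invocations += 1
        # Colliding keys simply overwrite each other in this toy table.
        SLOT.pack_into(self.table, (key % self.num_slots) * SLOT.size, key, value)
        self.request_buffer[0] = 0                     # clear the valid flag


class Client:
    def __init__(self, server):
        self.server = server

    def put(self, key, value):
        """Write-based RPC: deposit the request in server memory, server handles it."""
        REQ.pack_into(self.server.request_buffer, 0, 1, key, value)
        self.server.poll()  # a real server thread would poll on its own

    def get(self, key):
        """One-sided read: fetch the slot bytes without running server code."""
        offset = (key % self.server.num_slots) * SLOT.size
        raw = bytes(self.server.table[offset:offset + SLOT.size])
        stored_key, value = SLOT.unpack(raw)
        return value if stored_key == key else None


if __name__ == "__main__":
    server = Server()
    client = Client(server)
    client.put(7, 42)
    print(client.get(7), server.cpu_invocations)
```

The `cpu_invocations` counter only changes on `put`, which shows the distinction the abstract relies on: reads bypass the remote CPU, while RPCs involve it.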

Index Terms

Storm: a fast transactional dataplane for remote data structures
1. Networks
  1. Network performance evaluation
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Published in
SYSTOR '19: Proceedings of the 12th ACM International Conference on Systems and Storage
May 2019
211 pages
ISBN: 9781450367493
DOI: 10.1145/3319647
General Chair: Moshik Hershcovitch, IBM Research
Program Chairs: Ashvin Goel, University of Toronto; Adam Morrison, Tel Aviv University
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
RDMA
RPC
data structures
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate: 94 of 285 submissions, 33%