DOI: 10.1145/3319647.3325827
research-article

Storm: a fast transactional dataplane for remote data structures

Published: 22 May 2019

ABSTRACT

RDMA technology enables a host to access the memory of a remote host without involving the remote CPU, improving the performance of distributed in-memory storage systems. Previous studies argued that RDMA suffers from scalability issues, because the NIC's limited resources are unable to simultaneously cache the state of all the concurrent network streams. These concerns led to various software-based proposals to reduce the size of this state by trading off performance.

We revisit these proposals and show that they no longer apply when using newer RDMA NICs in rack-scale environments. In particular, we find that one-sided remote memory primitives deliver better performance than the previously proposed unreliable-datagram and kernel-based stacks. Based on this observation, we design and implement Storm, a transactional dataplane built on one-sided reads and write-based RPC primitives. We show that Storm outperforms eRPC, FaRM, and LITE by 3.3x, 3.6x, and 17.1x, respectively, on an InfiniBand cluster with Mellanox ConnectX-4 NICs.


Published in

SYSTOR '19: Proceedings of the 12th ACM International Conference on Systems and Storage
May 2019, 211 pages
ISBN: 9781450367493
DOI: 10.1145/3319647

Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 94 of 285 submissions, 33%
