ABSTRACT
In-memory key-value stores (KVS) are a crucial component of data center applications. Since DRAM provides high bandwidth and low latency, the major performance bottleneck of a common in-memory KVS lies in the network stack. Prior works have attempted to replace the traditional network stack with remote direct memory access (RDMA), which achieves orders of magnitude higher throughput and reduces the response latency. To further increase the throughput of an in-memory KVS, we propose a framework called hKVS, which enables developers to design high-throughput heterogeneous KVS systems by adding the latest generations of smart network interface cards (SmartNICs), such as the NVIDIA BlueField DPU, to the host machines. hKVS enables a host server to efficiently exploit the computational resources and the RDMA capability of the SmartNICs to offload work from the CPU and increase the network bandwidth. hKVS allows popular key-value objects to be replicated from the host to the SmartNIC so that the two jointly form a high-throughput RDMA KVS. We design the architecture of hKVS, optimize its software implementation, and conduct a series of experiments to evaluate the resulting performance in realistic applications. By adding a SmartNIC to the host, hKVS achieves up to 1.86X and 1.48X higher throughput in 100% and 95% read workloads, respectively. This approach is cost-effective and scalable compared to building a KVS with multiple hosts, considering that a SmartNIC costs much less than a high-performance server and that multiple SmartNICs can be added to scale the throughput if needed.
Index Terms
- hKVS: a framework for designing a high throughput heterogeneous key-value store with SmartNIC and RDMA