Abstract
Persistent key-value (KV) stores are commonly built on the Log-Structured Merge (LSM) tree for high write performance, yet the LSM-tree suffers from inherently high I/O amplification. KV separation mitigates I/O amplification by storing only keys in the LSM-tree and values in separate storage. However, the current KV separation design remains inefficient under update-intensive workloads due to its high garbage collection (GC) overhead in value storage. We propose HashKV, which aims for high update performance atop KV separation under update-intensive workloads. HashKV uses hash-based data grouping, which deterministically maps values to storage space so as to make both updates and GC efficient. We further relax the restriction of such deterministic mapping via simple yet useful design extensions. We extensively evaluate various design aspects of HashKV. We show that HashKV achieves 4.6× the update throughput and 53.4% less write traffic compared to the current KV separation design. In addition, we demonstrate that the design of HashKV can be integrated with state-of-the-art KV stores to improve their respective performance.
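To make the idea of hash-based data grouping concrete, the following is a minimal, self-contained sketch of the general concept, not HashKV's actual implementation: the segment count (kNumSegments), the Segment layout, and the put helper are all illustrative assumptions. Hashing the key deterministically fixes which segment a value may reside in, so every update of a key lands in one known segment, and GC can reclaim stale versions segment by segment without scanning the whole value store.

```cpp
// Minimal sketch (not HashKV's code): hash-based grouping of values.
// Each key is deterministically mapped to one of N fixed segments,
// so all versions of a key's value land in the same segment.
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr std::size_t kNumSegments = 64;  // assumed segment count

// Each segment is an append-only value log; the latest log index per
// key is tracked so GC can discard stale versions per segment.
struct Segment {
    std::vector<std::pair<std::string, std::string>> log;  // (key, value)
    std::unordered_map<std::string, std::size_t> latest;   // key -> index
};

std::size_t segment_of(const std::string& key) {
    return std::hash<std::string>{}(key) % kNumSegments;
}

void put(std::vector<Segment>& segs, const std::string& key,
         const std::string& value) {
    Segment& s = segs[segment_of(key)];
    s.log.emplace_back(key, value);
    s.latest[key] = s.log.size() - 1;  // earlier entries become garbage
}

int main() {
    std::vector<Segment> segments(kNumSegments);
    put(segments, "user42", "v1");
    put(segments, "user42", "v2");  // same segment: update is localized
    const Segment& s = segments[segment_of("user42")];
    std::cout << "segment " << segment_of("user42") << " holds "
              << s.log.size() << " entries, 1 live\n";
}
```

The deterministic mapping is what keeps both updates and GC cheap in this sketch: an update never needs a lookup to find where the old value lives, and GC of one segment never touches the others. The paper's design extensions relax this strict mapping (e.g., for large or hot values) while preserving the per-segment locality.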