Abstract
Persistent key-value (KV) stores are commonly built on the Log-Structured Merge (LSM) tree for high write performance, yet the LSM-tree suffers from inherently high I/O amplification. KV separation mitigates I/O amplification by storing only keys in the LSM-tree and values in separate storage. However, the current KV separation design remains inefficient under update-intensive workloads due to its high garbage collection (GC) overhead in value storage. We propose HashKV, which aims for high update performance atop KV separation under update-intensive workloads. HashKV uses hash-based data grouping, which deterministically maps values to storage space so as to make both updates and GC efficient. We further relax the restriction of such deterministic mapping via simple yet useful design extensions. We extensively evaluate various design aspects of HashKV. We show that HashKV achieves 4.6× the update throughput and 53.4% less write traffic compared to the current KV separation design. In addition, we demonstrate that the design of HashKV can be integrated with state-of-the-art KV stores to improve their respective performance.
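To make the idea of hash-based data grouping concrete, the following is a minimal, self-contained sketch of the general concept, not HashKV's actual implementation: the segment count (kNumSegments), the Segment layout, and the put helper are all illustrative assumptions. Hashing the key deterministically fixes which segment a value may reside in, so every update of a key lands in one known segment, and GC can reclaim stale versions segment by segment without scanning the whole value store.

```cpp
// Minimal sketch (not HashKV's code): hash-based grouping of values.
// Each key is deterministically mapped to one of N fixed segments,
// so all versions of a key's value land in the same segment.
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr std::size_t kNumSegments = 64;  // assumed segment count

// Each segment is an append-only value log; the latest log index per
// key is tracked so GC can discard stale versions per segment.
struct Segment {
    std::vector<std::pair<std::string, std::string>> log;  // (key, value)
    std::unordered_map<std::string, std::size_t> latest;   // key -> index
};

std::size_t segment_of(const std::string& key) {
    return std::hash<std::string>{}(key) % kNumSegments;
}

void put(std::vector<Segment>& segs, const std::string& key,
         const std::string& value) {
    Segment& s = segs[segment_of(key)];
    s.log.emplace_back(key, value);
    s.latest[key] = s.log.size() - 1;  // earlier entries become garbage
}

int main() {
    std::vector<Segment> segments(kNumSegments);
    put(segments, "user42", "v1");
    put(segments, "user42", "v2");  // same segment: update is localized
    const Segment& s = segments[segment_of("user42")];
    std::cout << "segment " << segment_of("user42") << " holds "
              << s.log.size() << " entries, 1 live\n";
}
```

The deterministic mapping is what keeps both updates and GC cheap in this sketch: an update never needs a lookup to find where the old value lives, and GC of one segment never touches the others. The paper's design extensions relax this strict mapping (e.g., for large or hot values) while preserving the per-segment locality.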