DOI: 10.1145/3605098.3635898
research-article

Optimizing Read Performance of HBase through Dynamic Control of Data Block Sizes and KVCache

Published: 21 May 2024

Abstract

LSM-tree-based key-value stores such as HBase, RocksDB, and Cassandra use a fixed data block size. In this study, we show that a fixed block size can lead to unnecessary read amplification and cache pollution. To address this issue, we propose a dynamic data block size control method that stores small key-values in small data blocks and large key-values in large data blocks to minimize disk I/Os. However, using small data blocks for small key-values can degrade performance because it increases the number of disk seeks. To mitigate this problem, we implement a two-level cache system consisting of a lower-level conventional BlockCache that stores larger, coarse-grained data blocks and an upper-level cache, KVCache, that stores smaller, fine-grained key-value pairs. Our experiments show that dynamic data block size control and the fine-grained KVCache effectively reduce read amplification and improve read performance in HBase.
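
The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' HBase implementation: the abstract gives no thresholds, eviction policy, or cache sizes, so the LRU policy, the `pick_block_size` cutoff, and every class and parameter name below are assumptions made for illustration. The only figure taken from HBase itself is the 64 KB default block size used as the "large" value.

```python
from collections import OrderedDict


class LRU:
    """Tiny LRU map with a fixed entry capacity (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


def pick_block_size(kv_bytes, small=4 * 1024, large=64 * 1024, cutoff=1024):
    """Hypothetical rule: small key-values go to small blocks, the rest to large ones."""
    return small if kv_bytes < cutoff else large


class TwoLevelCache:
    """Fine-grained KVCache in front of a coarse-grained BlockCache.

    Assumes stored values are never None, since None signals a cache miss.
    """

    def __init__(self, disk_blocks, key_to_block, kv_capacity, block_capacity):
        self.disk_blocks = disk_blocks    # block_id -> {key: value}
        self.key_to_block = key_to_block  # key -> block_id (stand-in for an index)
        self.kv_cache = LRU(kv_capacity)
        self.block_cache = LRU(block_capacity)
        self.disk_reads = 0

    def get(self, key):
        value = self.kv_cache.get(key)
        if value is not None:
            return value                  # upper-level hit, no block access
        block_id = self.key_to_block[key]
        block = self.block_cache.get(block_id)
        if block is None:
            block = self.disk_blocks[block_id]  # simulated disk I/O
            self.disk_reads += 1
            self.block_cache.put(block_id, block)
        value = block[key]
        self.kv_cache.put(key, value)     # cache only the requested pair
        return value


# Usage: "a" stays readable from the KVCache after its block is evicted.
disk = {0: {"a": "1", "b": "2"}, 1: {"c": "3"}}
index = {"a": 0, "b": 0, "c": 1}
cache = TwoLevelCache(disk, index, kv_capacity=2, block_capacity=1)
cache.get("a")
cache.get("c")  # evicts block 0 from the one-entry block cache
cache.get("a")  # served from the KVCache without another disk read
```

In this toy setup the third lookup does not touch the disk, which is the effect the abstract attributes to the KVCache: a hot key-value pair can remain cached even when the coarser block that held it has been replaced.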



Published In

SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing
April 2024
1898 pages
ISBN: 9798400702433
DOI: 10.1145/3605098

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. key-value stores
  2. log-structured merge tree

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
