Partition pruning for range query on distributed log-structured merge-tree

Huang, Chenchen; Hu, Huiqi; Wei, Xing; Qian, Weining; Zhou, Aoying

doi:10.1007/s11704-019-8234-x

Research Article
Published: 19 December 2019

Volume 14, article number 143604, (2020)

Frontiers of Computer Science

Abstract

The log-structured merge-tree (LSM-tree) is adopted by many distributed storage systems. It consists of a Memtable and a number of SSTables. The Memtable is an in-memory structure, and each SSTable is a disk-based structure. Data records are horizontally partitioned by primary key and stored in different SSTables. Writes are first served by the Memtable and then periodically compacted into SSTables. Although this design optimizes writes by avoiding random disk writes, it is unfriendly to read requests, because results must be retrieved and merged from both the Memtable and the SSTables. In particular, when the Memtable and SSTables are distributed across different nodes, serving range queries becomes expensive. A range query on non-primary-key columns has to scan all partitions, which incurs substantial network and I/O costs. In this paper, we propose a partition pruning strategy to reduce the cost of range queries. A statistics cache is designed to determine whether a partition contains the desired data, which lets read requests avoid scanning useless partitions. Because records can be updated freely in the Memtable, a version-based cache synchronization strategy is proposed to prevent incorrect filtering and to ensure that queries observe the latest data state. We implement the proposed method in an open source distributed database and conduct comprehensive experiments. Experimental results show that the performance of range queries improves by 30% to 40% with our partition pruning technique.
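The abstract does not give the layout of the statistics cache or the synchronization protocol, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the cache stores a per-partition minimum and maximum for one non-primary-key column, and that each partition carries a version counter that is bumped whenever an update arrives. All names (`Partition`, `StatisticsCache`, `apply_update`, `range_query`) are hypothetical, and the Memtable is simplified to an in-place row update.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical partition: its rows plus a version counter."""
    pid: int
    rows: list          # list of dicts; assumed non-empty in this sketch
    version: int = 0    # bumped on every update to this partition


@dataclass
class CacheEntry:
    """Assumed cache content: min/max of one column at a given version."""
    lo: int
    hi: int
    version: int


class StatisticsCache:
    def __init__(self, column):
        self.column = column
        self.entries = {}

    def refresh(self, part):
        # Recompute min/max and remember the version they were computed at.
        values = [row[self.column] for row in part.rows]
        self.entries[part.pid] = CacheEntry(min(values), max(values), part.version)

    def can_skip(self, part, q_lo, q_hi):
        entry = self.entries.get(part.pid)
        # A missing or stale entry must never prune: the partition may have
        # received an update that moved a record into the queried range.
        if entry is None or entry.version != part.version:
            return False
        # Prune only when [lo, hi] and [q_lo, q_hi] do not overlap.
        return entry.hi < q_lo or q_hi < entry.lo


def apply_update(part, key, column, value):
    """Simplified stand-in for a Memtable write: update a row, bump the version."""
    for row in part.rows:
        if row["id"] == key:
            row[column] = value
    part.version += 1


def range_query(partitions, cache, q_lo, q_hi):
    """Return matching rows and the ids of the partitions actually scanned."""
    result, scanned = [], []
    for part in partitions:
        if cache.can_skip(part, q_lo, q_hi):
            continue
        scanned.append(part.pid)
        result.extend(r for r in part.rows if q_lo <= r[cache.column] <= q_hi)
    return result, scanned
```

In this sketch a stale entry falls back to a full scan of that partition rather than risking a wrong answer; calling `refresh` afterwards restores pruning. A real system would also need to handle empty partitions and concurrent updates, which are left out here.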

Acknowledgements

This work was partially supported by the Youth Science and Technology "Yang Fan" Program of Shanghai (17YF1427800), the Youth Foundation of Natural Science Foundation (61702189), the National High-tech R&D Program (863 Program) (2015AA015307), and the National Natural Science Foundation of China (Grant Nos. 61432006 and 61672232).

Author information

Authors and Affiliations

School of Data Science and Engineering, East China Normal University, Shanghai, 200062, China
Chenchen Huang, Huiqi Hu, Xing Wei, Weining Qian & Aoying Zhou

Corresponding author

Correspondence to Huiqi Hu.

Additional information

Chenchen Huang is a PhD candidate in the School of Data Science and Engineering, East China Normal University, China. Her research interests mainly include database system theory and implementation, query optimization, and index structures for in-memory databases.

Huiqi Hu is currently a lecturer in the School of Data Science and Engineering, East China Normal University, China. He received his PhD degree from Tsinghua University, China. His research interests mainly include database system theory and implementation, and query optimization.

Xing Wei is a PhD candidate in the School of Data Science and Engineering, East China Normal University, China. His research interests mainly include database system implementation, query optimization, and in-memory computing technology.

Weining Qian is currently a professor in computer science at East China Normal University, China. He received his MS and PhD in computer science from Fudan University, China in 2001 and 2004, respectively. He served as the co-chair of WISE 2012 Challenge, and program committee member of several international conferences, including ICDE 2009/2010/2012 and KDD 2013. His research interests include Web data management and mining of massive data sets.

Aoying Zhou is a professor of computer science at East China Normal University, China, where he heads the Institute for Data Science and Engineering. He received his bachelor's and master's degrees in computer science from Sichuan University, China in 1985 and 1988, respectively, and his PhD degree from Fudan University, China in 1993. He is the vice-director of ACM SIGMOD China and of the Technology Committee on Database of China Computer Federation. He serves on the editorial boards of several prestigious academic journals, such as VLDB Journal and WWW Journal. His research interests include Web data management, data management for data-intensive computing, and in-memory data analytics.
