Abstract
The log-structured merge tree (LSM-tree) is adopted by many distributed storage systems. It consists of a Memtable, which is an in-memory structure, and a number of SSTables, which are disk-based structures. Data records are horizontally partitioned by primary key and stored in different SSTables. Writes are first served by the Memtable and then periodically compacted into SSTables. Although this design optimizes writes by avoiding random disk writes, it is unfriendly to read requests, since results must be retrieved and merged from both the Memtable and the SSTables. In particular, when the Memtable and SSTables are distributed across different nodes, range queries become expensive to serve. A range query on non-primary-key columns has to scan all partitions, which incurs substantial network and I/O costs. In this paper, we propose a partition pruning strategy to reduce the cost of range queries. A statistics cache is designed to determine whether a partition contains the desired data, which allows read requests to avoid scanning useless partitions. Because records can be updated freely in the Memtable, a version-based cache synchronization strategy is proposed to prevent incorrect filtering and to ensure that queries observe the latest data state. We implement the proposed method in an open source distributed database and conduct comprehensive experiments. Experimental results show that our partition pruning technique improves range query performance by 30% to 40%.
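The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which statistics are cached, so per-partition minimum and maximum values of the queried column are assumed here, and all names (`Partition`, `StatisticsCache`, `may_contain`, `range_query`) are hypothetical. The version check is modeled as a local comparison, whereas a distributed system would need some protocol to learn the current version from the Memtable node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Partition:
    """Hypothetical model: one SSTable partition plus its uncompacted Memtable updates."""
    rows: Dict[int, int]                 # primary key -> value of a non-primary-key column
    version: int = 0                     # bumped on every Memtable write to this partition
    delta: Dict[int, int] = field(default_factory=dict)

    def write(self, key: int, value: int) -> None:
        # Writes go to the in-memory delta first; the version records that data changed.
        self.delta[key] = value
        self.version += 1

    def scan(self, low: int, high: int) -> List[Tuple[int, int]]:
        # A read merges SSTable rows with newer Memtable updates, then filters by range.
        merged = {**self.rows, **self.delta}
        return sorted((k, v) for k, v in merged.items() if low <= v <= high)


@dataclass
class CachedStats:
    min_value: int
    max_value: int
    version: int                         # partition version the statistics were built from


class StatisticsCache:
    """Assumed min/max statistics per partition, tagged with a version."""

    def __init__(self) -> None:
        self._stats: Dict[int, CachedStats] = {}

    def refresh(self, pid: int, part: Partition) -> None:
        values = list({**part.rows, **part.delta}.values())
        if values:
            self._stats[pid] = CachedStats(min(values), max(values), part.version)
        else:
            self._stats.pop(pid, None)

    def may_contain(self, pid: int, part: Partition, low: int, high: int) -> bool:
        stats = self._stats.get(pid)
        if stats is None or stats.version != part.version:
            return True                  # missing or stale statistics: pruning is unsafe
        return stats.max_value >= low and stats.min_value <= high


def range_query(partitions: Dict[int, Partition], cache: StatisticsCache,
                low: int, high: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return matching (key, value) pairs and the ids of partitions actually scanned."""
    results: List[Tuple[int, int]] = []
    scanned: List[int] = []
    for pid, part in partitions.items():
        if cache.may_contain(pid, part, low, high):
            scanned.append(pid)
            results.extend(part.scan(low, high))
    return results, scanned
```

In this sketch a partition is skipped only when its cached statistics are current and its value range does not overlap the query range. A write that bumps the partition version makes the cached entry stale, so the partition is scanned until `refresh` is called again, which is one simple way to avoid the incorrect filtering the abstract mentions.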
Acknowledgements
This work was partially supported by the Youth Science and Technology “Yang Fan” Program of Shanghai (17YF1427800), the Youth Foundation of Natural Science Foundation (61702189), the National High-tech R&D Program (863 Program) (2015AA015307), and the National Natural Science Foundation of China (Grant Nos. 61432006 and 61672232).
Author information
Chenchen Huang is a PhD candidate in the School of Data Science and Engineering, East China Normal University, China. Her research interests mainly include database system theory and implementation, query optimization, and index structures for in-memory databases.
Huiqi Hu is currently a lecturer in the School of Data Science and Engineering, East China Normal University, China. He received his PhD degree from Tsinghua University, China. His research interests mainly include database system theory and implementation, and query optimization.
Xing Wei is a PhD candidate in the School of Data Science and Engineering, East China Normal University, China. His research interests mainly include database system implementation, query optimization, and in-memory computing technology.
Weining Qian is currently a professor in computer science at East China Normal University, China. He received his MS and PhD in computer science from Fudan University, China in 2001 and 2004, respectively. He served as the co-chair of WISE 2012 Challenge, and program committee member of several international conferences, including ICDE 2009/2010/2012 and KDD 2013. His research interests include Web data management and mining of massive data sets.
Aoying Zhou is a professor of computer science at East China Normal University, China, where he heads the Institute for Data Science and Engineering. He received his master's and bachelor's degrees in computer science from Sichuan University, China in 1988 and 1985, respectively, and his PhD degree from Fudan University, China in 1993. He is the vice-director of ACM SIGMOD China and of the Technology Committee on Database of the China Computer Federation. He serves on the editorial boards of several prestigious academic journals, such as VLDB Journal and WWW Journal. His research interests include Web data management, data management for data-intensive computing, and in-memory data analytics.
Cite this article
Huang, C., Hu, H., Wei, X. et al. Partition pruning for range query on distributed log-structured merge-tree. Front. Comput. Sci. 14, 143604 (2020). https://doi.org/10.1007/s11704-019-8234-x