research-article

Succinct Range Filters

Published: 05 November 2019

Abstract

We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries. SuRF is based on a new data structure called the Fast Succinct Trie (FST), which matches the point and range query performance of state-of-the-art order-preserving indexes while consuming only 10 bits per trie node. The false positive rates in SuRF for both point and range queries are tunable to satisfy different application needs. We evaluate SuRF in RocksDB as a replacement for its Bloom filters, reducing I/O by filtering requests before they access on-disk data structures. Our experiments on a 100 GB dataset show that replacing RocksDB's Bloom filters with SuRFs speeds up open-seek (without upper-bound) and closed-seek (with upper-bound) queries by up to 1.5× and 5×, respectively, at a modest cost to worst-case (all-missing) point query throughput due to a slightly higher false positive rate.
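The idea of a filter that answers both point and range membership queries with one-sided error can be illustrated with a toy sketch. The code below is not the FST and not SuRF's actual encoding: it assumes, purely for illustration, that each key is truncated to a fixed-length prefix and that the prefixes are held in a sorted Python list. The class name `ToyRangeFilter` and the `prefix_len` parameter are hypothetical. The sketch shows only the contract the abstract describes: no false negatives, possible false positives, and a knob that trades space for accuracy.

```python
import bisect


class ToyRangeFilter:
    """Toy range filter: stores truncated key prefixes in sorted order.

    Assumption for illustration only; SuRF itself uses a succinct trie.
    """

    def __init__(self, keys, prefix_len):
        self.prefix_len = prefix_len
        # Deduplicate and sort the truncated keys so they can be binary searched.
        self.prefixes = sorted({k[:prefix_len] for k in keys})

    def may_contain(self, key):
        """Point query: False means the key is definitely absent."""
        p = key[:self.prefix_len]
        i = bisect.bisect_left(self.prefixes, p)
        return i < len(self.prefixes) and self.prefixes[i] == p

    def may_contain_range(self, low, high):
        """Closed range query: may any stored key k satisfy low <= k <= high?

        Truncation preserves (non-strict) lexicographic order, so a stored
        key inside the range always has its prefix inside the truncated range.
        """
        lo_p = low[:self.prefix_len]
        hi_p = high[:self.prefix_len]
        i = bisect.bisect_left(self.prefixes, lo_p)
        return i < len(self.prefixes) and self.prefixes[i] <= hi_p


keys = ["apple", "apricot", "banana", "cherry"]
coarse = ToyRangeFilter(keys, prefix_len=3)
fine = ToyRangeFilter(keys, prefix_len=6)

print(coarse.may_contain("application"))               # false positive
print(coarse.may_contain_range("bandage", "bandit"))   # false positive
print(fine.may_contain_range("bandage", "bandit"))     # filtered out
```

In this toy, a longer `prefix_len` stores more characters per key and rejects more empty queries, which loosely mirrors the tunable false positive rate mentioned above. A storage engine would consult such a filter before reading on-disk data and skip the read whenever the filter returns `False`.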

Published In

ACM SIGMOD Record Volume 48, Issue 1
March 2019
81 pages
ISSN:0163-5808
DOI:10.1145/3371316

Publisher

Association for Computing Machinery

New York, NY, United States
