Skip to main content
Log in

Morton filters: fast, compressed sparse cuckoo filters

  • Special Issue Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Approximate set membership data structures (ASMDSs) are ubiquitous in computing. They trade a tunable, often small, error rate (\(\epsilon \)) for large space savings. The canonical ASMDS is the Bloom filter, which supports lookups and insertions but not deletions in its simplest form. Cuckoo filters (CFs), a recently proposed class of ASMDSs, add deletion support and often use fewer bits per item for equal \(\epsilon \). This work introduces the Morton filter (MF), a novel CF variant that introduces several key improvements to its progenitor. Like CFs, MFs support lookups, insertions, and deletions, and when using an optional batching interface raise their respective throughputs by up to 2.5\(\times \), 20.8\(\times \), and 1.3\(\times \). MFs achieve these improvements by (1) introducing a compressed block format that permits storing a logically sparse filter compactly in memory, (2) leveraging succinct embedded metadata to prune unnecessary memory accesses, and (3) more heavily biasing insertions to use a single hash function. With these optimizations, lookups, insertions, and deletions often only require accessing a single hardware cache line from the filter. MFs and CFs are then extended to support self-resizing, a feature of quotient filters (another ASMDS that uses fingerprints). MFs self-resize up to 13.9\(\times \) faster than rank-and-select quotient filters (a state-of-the-art self-resizing filter). These improvements are not at a loss in space efficiency, as MFs typically use comparable to slightly less space than CFs for equal \(\epsilon \).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23
Fig. 24
Fig. 25
Fig. 26

Similar content being viewed by others

Notes

  1. Named after a certain elephant’s half-bird baby [27].

  2. Bender et al. [6] use the term fingerprint to mean a signature. However, its meaning is different from what a fingerprint is in a CF or MF, as it encompasses both the index and tag, whereas a fingerprint in a CF or MF is just a short hash or tag. We thus use the term signature to avoid confusion.

  3. Since publishing our VLDB’18 paper [13], Lang et al. showed that blocked Bloom filters can be faster than CFs [53].

References

  1. Almeida, P.S., Baquero, C., Preguiça, N.M., Hutchison, D.: Scalable Bloom filters. Inf. Process. Lett. 101(6), 255–261 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  2. Antoshenkov, G.: Byte-aligned bitmap compression. In DCC, pp. 476 (1995)

  3. Appleby, A.: MurmurHash. https://sites.google.com/site/murmurhash (2008). Accessed 2 May 2018

  4. Azar, Y., Broder, A.Z., Karlin, A.R., Upfal, E.: Balanced allocations. SIAM J. Comput. 29(1), 180–200 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  5. Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5(2), 78–101 (1966)

    Article  Google Scholar 

  6. Bender, M.A., Farach-Colton, M., Johnson, R., Kraner, R., Kuszmaul, B.C., Medjedovic, D., Montes, P., Shetty, P., Spillane, R.P., Zadok, E.: Don’t thrash: how to cache your hash on flash. PVLDB 5(11), 1627–1637 (2012)

    Google Scholar 

  7. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. CACM 13(7), 422–426 (1970)

    Article  MATH  Google Scholar 

  8. Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture optimized for the new bottleneck: memory access. In VLDB, pp. 54–65 (1999)

  9. Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. In CIDR, pp. 225–237 (2005)

  10. Bonomi, F., Mitzenmacher, M., Panigrahy, R., Singh, S., Varghese, G.: An improved construction for counting Bloom filters. ESA 6, 684–695 (2006)

    MathSciNet  MATH  Google Scholar 

  11. Bonomi, F., Mitzenmacher, M., Panigraphy, R., Singh, S., Varghese, G.: Bloom filters via d-left hashing and dynamic bit reassignment extended abstract. In Allerton, pp. 877–883 (2006)

  12. Bratbergsengen, K.: Hashing methods and relational algebra operations. In VLDB, pp. 323–333 (1984)

  13. Breslow, A., Jayasena, N.: Morton filters: faster, space-efficient cuckoo filters via biasing, compression, and decoupled logical sparsity. PVLDB 11(9), 1041–1055 (2018)

    Google Scholar 

  14. Breslow, A.D., Zhang, D.P., Greathouse, J.L., Jayasena, N., Tullsen, D.M.: Horton tables: fast hash tables for in-memory data-intensive computing. In USENIX ATC, pp. 281–294 (2016)

  15. Broder, A.Z., Mitzenmacher, M.: Network applications of Bloom filters: a survey. Internet Math. 1(4), 485–509 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  16. Carter, L., Floyd, R., Gill, J., Markowsky, G., Wegman, M.: Exact and approximate membership testers. In STOC, pp. 59–65, New York, NY (1978)

  17. Chambi, S., Lemire, D., Kaser, O., Godin, R.: Better bitmap performance with Roaring bitmaps. Softw. Pract. Exp. 46(5), 709–719 (2016)

    Article  Google Scholar 

  18. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: BigTable: a distributed storage system for structured data. TOCS 26(2), 4 (2008)

    Article  Google Scholar 

  19. Chen, H., Liao, L., Jin, H., Wu, J.: The dynamic cuckoo filter. In ICNP, pp. 1–10 (2017)

  20. Clark, M.: A new x86 core architecture for the next generation of computing. In Hot Chips, pp. 1–19 (2016)

  21. Cohen, S., Matias, Y.: Spectral Bloom filters. In SIGMOD, pp. 241–252 (2003)

  22. Colantonio, A., Pietro, R.D.: Concise: compressed ’n’ composable integer set. Inf. Process. Lett. 110(16), 644–650 (2010)

    Article  MATH  Google Scholar 

  23. Cui, J., Zhang, J., Zhong, H., Xu, Y.: SPACF: a secure privacy-preserving authentication scheme for VANET with cuckoo filter. IEEE Trans. Veh. Technol. 66(11), 10283–10295 (2017)

    Article  Google Scholar 

  24. Dean, J., Ghemawat, S.: LevelDB: a fast persistent key-value store. https://opensource.googleblog.com/2011/07/leveldb-fast-persistent-key-value-store.html, July 27, 2011. Accessed 25 Jan 2017

  25. Deng, F., Rafiei, D.: Approximately detecting duplicates for streaming data using Stable Bloom filters. In SIGMOD, pp. 25–36 (2006)

  26. Dong, S., Callaghan, M., Galanis, L., Borthakur, D., Savor, T., Strum, M.: Optimizing space amplification in RocksDB. In CIDR (2017)

  27. Dr. Seuss. Horton Hatches the Egg. Random House (1940)

  28. Einziger, G., Friedman, R.: TinySet - an access efficient self adjusting Bloom filter construction. TON 25(4), 2295–2307 (2017)

    Google Scholar 

  29. Eppstein, D., Goodrich, M.T., Mitzenmacher, M., Torres, M.R.: 2-3 cuckoo filters for faster triangle listing and set intersection. In PODS, pp. 247–260 (2017)

  30. Erlingsson, U., Manasse, M., McSherry, F.: A cool and practical alternative to traditional hash tables. In WDAS (2006)

  31. Fan, B., Andersen, D.G., Kaminsky, M.: MemC3: compact and concurrent memcache with dumber caching and smarter hashing. In NSDI, pp. 371–384 (2013)

  32. Fan, B., Andersen, D.G., Kaminsky, M.: Cuckoo filter. https://github.com/efficient/cuckoofilter, (2017). Accessed 19 Nov 2017

  33. Fan, B., Andersen, D.G., Kaminsky, M., Mitzenmacher, M.: Cuckoo filter: practically better than Bloom. In CoNEXT, pp. 75–88 (2014)

  34. Fan, L., Cao, P., Almeida, J.M., Broder, A.Z.: Summary Cache: a scalable wide-area web cache sharing protocol. TON 8(3), 281–293 (2000)

    Google Scholar 

  35. Fisher, R.J., Dietz, H.G.: Compiling for SIMD within a register. In LCPC, pp. 290–304 (1998)

  36. Flynn, M.J.: Some computer organizations and their effectiveness. TOC 21(9):948–960 (1972)

    Article  MATH  Google Scholar 

  37. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with 0(1) worst case access time. J. ACM 31(3), 538–544 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  38. HBase, L George: The Definitive Guide: Random Access to Your Planet-size Data. O’Reilly Media, Inc., New York (2011)

    Google Scholar 

  39. González, R., Grabowski, S., Mäkinen, V., Navarro, G.: Practical implementation of rank and select queries. In WEA, pp. 27–38 (2005)

  40. Goodman, J.R.: Using cache memory to reduce processor-memory traffic. In ISCA, pp. 124–131 (1983)

  41. Greathouse, J.L., Daga, M.: Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In SC, pp. 769–780 (2014)

  42. Grissa, M., Yavuz, A.A., Hamdaoui, B.: Cuckoo filter-based location-privacy preservation in database-driven cognitive radio networks. In WSCNIS, pp. 1–7 (2015)

  43. Guo, D., Wu, J., Chen, H., Yuan, Y., Luo, X.: The dynamic Bloom filters. TKDE 22(1), 120–133 (2010)

    Google Scholar 

  44. Guzun, G., Canahuate, G., Chiu, D., Sawin, J.: A tunable compression framework for bitmap indices. In ICDE, pp. 484–495 (2014)

  45. Jacobson, G.: Space-efficient static trees and graphs. In FOCS, pp. 549–554 (1989)

  46. Kales, D., Rechberger, C., Schneider, T., Senker, M., Weinert, C.: Mobile private contact discovery at scale. In USENIX Security (2019)

  47. Kandemir, M., Zhao, H., Tang, X., Karakoy, M.: Memory row reuse distance and its role in optimizing application performance. In SIGMETRICS, pp. 137–149 (2015)

  48. Kogge, P.M., Stone, H.S.: A parallel algorithm for the efficient solution of a general class of recurrence equations. TOC 100(8), 786–793 (1973)

    MathSciNet  MATH  Google Scholar 

  49. Kornacker, M., Behm, A., Bittorf, V., Bobrovytsky, T., Ching, C., Choi, A., Erickson, J., Grund, M., Hecht, D., Jacobs, M., Joshi, I., Kuff, L., Kumar, D., Leblang, A., Li, N., Pandis, I., Robinson, H., Rorke, D., Rus, S., Russell, J., Tsirogiannis, D., Wanderman-Milne, S., Yoder, M.: Impala: a modern, open-source SQL engine for Hadoop. In CIDR, (2015)

  50. Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S.E., Eaton, P.R., Geels, D., Gummadi, R., Rhea, S.C., Weatherspoon, H., Weimer, W., Wells, C., Zhao, B.Y.: . OceanStore: an architecture for global-scale persistent storage. In ASPLOS, pp. 190–201 (2000)

    Article  Google Scholar 

  51. Kwon, M., Shankar, V., Reviriego, P.: Position-aware cuckoo filters. In ANCS, pp. 151–153 (2018)

  52. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. OSR 44(2), 35–40 (2010)

    Google Scholar 

  53. lang, H., Neumann, T., Kemper, A., Boncz, P.: Performance-optimal filtering: Bloom overtakes cuckoo at high throughput. PVLDB 12, 502–515 (2019)

    Google Scholar 

  54. Lemire, D.: A fast alternative to the modulo reduction. https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/, June 27, (2016). Accessed 07 Jan 2017

  55. Li, X., Andersen, D.G., Kaminsky, M., Freedman, M.J.: Algorithmic improvements for fast concurrent cuckoo hashing. In EuroSys, vol 27, pp. 1–27:14 (2014)

  56. Lomont, C.: Introduction to Intel advanced vector extensions. Intel White Paper, pp. 1–21 (2011)

  57. Loveman, D.B.: Program improvement by source-to-source transformation. J. ACM 24(1), 121–145 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  58. Luo, L., Guo, D., Rottenstreich, O., Ma, R.T., Luo, X., Ren, B.: The consistent cuckoo filter. In Infocom, (2019)

  59. Mackert, L.F., Lohman, G.M.: R* optimizer validation and performance evaluation for distributed queries. In VLDB, pp. 149–159 (1986)

  60. Melsted, P., Pritchard, J.K.: Efficient counting of k-mers in DNA sequences using a Bloom filter. BMC Bioinformatics 12, 333 (2011)

    Article  Google Scholar 

  61. Mitzenmacher, M.: Compressed Bloom filters. In PODC, pp. 144–150 (2001)

  62. Mitzenmacher, M.: The power of two choices in randomized load balancing. TPDPS 12(10), 1094–1104 (2001)

    Google Scholar 

  63. Mitzenmacher, M., Pontarelli, S., Reviriego, P.: Adaptive cuckoo filters. In ALENEX, pp 36–47

  64. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, Cambridge (2017)

    MATH  Google Scholar 

  65. Mula, W., Kurz, N., Lemire, D.: Faster population counts using AVX2 instructions. Comput. J. 61(1), 111–120 (2018)

    Article  Google Scholar 

  66. Navarro, G.: Compact Data Structures: A Practical Approach. Cambridge University Press, Cambridge (2016)

    Book  Google Scholar 

  67. Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In Meeting on Algorithm Engineering & Expermiments, pp 60–70, (2007)

  68. O’Neil, P.E., Cheng, E., Gawlick, D., O’Neil, E.J.: The Log-Structured Merge-tree (LSM-tree). Acta Inform. 33(4), 351–385 (1996)

    Article  MATH  Google Scholar 

  69. Padua, D.A., Wolfe, M.J.: Advanced compiler optimizations for supercomputers. CACM 29(12), 1184–1201 (1986)

    Article  Google Scholar 

  70. Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  71. Pandey, P., Bender, M.A., Johnson, R., Patro, R.: A general-purpose counting filter: making every bit count. In SIGMOD, pp. 775–787 (2017)

  72. Pandey, P., Johnson, R.: A general-purpose counting filter: counting quotient filter. https://github.com/splatlab/cqf, (2017). Accessed 11 Sep 2017

  73. Polychroniou, O., Raghavan, A., Ross, K.A.: Rethinking SIMD vectorization for in-memory databases. In SIGMOD, pp. 1493–1508 (2015)

  74. Putze, F., Sanders, P., Singler, J.: Cache-, hash-, and space-efficient Bloom filters. JEA, 14 (2009)

  75. Raman, R. Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In SODA, pp. 233–242 (2002)

  76. Raman, S.K., Pentkovski, V., Keshava, J.: Implementing streaming SIMD extensions on the Pentium III Processor. IEEE Micro 20(4), 47–57 (2000)

    Article  Google Scholar 

  77. Ren, K., Zheng, Q., Arulraj, J., Gibson, G.: SlimDB: a space-efficient key-value storage engine for semi-sorted data. PVLDB 10(13), 2037–2048 (2017)

    Google Scholar 

  78. Ross, K.A.: Efficient hash probes on modern processors. In Chirkova, R., Dogac, A., Özsu, M.T., Sellis, T.K. (eds), ICDE, pp.1297–1301 (2007)

  79. Rottenstreich, O., Kanizo, Y., Keslassy, I.: The variable-increment counting Bloom filter. TON 22(4), 1092–1105 (2014)

    Google Scholar 

  80. Sears, R., Ramakrishnan, R.: bLSM: a general purpose Log Structured Merge tree. In SIGMOD, pp. 217–228 (2012)

  81. Seznec, A.: A new case for the TAGE branch predictor. In MICRO, pp. 117–127 (2011)

  82. Sigaev, T., Korotkov, A., Bartunov, O.: PostgreSQL 10 documentation: F.5. bloom. https://www.postgresql.org/docs/10/static/bloom.html (2017). Accessed 25 Jan 2018

  83. Singh, T., Rangarajan, S., John, D., Henrion, C., Southard, S., McIntyre, H., Novak, A., Kosonocky, S., Jotwani, R., Schaefer, A., Chang, E., Bell, J., Zen, M. Co.: a next-generation high-performance x86 core. ISSCC, pp. 52–53 (2017)

  84. Smith, J.E.: A study of branch prediction strategies. In ISCA, pp. 135–148 (1981)

  85. Stonebraker, M., Rowe, L.A., Hirohama, M.: The implementation of POSTGRES. TKDE 2(1), 125–142 (1990)

    Google Scholar 

  86. Sun, Y., Hua, Y., Jiang, S., Li, Q., Cao, S., Zuo, P.: SmartCuckoo: a fast and cost-efficient hashing index scheme for cloud storage systems. In USENIX ATC, pp. 553–565 (2017)

  87. Tarjan, R.E., Yao, A.C.: Storing a sparse table. CACM 22(11), 606–611 (1979)

    Article  MathSciNet  MATH  Google Scholar 

  88. Tinney, W.F., Walker, J.W.: Direct solutions of sparse network equations by optimally ordered triangular factorization. Proc. IEEE 55(11), 1801–1809 (1967)

    Article  Google Scholar 

  89. Treibig, J., Hager, G., Wellein, G.: LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments. In ICPPW, pp. 207–216 (2010)

  90. Tullsen, D.M., Eggers, S.J., Emer, J.S., Levy, H.M., Lo, J.L., Stamm, R.L. : Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor. In ISCA, pp. 191–202 (1996)

  91. Tullsen, D.M., Eggers, S.J., Levy, H.M.: Simultaneous multithreading: maximizing on-chip parallelism. In ISCA, pp. 392–403 (1995)

  92. Vöcking, B. How asymmetry helps load balancing. In FOCS, pp. 131–141 (1999)

  93. Wang, J., Lin, C., Papakonstantinou, Y., Swanson, S.: An experimental study of bitmap compression vs. inverted list compression. In SIGMOD, pp. 993–1008 (2017)

  94. Wolfe, M. More iteration space tiling. In SC, pp. 655–664 (1989)

  95. Wu, K., Otoo, E.J., Shoshani, A.: Optimizing bitmap indices with efficient compression. TODS 31(1), 1–38 (2006)

    Article  Google Scholar 

  96. Yoon, M.: Aging Bloom filter with two active buffers for dynamic sets. TKDE 22(1), 134–138 (2010)

    Google Scholar 

  97. Zhang, H., Lim, H., Leis, V., Andersen, D.G., Kaminsky, M., Keeton, K., Pavlo, A.: SuRF: practical range query filtering with fast succinct tries. In SIGMOD (2018)

  98. Zhang, K., Wang, K., Yuan, Y., Guo, L., Lee, R., Zhang, X.: Mega-KV: a case for GPUs to maximize the throughput of in-memory key-value stores. PVLDB 8(11), 1226–1237 (2015)

    Google Scholar 

Download references

Acknowledgements

We thank the VLDB reviewers and our kind colleagues Shaizeen Aga, Joseph L. Greathouse, Mike Ignatowski, and Gabriel Loh for their time and superb feedback which substantially improved the paper’s clarity and quality. We also thank John Kalamatianos and Jagadish Kotra for giving us access to the Skylake-X server and Karen Prairie for her edits. We finally thank the AMD Open Source Review Board, Alan Lee, Chip Freitag, Mike Chu, and the dozens of others who were involved in the auditing and open-sourcing of our Morton filter implementation. AMD is a trademark of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Alex D. Breslow.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Breslow, A.D., Jayasena, N.S. Morton filters: fast, compressed sparse cuckoo filters. The VLDB Journal 29, 731–754 (2020). https://doi.org/10.1007/s00778-019-00561-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-019-00561-0

Keywords

Navigation