DOI:10.1145/3627703.3629590
research-article

Wormhole Filters: Caching Your Hash on Persistent Memory

Published: 22 April 2024

Abstract

Approximate membership query (AMQ) data structures can efficiently determine, approximately, whether an element is in a set. They are widely used in distributed systems, database systems, bioinformatics, IoT applications, data stream mining, and other domains. However, their memory consumption grows rapidly with the data scale, which limits a system's ability to process massive amounts of data. Emerging persistent memory provides close-to-DRAM access speed and terabyte-level capacity, enabling AMQ data structures to handle massive data. Nevertheless, existing AMQ data structures perform poorly on persistent memory due to intensive random accesses and/or sequential writes. We therefore propose a novel AMQ data structure called the wormhole filter, which achieves high performance on persistent memory by reducing random accesses and sequential writes. In addition, we reduce the number of log records to lower recovery overhead. Theoretical analysis and experimental results show that wormhole filters significantly outperform competitive state-of-the-art AMQ data structures. For example, wormhole filters achieve 23.26× the insertion throughput, 1.98× the positive lookup throughput, and 8.82× the deletion throughput of the best competing baseline.
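
To make the AMQ operations named in the abstract (insertion, lookup, deletion) concrete, the following is a minimal sketch of a plain in-DRAM cuckoo filter, one of the existing AMQ designs this line of work builds on. It is not the wormhole filter and says nothing about its layout or logging. The class name `CuckooFilter`, the helper `_h`, and all parameters (`num_buckets`, `bucket_size`, `fp_bits`, `max_kicks`) are illustrative assumptions. The relocation loop in `insert` hints at why such structures can generate scattered accesses: each eviction jumps to an unrelated bucket.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """64-bit hash derived from BLAKE2b (any well-mixed hash would do)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class CuckooFilter:
    """Toy cuckoo filter (hypothetical names): stores fingerprints, not keys."""

    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=12, max_kicks=500):
        assert num_buckets & (num_buckets - 1) == 0, "num_buckets must be a power of two"
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.fp_bits = fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(0)  # fixed seed keeps the sketch deterministic

    def _fingerprint(self, key: bytes) -> int:
        # Fingerprints fall in [1, 2**fp_bits - 1].
        return (_h(b"fp" + key) % ((1 << self.fp_bits) - 1)) + 1

    def _alt(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other candidate bucket is computable
        # from the current bucket index and the fingerprint alone.
        return (index ^ _h(fp.to_bytes(4, "big"))) & self.mask

    def _candidates(self, key: bytes):
        fp = self._fingerprint(key)
        i1 = _h(key) & self.mask
        return fp, i1, self._alt(i1, fp)

    def insert(self, key: bytes) -> bool:
        fp, i1, i2 = self._candidates(key)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both candidate buckets are full: evict a resident and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)  # jump to the evicted fingerprint's other bucket
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Treated as full; this toy drops the last evicted fingerprint.
        return False

    def contains(self, key: bytes) -> bool:
        # May return True for a key never inserted (false positive).
        fp, i1, i2 = self._candidates(key)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, key: bytes) -> bool:
        # Only safe for keys that were actually inserted.
        fp, i1, i2 = self._candidates(key)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


# Usage on an otherwise empty filter, where the outcome is unambiguous.
demo = CuckooFilter(num_buckets=16)
demo.insert(b"alice")
assert demo.contains(b"alice")
demo.delete(b"alice")
assert not demo.contains(b"alice")
```

In this sketch a lookup touches at most two buckets and a deletion removes one matching fingerprint. A lookup for an absent key can still succeed when another key left the same fingerprint in one of the two candidate buckets, which is the "approximate" part of the contract.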

Cited By

  • (2025) Course Design and Textbook Development for Introduction to Computer Systems Course in the Era of Concurrency. Computing and Combinatorics, 42-47. DOI:10.1007/978-981-96-1195-9_7. Online publication date: 13-Feb-2025
  • (2024) Rethinking Hash Tables: Challenges and Opportunities with Compute Express Link (CXL). Proceedings of the ACM Turing Award Celebration Conference - China 2024, 23-27. DOI:10.1145/3674399.3674418. Online publication date: 5-Jul-2024
  • (2024) Bamboo Filters: Make Resizing Smooth and Adaptive. IEEE/ACM Transactions on Networking 32:5, 3776-3791. DOI:10.1109/TNET.2024.3403997. Online publication date: Oct-2024

Published In

EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems
April 2024
1245 pages
ISBN:9798400704376
DOI:10.1145/3627703
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Approximate Membership Query
  2. Cuckoo Filter
  3. Persistent Memory
  4. Probabilistic Data Structure

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Huawei Technologies Co., Ltd.
  • Jiangsu High-level Innovation and Entrepreneurship (Shuangchuang) Program
  • Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University
  • National Natural Science Foundation of China

Conference

EuroSys '24

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
