DOI: 10.1145/3393527.3393534
research-article

Improving Multi-set Query Processing Via a Learned Oracle

Published: 26 October 2020

Abstract

Multi-set query is a fundamental problem in computer systems and applications. Most traditional solutions for multi-set query are based on hash tables or Bloom filters. However, when the multi-sets are large, these solutions cannot simultaneously achieve small memory usage, fast query speed, and high accuracy. In this work, we empirically study how a learned oracle can improve the performance of traditional multi-set query processing. The key idea is to cast the query as a classification problem: we train an oracle to predict which set contains a query item e. To guarantee exact query results, we combine the learned oracle with a standard Bloom filter and an exact-match index that catch the items the oracle misidentifies. When the oracle is both small and efficient, overall query performance improves. Our framework treats the learned oracle as a complete black box and does not depend on its inner workings. Theoretical proofs and experimental results show that, compared to the state of the art, our approach achieves a 0% error rate while using much less memory at comparable speed.
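The construction described in the abstract can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation: we assume every queried item belongs to exactly one set (the usual multi-set query setting), that the oracle is any callable mapping an item to a set ID, and that at build time every item the oracle misclassifies is inserted into a Bloom filter and recorded with its true set ID in an exact-match index (a Python dict here). The names `BloomFilter`, `LearnedMultiSetIndex`, `oracle`, `num_bits`, and `num_hashes` are hypothetical, and the SHA-256-based hashing is an illustrative choice.

```python
import hashlib
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple


class BloomFilter:
    """Standard Bloom filter; k bit positions are derived from SHA-256 (illustrative choice)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: Hashable) -> Iterator[int]:
        # Salt the hash with the function index i to get k independent-looking positions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: Hashable) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: Hashable) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class LearnedMultiSetIndex:
    """Hypothetical wrapper: learned oracle + backup Bloom filter + exact-match index."""

    def __init__(
        self,
        items: Iterable[Tuple[Hashable, int]],
        oracle: Callable[[Hashable], int],
        num_bits: int = 1024,
        num_hashes: int = 3,
    ) -> None:
        self.oracle = oracle  # treated as a black box: only its predictions are used
        self.backup_filter = BloomFilter(num_bits, num_hashes)
        self.exact_index: Dict[Hashable, int] = {}
        # Build time (assumed): record every item whose predicted set differs from its true set.
        for item, set_id in items:
            if oracle(item) != set_id:
                self.backup_filter.add(item)
                self.exact_index[item] = set_id

    def query(self, item: Hashable) -> int:
        """Return the set ID of an item assumed to belong to one of the sets."""
        if item in self.backup_filter:
            set_id = self.exact_index.get(item)
            if set_id is not None:
                return set_id  # misclassified item: the exact-match index corrects it
            # Otherwise a Bloom false positive: the oracle was right for this item.
        return self.oracle(item)


# Usage with a toy parity "oracle" (hypothetical stand-in for a trained classifier):
data = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 1)]  # item 4 is misclassified by x % 2
index = LearnedMultiSetIndex(data, oracle=lambda x: x % 2)
print(index.query(4))  # → 1 (answered by the exact-match index)
print(index.query(2))  # → 0 (answered by the oracle)
```

Because a Bloom filter has no false negatives, every misclassified item is routed to the exact-match index, so answers for items in the multi-set are exact; a filter false positive only costs one extra index miss before falling back to the oracle. In this sketch the filter merely short-circuits dict lookups; in a real system its benefit would presumably come from keeping a larger, slower exact-match index off the common query path, which is our assumption rather than a detail stated in the abstract.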


    Published In

    ACM TURC '20: Proceedings of the ACM Turing Celebration Conference - China
    May 2020
    220 pages
    ISBN: 9781450375344
    DOI: 10.1145/3393527
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    In-Cooperation

    • Baidu Research

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Bloom filters
    2. Exact-match index
    3. Multi-set query
    4. Oracle

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACM TURC'20

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 42
    • Downloads (Last 12 months): 3
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 16 Feb 2025