Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying

Published: 18 June 2021


Given a set $S$, Membership Querying (MQ) answers whether a query element $q$ belongs to $S$, i.e., whether $q\in S$. It is a fundamental task in areas such as database systems and computer networks. In this paper, we consider a more general problem, Multi-Set Multi-Membership Querying (MS-MMQ): given $n$ sets $S_0,\ldots,S_{n-1}$, MS-MMQ answers which of these sets contain element $q$. A direct way to address MS-MMQ is to build an MQ structure (e.g., a Bloom filter) for each set. However, both the query cost and the space grow linearly with $n$, which becomes prohibitive for large $n$. To address this challenge, we propose a novel Circular Shift and Coalesce (CSC) framework that efficiently achieves approximate MS-MMQ. Instead of building an MQ data structure for each set, the CSC index encodes all $n$ sets into a single compact sketch and retrieves only a few bytes of the sketch per query, which achieves high memory efficiency and speeds up queries several-fold. CSC is compatible with mainstream data structures for approximate MQ. We conduct experiments on real-world datasets, and the results demonstrate that our framework is up to 91.2 times faster and up to 48.9 times more accurate than state-of-the-art methods.
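The following minimal Python sketch makes the cost of the direct approach concrete. It builds one Bloom filter per set and probes all of them for a query. The class `BloomFilter`, the helper `naive_ms_mmq`, the BLAKE2b-based hashing, and the toy sets are illustrative assumptions and are not code from the paper.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter with m bits and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Illustrative hashing: k 64-bit BLAKE2b digests, each salted by its index j.
        for j in range(self.k):
            d = hashlib.blake2b(f"{j}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(d, "little") % self.m

    def add(self, item: str) -> None:
        # Set the k bits chosen by the hash functions.
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        # Report membership only if all k bits are set (no false negatives).
        return all((self.bits[p >> 3] >> (p & 7)) & 1 for p in self._positions(item))


def naive_ms_mmq(filters, q):
    """Probe every per-set filter: query cost grows linearly with n = len(filters)."""
    return [i for i, bf in enumerate(filters) if q in bf]


# Usage: three hypothetical sets, one Bloom filter per set.
sets = [{"a", "b"}, {"b", "c"}, {"d"}]
filters = []
for s in sets:
    bf = BloomFilter(m=1024, k=3)
    for x in s:
        bf.add(x)
    filters.append(bf)

print(naive_ms_mmq(filters, "b"))  # always contains 0 and 1; any other index is a false positive
```

Each query touches $k$ bit positions in every one of the $n$ filters. This is the linear growth in query time and space that the CSC framework is designed to avoid.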

Given a set $S$, the membership querying problem asks whether a query element $q$ is in $S$. It is a fundamental task in many areas such as database systems and computer networks. Given a family of sets $\{S_1,\ldots,S_n\}$, multi-set multi-membership querying (MS-MMQ) asks which sets contain element $q$. A direct way to address MS-MMQ is to build a membership query data structure (e.g., a Bloom filter) for each set in $\{S_1,\ldots,S_n\}$. The query and space complexities then grow linearly with $n$, so they become prohibitive when $n$ is large. To address this challenge, we propose a novel framework, \emph{circular shift and coalesce} (CSC), to efficiently achieve approximate MS-MMQ. Instead of building a membership query data structure for each set, the proposed CSC index encodes all sets $S_1,\ldots,S_n$ into a compact sketch and needs to retrieve only a few bytes of the sketch to answer a query, which achieves high memory efficiency and speeds up queries by several times. Our framework is compatible with state-of-the-art data structures for approximate membership querying, which support both element insertions and deletions. We conduct experiments on real-world datasets, and the experimental results demonstrate that our framework is about 5 to 50 times faster and more memory-efficient than state-of-the-art methods.
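The Python sketch below illustrates one plausible reading of the "circular shift and coalesce" idea. It is our assumption, not the authors' published algorithm.

In this sketch, all sets share (coalesce into) a single bit array. An element of the set with index $i$ (counting from 0) sets bit $(h_j(x)+i) \bmod m$ for each hash $h_j$, so its positions are circularly shifted by the set index. A query reads $k$ windows of $n$ consecutive bits and ANDs them together. Bit $i$ of the result then flags a possible member of set $i$.

The class name `ShiftedSharedBloom`, the BLAKE2b hashing, and the toy data are hypothetical.

```python
import hashlib


class ShiftedSharedBloom:
    """Hypothetical CSC-style sketch: one m-bit array shared by n sets.

    Assumption: an element of set i sets bit (h_j(x) + i) mod m for each of
    the k hashes, so a query reads k windows of n consecutive bits.
    """

    def __init__(self, m: int, k: int, n: int):
        self.m, self.k, self.n = m, k, n
        self.bits = bytearray((m + 7) // 8)

    def _hashes(self, item: str):
        # Illustrative hashing: k 64-bit BLAKE2b digests, each salted by its index j.
        for j in range(self.k):
            d = hashlib.blake2b(f"{j}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(d, "little") % self.m

    def _get(self, p: int) -> int:
        return (self.bits[p >> 3] >> (p & 7)) & 1

    def insert(self, item: str, set_id: int) -> None:
        for h in self._hashes(item):
            p = (h + set_id) % self.m  # circular shift by the set index
            self.bits[p >> 3] |= 1 << (p & 7)

    def query(self, item: str):
        # Start with all n sets as candidates, then AND in each hash's n-bit window.
        mask = (1 << self.n) - 1
        for h in self._hashes(item):
            window = 0
            for i in range(self.n):
                window |= self._get((h + i) % self.m) << i
            mask &= window
        return [i for i in range(self.n) if (mask >> i) & 1]


# Usage: three hypothetical sets coalesced into one sketch.
sk = ShiftedSharedBloom(m=4096, k=3, n=3)
for set_id, s in enumerate([{"a", "b"}, {"b", "c"}, {"d"}]):
    for x in s:
        sk.insert(x, set_id)

print(sk.query("b"))  # always contains 0 and 1; any other index is a false positive
```

The example reads each window bit by bit to stay short. Because the $n$ bits belonging to one hash are contiguous, an optimized version could fetch a window with one or two word-sized reads when $n$ is small relative to the word size. This matches the abstract's point that a query retrieves only a few bytes.

Like a standard Bloom filter, this sketch has no false negatives but may report false positives.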


SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages


Author Tags

  membership query
  probabilistic data structure
  sketch


Funding Sources

  Natural Science Basic Research Plan in Zhejiang Province of China
  National Natural Science Foundation of China
  Shenzhen Basic Research Grant
  MoE-CMCC "Artificial Intelligence" Project
  Natural Science Basic Research Plan in Shaanxi Province of China



