Abstract
Probabilistic data structures (PDS) are compact representations of high-volume data that provide approximate answers to queries about the data. They are commonplace in today’s computing systems, finding use in databases, networking and more. While PDS are designed to perform well under benign inputs, they are frequently used in applications where inputs may be adversarially chosen. This may lead to a violation of their expected behaviour, for example an increase in false positive rate.
In this work, we focus on PDS that handle approximate membership queries (AMQ). We consider adversarial users with the capability of making adaptive insertions, deletions and membership queries to AMQ-PDS, and analyse the performance of AMQ-PDS under such adversarial inputs.
We argue that deletions significantly empower adversaries, presenting a challenge to enforcing honest behaviour when compared to insertion-only AMQ-PDS. To address this, we introduce a new concept of an honest setting for AMQ-PDS with deletions. By leveraging simulation-based security definitions, we then quantify how much harm can be caused by adversarial users to the functionality of AMQ-PDS. Our resulting bounds only require calculating the maximal false positive probability and insertion failure probability achievable in our novel honest setting.
We apply our results to Cuckoo filters and Counting filters. We show how to protect these AMQ-PDS at low cost, by replacing or composing the hash functions with keyed pseudorandom functions in their construction. This strategy involves establishing practical bounds for the probabilities mentioned above. Using our new techniques, we demonstrate that achieving security against adversarial users making both insertions and deletions remains practical.
1 Introduction
Probabilistic data structures (PDS) are widespread in today’s data-driven world. They find a multitude of uses across our computing systems, in databases, networking and communication. By compactly representing data, they offer improved efficiency, with the tradeoff of providing approximate (rather than exact) answers to queries about the data.
Each PDS is specifically designed to answer certain kinds of queries. An important category of PDS is those providing approximate answers to membership queries, i.e. “is an element x a member of a set S?". We refer to this category as AMQ-PDS, which are the focus of this work. Examples of AMQ-PDS include Bloom filters [5], Counting filters [12] and Cuckoo filters [10]. While Bloom filters only support insertions of elements into the set, Counting and Cuckoo filters also allow elements to be deleted. Other categories of PDS include frequency estimators such as Count-Min sketches [9] and Heavy Keepers [17], and cardinality estimators such as HyperLogLog [14] and KMV sketches [3].
PDS find a myriad of applications, from estimating the number of distinct Google search queries [19] and detecting anomalies in network traffic [17, 22] to building privacy-preserving recommendation systems [28]. AMQ-PDS are beneficial for database query speedup [34], spam detection [38], resource and packet routing in networks [6], certificate revocation systems [23], DNA sequence analysis [29, 37], and more [26]. In particular, AMQ-PDS that support deletions, such as Counting and Cuckoo filters, are useful for cache sharing among web proxies [12], speedup of post-quantum TLS handshakes [36], efficient certificate revocation checking [35], mobile private contact discovery [18, 20], and fighting fake news [25].
The wide deployment of PDS across applications, however, comes with an increasing risk of adversarial interference. By carefully choosing inputs, malicious users can force specific elements to become false positives in AMQ-PDS [16], cause frequencies of elements to be overestimated [8, 27], or artificially inflate cardinality estimates [33], for example. This leads to dangerous consequences for the use of PDS in practice. In spite of this, such adversarial settings are typically not covered by the performance guarantees of PDS; their expected behaviour is characterised assuming honest inputs. To protect PDS against adversarial influence, cryptographic techniques can be a powerful tool. Combining PDS with cryptography results in a significant new research area with many critical open questions.
1.1 Our Contributions
In this paper, we study the correctness of AMQ-PDS in adversarial settings. We focus on malicious users interacting with an AMQ-PDS hosted by an honest service provider. In practice, users interact with the AMQ-PDS through an API, allowing dynamic updates to the stored dataset, through insertions and deletions, as well as membership queries. In this work, we address how malicious users can leverage adaptive insertions, membership queries and deletions to manipulate the performance of AMQ-PDS. We will argue that deletions, in particular, are a powerful tool for adversaries. While we focus on two commonly used AMQ-PDS, Cuckoo filters [10] and Counting filters [12], our definitions are general and can be applied to a broad range of AMQ-PDS.
Syntax for AMQ-PDS. Inspired by [8, 13], we establish a syntax for AMQ-PDS that support insertions, deletions and membership queries. We identify consistency rules for the behaviour of AMQ-PDS, satisfied by Counting and Cuckoo filters, that will allow us to prove results on their adversarial correctness.
Simulation-Based framework. We employ a simulation-based approach [24] to define security, following recent work [13, 33]. In this approach, the adversary is modelled as interacting with the AMQ-PDS in either a “real world" or an “ideal world". In the real world, the adversary has access to the AMQ-PDS through an API that allows it to insert and delete items, and make membership queries. In the ideal world, the adversary instead interacts with a simulator that models honest behaviour of the AMQ-PDS. At the end of its execution, the adversary produces an output, which is used to distinguish between the two worlds. By quantifying the distance between the worlds, we bound how much harm the adversary can do in the real world by relating it to the honest operation of the AMQ-PDS in the ideal world.
Simulation-based security definitions are traditionally used to analyse notions of privacy (for example, in searchable encryption [7]), where the simulator is given some leakage. By proving that the two worlds are indistinguishable, one concludes that the adversary can only learn this leakage, which is deemed acceptable. In contrast, our approach does not require indistinguishability between the worlds in order to give useful bounds; we will show how they can be used to set parameters for secure PDS in practice.
The power of the simulation-based approach in analysing correctness is that it covers all adversarial goals, in contrast to the game-based approach with a specific adversarial goal [8]. In practice, this means that one only needs to compute the probability of achieving a particular goal in the honest setting (which is well-studied in the PDS literature), in order to upper bound the probability of achieving it in the adversarial setting.
Adversarial Correctness for AMQ-PDS with Insertions and Deletions. To analyse adversarial correctness using the simulation-based approach, the first question to address is how to define “honest" behaviour. Allowing deletions (in addition to insertions and membership queries), however, introduces substantial hurdles.
In [13], the notion of a non-adversarially-influenced (NAI) state was proposed for insertion-only AMQ-PDS. Intuitively, this captures the idea that the state of an AMQ-PDS can be thought of as honest if one cannot predict the effect of each insertion on the state, prior to the insertion. To achieve this for many prominent AMQ-PDS, one can replace the hash functions used in their constructions with keyed Pseudo-Random Functions (PRFs).
With deletions, however, the above idea no longer suffices to capture honesty. The ability to delete elements after inserting them means that an adversary could effectively reset the state if not satisfied. For example, consider cache summarisation for content routing [1, 12]. Here, an element is automatically added to or removed from the filter whenever the cache is updated. The cache’s size poses a natural bound on the number of elements the filter stores. So, the attacker might want to force removal of elements that do not contribute to its goal of, for example, increasing the false positive probability (FPP) or making a specific target a false positive. The former significantly increases time for content retrieval on average, while the latter substantially increases retrieval time of the targeted content. While such a final state would satisfy insertion unpredictability, it would still be adversarially influenced. Therefore, the deletion functionality of AMQ-PDS forms an intrinsic barrier to enforcing honesty.
Further, another complication arises from false negatives. While insertion-only AMQ-PDS may have false positives (elements that appear to be in the set when they have not been inserted), deletions may also lead to false negatives (elements that appear to not be in the set when they have not been deleted). The FPP of AMQ-PDS is typically well-characterised; false negatives, which can arise (for example) through deleting elements that were never inserted, are often assumed not to occur under honest operation. In an adversarial setting, we can no longer assume this.
We circumvent these obstacles by proposing a new notion of honesty for AMQ-PDS with both insertions and deletions, which we call NAI*. We show that building a simulator that satisfies NAI* suffices to analyse adversarial correctness for our AMQ-PDS of interest.
Our results show how to provably protect AMQ-PDS by replacing or composing public hash functions with PRFs and giving concrete bounds on the probability of achieving any adversarial goal through adaptive queries. Practitioners can use our concrete bounds to set AMQ-PDS parameters that guarantee security even with adversarial users. This is in contrast to how parameters are currently set in practice, with bounds on (for example) FPP being easily violated through precomputation attacks (on public hash functions). Using our results, practitioners can guarantee that FPP will stay below a certain threshold even with adaptive queries. This extends to any adversarial goal, e.g. creating false negatives, causing insertion failures. By showing how to ensure AMQ-PDS behave as expected even with malicious users, our work impacts any application of AMQ-PDS - in particular, applications requiring dynamic deletions, insertions and membership queries (e.g. cache sharing, coupon validation, etc.).
We emphasise that our focus is on users exploiting the API that allows interaction with an AMQ-PDS hosted by an honest service provider. To our knowledge, such an API typically does not allow users to view its internal state, e.g. [2]. Of course, in a different adversarial scenario with a compromised service provider, users could gain access to the state. While out of scope in this work, we later discuss why our results are not directly applicable to such a setting in Remark 3.
Analysis of Counting and Cuckoo filters. We conclude by providing a concrete evaluation of our security theorems by analysing Counting and Cuckoo filters. The usage of public hash functions in their original formulations leads to vulnerabilities from precomputation attacks [8, 16]. Using our theorems, we demonstrate how to provably protect them by replacing or composing the hash functions with PRFs (at the cost of needing secure key management). This requires deriving novel bounds on their NAI* false positive probability, as well as their NAI* insertion failure probability, both of which we show how to upper bound using results from the (insertion-only) AMQ-PDS literature.
Finally, we investigate the impact of our analysis for choosing appropriate parameters to secure AMQ-PDS in practice. Our results illustrate that protecting AMQ-PDS against adversarial users who can harness their full functionality is practical. Further, as a result of our new insights and techniques, extending the user’s capabilities to include deletions does not compromise security.
1.2 Related Work
In [13], Filić et al. proposed a simulation-based framework for analysing the adversarial correctness and privacy of AMQ-PDS that only support insertions. By building a simulator that models the non-adversarial operation of AMQ-PDS, they derived bounds on the closeness of an adversarially generated state to that of an honest one, applying their framework to derive correctness guarantees for Bloom and insertion-only Cuckoo filters under adversarial inputs. In our work, we use a similar methodology but cover the full functionality of AMQ-PDS, i.e. allowing deletions as well as insertions. Thus, we solve an important question left unanswered by their work, resulting in a more complete analysis of adversarial correctness of AMQ-PDS.
A simulation-based approach was also employed in [33] to study the HyperLogLog cardinality estimator in adversarial settings. While our proof technique is conceptually similar, the types of queries supported by AMQ-PDS lead to more powerful adversarial strategies, and thus a more complicated analysis.
The work of Clayton et al. [8] focused on the adversarial correctness of Bloom and Counting filters. They examined an “\(\ell \)-thresholded” variant of Counting filters, where insertions are disallowed if more than \(\ell \) counters are set. Their approach utilised a game-based formalism, which required defining a specific winning condition for the adversary, i.e. finding a certain number of false positives or false negatives. We provide a more detailed comparison of our work with [8] in the full version. A similar approach was adopted in [4] with an adversary who tries to maximise the false positive rate of AMQ-PDS by repeating membership queries. In contrast to these game-based methods, the simulation-based formalism does not require specifying an adversarial goal. This allows one to use our results to re-derive bounds for any specific adversary.
In [31, 32], Naor and Yogev studied the adversarial correctness of Bloom filters, again using a game-based approach. Recent work by Naor and Oved [30] further extended this to propose various robustness notions for Bloom filters. However, their adversarial model is more restricted than ours, without the ability to make adaptive insertions and membership queries. Further, as their focus is on Bloom filters, deletions do not play a role.
In [39], Yeo analysed Cuckoo hash tables, which are closely related to Cuckoo filters. However, they considered a static adversarial setting, where a set of elements is inserted at the start, with a specific adversarial goal of causing the insertion of this set to fail. In this work, we are interested in a more powerful setting where adversaries can dynamically update the dataset and can have any goal. Adversarial influence on the false positive rate of Cuckoo filters was studied in [21], but in a similarly restricted adversarial model.
Therefore, in comparison to previous work, we are the first to rigorously analyse adversarial correctness of Counting and Cuckoo filters in their full capability, for any adversarial goal. This fills a significant gap in the literature.
For scenarios where the data itself is sensitive, studying privacy might also become important. Leveraging the power of deletions to deduce information about elements in Counting filters, [15] proposed attacks on their privacy. This highlights an intrinsic challenge in enforcing privacy for AMQ-PDS with deletions, leaving the task an interesting open question.
1.3 Paper Organisation
We start with preliminaries in Sect. 2. In Sect. 3.1, we define the syntax for AMQ-PDS with deletions, the notion of a non-adversarial setting, and properties of our AMQ-PDS of interest. We analyse adversarial correctness in Sect. 4, and discuss the usefulness of our results in practice in Sect. 5.
2 Preliminaries
Notation. We follow the notation of [13], repeated here for clarity. For an integer \(m \in \mathbb {Z}_{\ge 1}\), we write [m] to denote the set \(\{1,2,...,m\}\). We consider all logarithms to be in base 2. Given two sets \(\mathfrak {D}\) and \(\mathfrak {R}\), we define \(\textsf{Funcs}[\mathfrak {D},\mathfrak {R}]\) to be the set of functions from \(\mathfrak {D}\) to \(\mathfrak {R}\). We write \(F \xleftarrow {\$} \textsf{Funcs}[\mathfrak {D},\mathfrak {R}]\) to mean that F is a random function \(\mathfrak {D}\xrightarrow {F}\mathfrak {R}\). Given a set S, we denote the identity function over S as \(\text {Id}_S :S \rightarrow S\). For a probability distribution D, we write \(x \xleftarrow {\$} D\) to mean that x is sampled according to D. We define the statistical distance between two random variables X, Y with finite support \(D = \operatorname {Supp}\left( {X}\right) = \operatorname {Supp}\left( {Y}\right) \) as \(SD(X, Y) := \frac{1}{2} \sum _{z \in D} |{\Pr [X = z] - \Pr [Y = z]}|\). For a set S (resp. a list L), we denote by |S| (resp. |L|) the number of elements in S (resp. L). A fixed-length list of length s initialised empty is denoted by \(a \leftarrow \bot ^s\). We denote by \(\text {load}(a)\) the number of set entries of a. To insert an entry x into the first unused slot in a we write \(a'\,{\leftarrow }\,a \diamond x\) such that \(a'\,{=}\,x\,\bot \,...\,\bot \) with \(s{-}1\) trailing \(\bot \)s and \(\text {load}(a') = 1\). A further insertion \(a''\,{\leftarrow }\,a' \diamond y\) results in \(a''\,{=}\,x\,y\,\bot \,...\,\bot \) with \(\text {load}(a'') = 2\), and so on. We refer to the i-th entry in a list a as a[i]. In algorithms, we assume that all key-value stores are initialised with value \(\bot \) at every index, using the convention that \(\bot < n\), \(\forall n \in \mathbb {R}\), and we denote it as \(\{\}\). For a key-value store a, we refer to the value of the entry with key k as a[k]. We write variable assignments using \(\leftarrow \), unless the value is output by a randomised algorithm, for which we use \(\xleftarrow {\$}\).
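As a small illustration of the statistical distance defined above, the following Python sketch evaluates \(SD(X, Y)\) for two finite distributions given as outcome-to-probability maps. The function name and the coin example are hypothetical and chosen only for illustration; exact rationals are used to avoid rounding.

```python
from fractions import Fraction

def statistical_distance(p, q):
    """SD(X, Y) = 1/2 * sum_z |Pr[X = z] - Pr[Y = z]| for finite distributions.

    p and q map outcomes to probabilities; an outcome missing from one
    map is treated as having probability 0 under that distribution.
    """
    support = set(p) | set(q)
    return sum(abs(p.get(z, 0) - q.get(z, 0)) for z in support) / 2

# Hypothetical example: a fair coin versus a coin biased 3:1 towards heads.
fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
biased = {"H": Fraction(3, 4), "T": Fraction(1, 4)}
print(statistical_distance(fair, biased))
```

A distance of 0 means the two distributions are identical, which is the condition later required of insertion unpredictability.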
For a randomised algorithm \(\textsf {alg}\), we write \(\text {output} \leftarrow \textsf {alg}(\text {input}_1\), \(\text {input}_2, ..., \text {input}_\ell ; r)\), where \(r \in \mathcal {R}\) denotes the coins that can be used by \(\textsf {alg}\) and \(\mathcal {R}\) is the set of possible coins. We may also suppress coins whenever it is notationally convenient to do so. For a deterministic algorithm, r can be set to \(\bot \). We remark that the output of a randomised algorithm can be seen as a random variable over the output space of the algorithm. Unless otherwise specified, we will consider random coins to be sampled uniformly from \(\mathcal {R}\), independently from all other inputs and/or state, and refer to such r as “freshly sampled”. If \(\textsf {alg}\) is given oracle access to functions \(f_1\), ..., \(f_n\), we denote it by \(\textsf {alg}^{f_1, ... f_n}\).
We will consider AMQ-PDS that can store elements from finite domains \(\mathfrak {D}\) by letting \(\mathfrak {D}= \cup _{\ell =0}^L \{0,1\}^\ell \) for some large but finite value of L, say \(L = 2^{64}\). In our constructions, we will make use of pseudorandom functions, which we will model as truly random functions to which the AMQ-PDS has oracle access.
Definition 1
Consider the PRF experiment in Fig. 1. We say a pseudorandom function family \(R :\mathcal {K}\times \mathfrak {D}\rightarrow \mathfrak {R}\) is \((q, t, \varepsilon )\)-secure if for all adversaries \({\mathcal {B}}\) running in time at most t and making at most q queries to its \({\textbf {RoR}}\) oracle in \(Exp_R^{PRF}\),
\[\textbf{Adv}^{PRF}_R({\mathcal {B}}) := \left| \Pr \left[ Exp_R^{PRF}({\mathcal {B}}) = 1 \right] - \frac{1}{2} \right| \le \varepsilon .\]
We say \({\mathcal {B}}\) is a (q, t)-PRF adversary.
3 AMQ-PDS
In this section, we formalise the syntax of AMQ-PDS and their behaviour under non-adversarial inputs. We formally define our AMQ-PDS of interest, Counting and Cuckoo filters, and discuss some common properties that they satisfy.
3.1 Syntax
We now define the syntax of an AMQ-PDS, extending that of [13] to include deletions. Let \(\varPi \) be an AMQ-PDS. We denote its public parameters by pp, and its state as \(\sigma \in \varSigma \), where \(\varSigma \) denotes the space of possible states of \(\varPi \). The set of elements that can be inserted into \(\varPi \) is denoted by \(\mathfrak {D}\), unless stated otherwise. We consider a syntax consisting of four algorithms:
- The setup algorithm \(\sigma \leftarrow \textsf {setup}(pp;\, r)\) sets up the initial state of an empty PDS with public parameters pp; it will always be called first to initialise the AMQ-PDS.
- The insertion algorithm \((b,\, \sigma ') \leftarrow \textsf {ins}(x,\, \sigma ;\, r)\), given an element \(x \in \mathfrak {D}\), attempts to insert it into the AMQ-PDS, and returns a bit \(b \in \{\bot ,\top \}\) representing whether the insertion was successful (\(b = \top \)) or not (\(b = \bot \)), and the state \(\sigma '\) of the AMQ-PDS after the insertion.
- The deletion algorithm \((b,\, \sigma ') \leftarrow \textsf {del}(x,\, \sigma )\), given an element \(x \in \mathfrak {D}\), attempts to delete x from the AMQ-PDS, i.e. attempts to remove everything that a successful insertion on x added to \(\sigma \). The algorithm returns a bit \(b \in \{\bot ,\top \}\) representing whether the deletion was successful (\(b = \top \)) or not (\(b = \bot \)), and the state \(\sigma '\) of the AMQ-PDS after the deletion.
- The membership querying algorithm \(b \leftarrow \textsf {qry}(x,\, \sigma )\), given an element \(x \in \mathfrak {D}\), returns a bit \(b \in \{\bot ,\top \}\) (approximately) answering whether x was previously inserted (\(b = \top \)) or not (\(b = \bot \)) into the AMQ-PDS.
We remark that we only consider AMQ-PDS where membership queries do not change the state of the AMQ-PDS; thus, \(\textsf {qry}\) does not need to output a new \(\sigma '\) value. This includes popular AMQ-PDS such as Counting and Cuckoo filters.
Due to the approximate nature of AMQ-PDS, \(\textsf {qry}\) calls may return a false positive result with a certain probability. That is, we may have \(\top \,{\leftarrow }\,\textsf {qry}(x,\, \sigma )\) even though no call \(\textsf {ins}(x,\, \sigma ';\, r)\) was made post setup and prior to the membership query. We refer to the probability \(\Pr [\top \,{\leftarrow }\,\textsf {qry}(x, \sigma ) \mid x\ \text {was not inserted into}\ \varPi ]\) as the false positive probability of an AMQ-PDS \(\varPi \). In addition, since Counting and Cuckoo filters support deletions, \(\textsf {qry}\) calls may return a false negative result, where we may have \(\bot \leftarrow \textsf {qry}(x, \sigma )\) even though an \(\textsf {ins}(x, \sigma '; r)\) call was made beforehand. We refer to the probability \(\Pr [\bot \leftarrow \textsf {qry}(x, \sigma ) \mid x\ \text {was inserted into}\ \varPi ]\) as the false negative probability of an AMQ-PDS \(\varPi \).
Moreover, the insertion algorithm may fail to insert an element, for example if the AMQ-PDS has reached capacity. We refer to the probability \(\Pr [(\bot , \sigma ') \xleftarrow {\$} \textsf {ins}(x, \sigma )]\) as the insertion failure probability.
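To make the syntax concrete, the following Python sketch implements a toy Counting filter exposing the four algorithms, and shows how a deletion of a never-inserted element can turn a genuinely inserted element into a false negative. It is only an illustration under stated assumptions: the class name, the parameters m and k, the use of HMAC-SHA256 as a stand-in for a keyed PRF, and the choice to let deletion succeed only when the element appears present are all hypothetical and not taken from the constructions analysed in this paper. This toy never reports an insertion failure; practical variants may, for example on counter overflow.

```python
import hashlib
import hmac

class CountingFilter:
    """Toy Counting filter with m counters and up to k indices per element."""

    def __init__(self, m, k, key):
        # setup(pp): pp = (m, k); the key selects the PRF instance (assumption).
        self.m, self.k, self.key = m, k, key
        self.counters = [0] * m

    def _indices(self, x):
        # PRF stand-in: HMAC-SHA256(key, i || x) reduced mod m, de-duplicated.
        # The modular reduction has a small bias, which this sketch ignores.
        return sorted({
            int.from_bytes(
                hmac.new(self.key, bytes([i]) + x, hashlib.sha256).digest(), "big"
            ) % self.m
            for i in range(self.k)
        })

    def qry(self, x):
        return all(self.counters[j] > 0 for j in self._indices(x))

    def ins(self, x):
        # Inserting an element that already appears present leaves the state unchanged.
        if not self.qry(x):
            for j in self._indices(x):
                self.counters[j] += 1
        return True

    def delete(self, x):
        # Deletion only succeeds if x currently appears to be in the set.
        if not self.qry(x):
            return False
        for j in self._indices(x):
            self.counters[j] -= 1
        return True

def find_false_positive(filt, prefix=b"probe-", limit=10_000):
    """Use membership queries only to find a never-inserted element reported as present."""
    for n in range(limit):
        y = prefix + str(n).encode()
        if filt.qry(y):
            return y
    return None

# Deliberately tiny parameters so that false positives are easy to find.
filt = CountingFilter(m=8, k=2, key=b"hypothetical demo key")
x = b"alice"
filt.ins(x)
y = find_false_positive(filt)   # never inserted, yet qry(y) returns true
filt.delete(y)                  # succeeds, and decrements counters shared with x
print(filt.qry(x))              # x was inserted and never deleted
```

Because every counter touched by y belongs to x and holds the value 1, the deletion zeroes at least one of x's counters, so the final query on x fails.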
3.2 AMQ-PDS Under Non-adversarial Inputs
We now define the expected behaviour of AMQ-PDS in a non-adversarial setting, since we will later quantify how much the state of an AMQ-PDS can deviate from this under adversarial inputs. As in [13], we will focus on AMQ-PDS that satisfy the following properties of function-decomposability and reinsertion invariance.
Definition 2
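(Function-decomposability [13]). Let \(\varPi \) be an \(\text {AMQ-PDS} \) and let \(F \xleftarrow {\$} \textsf{Funcs}[\mathfrak {D},\mathfrak {R}]\) with \(\mathfrak {R}\subset \mathfrak {D}\) be a random function to which \(\varPi \) has oracle access. We say \(\varPi \) is F-decomposable if, for all \(x \in \mathfrak {D}\), \(\sigma \in \varSigma \) and \(r \in \mathcal {R}\),
\[\textsf {ins}^{F}(x, \sigma ; r) = \textsf {ins}^{\text {Id}_\mathfrak {R}}(F(x), \sigma ; r), \quad \textsf {del}^{F}(x, \sigma ) = \textsf {del}^{\text {Id}_\mathfrak {R}}(F(x), \sigma ), \quad \textsf {qry}^{F}(x, \sigma ) = \textsf {qry}^{\text {Id}_\mathfrak {R}}(F(x), \sigma ),\]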
where \(\textsf {ins}^{\text {Id}_\mathfrak {R}}\), \(\textsf {del}^{\text {Id}_\mathfrak {R}}\) and \(\textsf {qry}^{\text {Id}_\mathfrak {R}}\) cannot internally evaluate F due to not having oracle access to it and F being truly random. Function-decomposability also applies to AMQ-PDS with oracle access to multiple functions.
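The structure required by function-decomposability can be sketched in Python as follows: the core algorithms operate only on values in \(\mathfrak {R}\) and never evaluate F, while thin wrappers apply F once and delegate. Everything here is an illustrative assumption rather than a construction from the paper: the range is taken to be the set of counter positions, F is approximated by HMAC-SHA256 under a hypothetical key, and the single-position filter is deliberately minimal.

```python
import hashlib
import hmac

M = 16  # size of the range R = {0, ..., M-1} (hypothetical parameter)

def F(key, x):
    """Stand-in for the random function F: D -> R."""
    digest = hmac.new(key, x, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % M

# Core algorithms with oracle access to Id_R only: they never call F.
def ins_id(y, state):
    new_state = list(state)
    new_state[y] = 1          # no change if y is already present
    return True, new_state

def del_id(y, state):
    if state[y] == 0:
        return False, list(state)
    new_state = list(state)
    new_state[y] = 0
    return True, new_state

def qry_id(y, state):
    return state[y] > 0

# F-decomposable algorithms: apply F to the input, then run the core.
def ins_F(key, x, state):
    return ins_id(F(key, x), state)

def del_F(key, x, state):
    return del_id(F(key, x), state)

def qry_F(key, x, state):
    return qry_id(F(key, x), state)

key = b"hypothetical demo key"
s0 = [0] * M
ok, s1 = ins_F(key, b"a", s0)
print(ok, qry_F(key, b"a", s1))
```

In this sketch, analysing the filter on uniformly random inputs to the core algorithms corresponds to analysing it with F applied to distinct inputs, which is the reduction the definitions below rely on.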
Definition 3
(Reinsertion invariance [13]). Let \(\varPi \) be an AMQ-PDS. We say \(\varPi \) is reinsertion invariant if for all \(x \in \mathfrak {D}, \sigma \in \varSigma \) such that \(\top \leftarrow \textsf {qry}(x, \sigma )\), we have \((\top , \sigma ') \leftarrow \textsf {ins}(x, \sigma ; r) \implies \sigma = \sigma '\) \(\forall r \in \mathcal {R}\).
Reinsertion invariance is a natural property to expect from AMQ-PDS since they are designed to represent sets and not multisets. Note that if reinsertion invariance does not apply, simply repeatedly inserting a single element could lead to blocking of further insertions.
If a reinsertion-invariant AMQ-PDS contains multiple copies of the same element, deleting one copy will result in all other copies being deleted. However, reinsertion invariance does not require the state of the AMQ-PDS to remain unchanged if elements are reinserted after being deleted.
For an insertion-only AMQ-PDS satisfying function-decomposability and reinsertion invariance, the notion of a non-adversarially influenced state was proposed in [13]. We give an alternative (but equivalent) definition below.
Definition 4
(n-NAI state). Let \(\varPi \) be an AMQ-PDS with public parameters pp using \(F=\text {Id}_\mathfrak {R}\) satisfying reinsertion invariance, and let \(\sigma \leftarrow \textsf {setup}(pp)\). Let n be a non-negative integer. Let \(X_1, ..., X_n \xleftarrow {\$} \mathfrak {R}\). Let L be the list of operations on \(\sigma \), where \(L=[\textsf {ins}^{\text {Id}_\mathfrak {R}}(X_1, \sigma ), ..., \textsf {ins}^{\text {Id}_\mathfrak {R}}(X_n, \sigma )]\). Then, \(\sigma \) is an n-NAI state.
We then give an alternative (but equivalent) definition of the NAI false positive probability from [13].
Definition 5
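(NAI false positive probability). Let \(\varPi \) be an AMQ-PDS with public parameters pp, using a random function \(F: \mathfrak {D}\rightarrow \mathfrak {R}\) satisfying F-decomposability and reinsertion invariance. Let n be a non-negative integer. Define the NAI false positive probability after n distinct insertions as
\[\Pr \left[ \top \leftarrow \textsf {qry}^{\text {Id}_\mathfrak {R}}(X, \sigma ) \right] ,\]
where \(\sigma \) is an n-NAI state (Definition 4) and \(X \xleftarrow {\$} \mathfrak {R}\).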
Remark 1
Definitions 4 and 5 are equivalent to that of [13, Def. 3.4] for F-decomposable AMQ-PDS. Sampling n distinct elements from \(\mathfrak {D}\) is equivalent to sampling n strings \(X_1, ..., X_n \xleftarrow {\$} \mathfrak {R}\). Similarly, sampling the queried element from \(\mathfrak {D}\setminus V\), where V is the set of n inserted elements, is equivalent to sampling \(X \xleftarrow {\$} \mathfrak {R}\).
As mentioned, the NAI state constructed in Definition 4 captures honesty for insertion-only AMQ-PDS. As long as the effect of every insertion on the state is unpredictable, the final state cannot deviate from “honest". However, for AMQ-PDS that also allow deletions, defining an honest setting is more involved. The deletion capability means that a user could insert elements, observe their effects, and then decide whether to delete them, i.e. to reset the state if not satisfied. In other words, even if every insertion is unpredictable, the final state may still be adversarially influenced (i.e. no longer an NAI state).
We overcome these issues with a new definition of the non-adversarial setting for function-decomposable, reinsertion-invariant AMQ-PDS, which we call NAI*. NAI* captures honesty up to the extent that can be achieved with both insertions and deletions. We will show that the final state of the AMQ-PDS satisfying NAI* suffices to capture a non-adversarial setting that we can analyse using results from the PDS literature.
A key component of NAI* will be the following notion: for any element not previously inserted, the effect of its insertion on the state is unpredictable (insertion unpredictability). Intuitively, this can be thought of as replacing every insertion of an element \(x \in \mathfrak {D}\) with \(X \in \mathfrak {R}\) sampled uniformly at random. This is not necessarily ensured only by F-decomposability, since the interplay between \(\textsf {ins}, \textsf {del}\) and \(\textsf {qry}\) on the same input could reveal information about F. We define insertion unpredictability in Definition 6.
Definition 6
(Insertion unpredictability). Let \(\varPi \) be an AMQ-PDS with public parameters pp, using a random function \(F: \mathfrak {D}\rightarrow \mathfrak {R}\), and satisfying F-decomposability and reinsertion invariance. Let \(\sigma \leftarrow \textsf {setup}(pp)\). Let \(\{ z_i \}\) be the elements that are successfully inserted into \(\sigma \). For every first insertion of \(z_i\), let \((\top , \sigma ') \leftarrow \textsf {ins}^F(z_i, \sigma _{i})\) and \((\top , \overline{\sigma }) \leftarrow \textsf {ins}^{\text {Id}_\mathfrak {R}}(X_i, \sigma _{i})\) with \(X_i \xleftarrow {\$} \mathfrak {R}\). We say \(\sigma \) has insertion unpredictability if \(SD\big (\sigma ', \overline{\sigma }\big ) = 0\).
We are now ready to define an n-NAI* state. Although an NAI* state of an AMQ-PDS can be constructed through both insertions and deletions of elements, our definition will require that all insertions are unpredictable, deletions only happen on currently inserted elements, and repeated insertions of elements only change the state if that element has been deleted. These requirements essentially capture what we would expect from honest insertions and deletions on function-decomposable, reinsertion-invariant AMQ-PDS.
Definition 7
(n-NAI* state). Let \(\varPi \) be an AMQ-PDS with public parameters pp using \(F=\text {Id}_\mathfrak {R}\) satisfying reinsertion invariance, and let \(\sigma \leftarrow \textsf {setup}(pp)\). Let n be a non-negative integer. Let \(X_1, ..., X_n \xleftarrow {\$} \mathfrak {R}\). Let L be the list of operations on \(\sigma \), where each item in L is either \(\textsf {ins}^{\text {Id}_\mathfrak {R}}(\cdot , \sigma )\) or \(\textsf {del}^{\text {Id}_\mathfrak {R}}(\cdot , \sigma )\) on \(X_1, ..., X_n\). Then, \(\sigma \) is an n-NAI* state if:
- for all \(X_i\) there is an operation in L equal to \(\textsf {ins}^{\text {Id}_\mathfrak {R}}(X_i, \sigma )\),
- for all successful \(\textsf {del}^{\text {Id}_\mathfrak {R}}(X_i, \sigma )\) operations in L, the preceding successful operation in L on \(X_i\) is \(\textsf {ins}^{\text {Id}_\mathfrak {R}}(X_i, \sigma )\),
- all successful \(\textsf {ins}^{\text {Id}_\mathfrak {R}}(X_i, \sigma )\) operations in L for which any prior successful operation in L on \(X_i\) is \(\textsf {ins}^{\text {Id}_\mathfrak {R}}(X_i, \sigma )\) either do not change the state, or have \(\textsf {del}^{\text {Id}_\mathfrak {R}}(X_i, \sigma )\) as their preceding successful operation on \(X_i\) in L.
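The three conditions above can be read as a check on an operation log. The following Python sketch is one possible reading, under assumptions that are not part of the definition: the log format (tuples recording the operation, its input, whether it succeeded, and whether it changed the state) and the function name are hypothetical, and the requirement that the \(X_i\) are sampled uniformly from \(\mathfrak {R}\) is taken for granted rather than checked.

```python
def satisfies_nai_star_conditions(elements, log):
    """Check the three listed conditions on a log of operations.

    `log` is a list of (op, x, success, changed) tuples with op in
    {"ins", "del"}; `changed` records whether the state was modified.
    Uniform sampling of the elements is assumed, not verified.
    """
    # Condition 1: every element has an insertion operation in the log.
    has_ins = {x for op, x, _, _ in log if op == "ins"}
    if not set(elements) <= has_ins:
        return False

    last_success = {}   # x -> most recent successful operation on x
    seen_ins = set()    # elements with an earlier successful insertion
    for op, x, success, changed in log:
        if not success:
            continue
        # Condition 2: a successful deletion must follow a successful insertion.
        if op == "del" and last_success.get(x) != "ins":
            return False
        # Condition 3: a repeated insertion either leaves the state
        # unchanged or directly follows a successful deletion of x.
        if op == "ins" and x in seen_ins and changed and last_success[x] != "del":
            return False
        if op == "ins":
            seen_ins.add(x)
        last_success[x] = op
    return True

# Insert a and b, delete a, then reinsert a: allowed.
honest_log = [
    ("ins", "a", True, True),
    ("ins", "b", True, True),
    ("del", "a", True, True),
    ("ins", "a", True, True),
]
# Successfully delete b before it was ever inserted: not allowed.
dishonest_log = [
    ("ins", "a", True, True),
    ("del", "b", True, True),
    ("ins", "b", True, True),
]
print(satisfies_nai_star_conditions(["a", "b"], honest_log))
print(satisfies_nai_star_conditions(["a", "b"], dishonest_log))
```

The second log corresponds to the false negative scenario discussed earlier, where a deletion succeeds on an element that was never inserted.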
It is clear to see that every n-NAI state (Definition 4) is then an n-NAI* state. We now give an analogous formulation of Definition 7 for F-decomposable AMQ-PDS, where unsuccessful insertions do not change the state.
Corollary 1
Let \(\varPi \) be an AMQ-PDS with public parameters pp using a random function \(F: \mathfrak {D}\rightarrow \mathfrak {R}\) satisfying F-decomposability and reinsertion invariance, where unsuccessful insertions do not change the state. Let \(\sigma \leftarrow \textsf {setup}(pp)\) and n be a non-negative integer. Let L be the list of operations on \(\sigma \), where each item in L is either \(\textsf {ins}^F(\cdot , \sigma )\) or \(\textsf {del}^F(\cdot , \sigma )\). Then, \(\sigma \) is an n-NAI* state if:
- it satisfies Definition 6,
- there are n distinct elements \(\{ z_i \}_{i \in [n]}\) for which an operation in L on \(z_i\) is \(\textsf {ins}^F(z_i, \sigma )\),
- for all successful \(\textsf {del}^{F}(z_i, \sigma )\) operations in L, the preceding successful operation in L on \(z_i\) is \(\textsf {ins}^{F}(z_i, \sigma )\), and
- all \(\textsf {ins}^{F}(z_i, \sigma )\) operations in L for which any prior successful operation in L on \(z_i\) is \(\textsf {ins}^{F}(z_i, \sigma )\) either do not change the state, or have \(\textsf {del}^{F}(z_i, \sigma )\) as their preceding successful operation on \(z_i\) in L.
A natural next step would be to define the false positive probability and insertion failure probability for NAI* states, analogous to those of the insertion-only setting [13]. However, although deleting an inserted element is allowed under NAI*, a user could insert elements, observe their effects, and then decide whether to delete them, i.e. to reset the state. This means that, using only n distinct elements, a user can create many different NAI* states. Therefore, a more useful notion for NAI* states is that of the maximal false positive and insertion failure probabilities, defined in terms of the “worst possible” NAI* state.
Definition 8
(Maximal NAI* false positive probability). Let \(\varPi \) be an AMQ-PDS with public parameters pp using a random function \(F: \mathfrak {D}\rightarrow \mathfrak {R}\) satisfying F-decomposability and reinsertion invariance. Let n be a non-negative integer. Define the maximal NAI* false positive probability after n insertions as
where \(\mathcal {U}^{\text {Id}_\mathfrak {R}}_{\varPi ,pp}(X_1{,} ...{,} X_n)\) outputs an NAI* state created using \({\textsf {ins}}^{\text {Id}_\mathfrak {R}}(\cdot {,} \sigma ){,}{\textsf {del}}^{\text {Id}_\mathfrak {R}}(\cdot {,} \sigma )\) on \(X_1{,} ...{,} X_n\) that has the maximal false positive probability.
Definition 8 captures the false positive probability of the “worst possible” NAI* state that can be created with insertions and deletions. The algorithm \(\mathcal {U}\) gets n strings sampled uniformly at random from \(\mathfrak {R}\) as input, and finds the ordering of insertions and deletions of these strings (possibly excluding some) that maximises the false positive probability. Since the queried element is sampled randomly, \(\mathcal {U}\) does not increase the probability that a particular element is a false positive; rather, it creates a state with the highest false positive probability in general.
Definition 9
(Maximal NAI* insertion failure probability). Let \(\varPi \) be an AMQ-PDS with public parameters pp, using a random function \(F: \mathfrak {D}\rightarrow \mathfrak {R}\) satisfying F-decomposability and reinsertion invariance. Let n be a non-negative integer. Define the maximal NAI* insertion failure probability within n insertions as
where \(\mathcal {V}_{\varPi , pp}^{\text {Id}_\mathfrak {R}}(X_1{,} ...{,} X_n)\) outputs an NAI* state created using \(\textsf {ins}^{\text {Id}_\mathfrak {R}}(\cdot {,} \sigma ), \textsf {del}^{\text {Id}_\mathfrak {R}}(\cdot {,} \sigma )\) on \(X_1{,} ...{,} X_n\) that has the maximal probability of an insertion on one of \(X_1{,} ...{,} X_n\) failing.
Definition 9 captures the insertion failure probability of the “worst possible” NAI* state that can be created with insertions and deletions. The algorithm \(\mathcal {V}\) gets as input n strings sampled uniformly at random from \(\mathfrak {R}\), and then finds the ordering of insertions and deletions of these strings (possibly excluding some) that maximises the probability that inserting one of these strings will fail. Note that this definition has a slightly different flavour from Definition 8: \(\mathcal {V}\) can optimise its output with respect to the \(X_l\) that is most likely to result in an insertion failure. Definition 9 naturally extends the insertion failure definitions found in the literature [10, 12], where the probability is defined as that of one among n insertions failing.
3.3 Counting Filters
Counting filters are an extension of the popular Bloom filters, with the added capability of supporting deletions of elements. A Counting filter consists of an array of counters \(\sigma \) of length m initially set to \(0^m\), and a family of k independent hash functions \(H_i: \{0, 1\}^* \rightarrow [m]\), for \(i \in [k]\). To insert an element x into the filter, all k counters \(H_i(x)\) of \(\sigma \) are incremented; if any counter reaches the maximum value \(\textit{maxVal}\), the insertion fails. To delete an element x from the filter, if all k counters \(H_i(x)\) are greater than zero, they are all decremented; if not, the deletion fails. A membership query on x returns \(\top \) if all k counters \(H_i(x)\) are greater than zero. Due to collisions in the hash functions \(H_i\), Counting filters can have both false positives and false negatives. As in [13], we will bundle the k hash functions \(H_i\) into a single function \(F: \mathfrak {D}\rightarrow [m]^k\).
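The informal description above can be sketched in Python as follows. This is an illustrative sketch only, not the algorithms of Fig. 2: the class name `CountingFilter`, the helper `keyed_indices`, the use of keyed BLAKE2b as a stand-in for \(F\), the treatment of repeated indices as a single slot, and the early return for elements that already test positive (our reading of reinsertion invariance) are all assumptions made for the example.

```python
import hashlib
from typing import Callable, List, Sequence


def keyed_indices(key: bytes, m: int, k: int) -> Callable[[bytes], List[int]]:
    """Hypothetical stand-in for F: maps x to k indices in [0, m).

    Keyed BLAKE2b is an assumption of this sketch; reducing modulo m
    introduces a small bias that a careful implementation would avoid.
    """
    def F(x: bytes) -> List[int]:
        out = []
        for i in range(k):
            d = hashlib.blake2b(i.to_bytes(4, "big") + x, key=key, digest_size=8).digest()
            out.append(int.from_bytes(d, "big") % m)
        return out
    return F


class CountingFilter:
    """Array of m counters with insert, delete and membership query."""

    def __init__(self, m: int, max_val: int, F: Callable[[bytes], Sequence[int]]):
        self.counters = [0] * m
        self.max_val = max_val
        self.F = F

    def _slots(self, x: bytes) -> List[int]:
        # Assumption: repeated indices are treated as one slot.
        return sorted(set(self.F(x)))

    def qry(self, x: bytes) -> bool:
        return all(self.counters[i] > 0 for i in self._slots(x))

    def ins(self, x: bytes) -> bool:
        if self.qry(x):
            # Assumed reinsertion invariance: a positive element leaves the state unchanged.
            return True
        slots = self._slots(x)
        if any(self.counters[i] >= self.max_val for i in slots):
            return False  # insertion fails, state unchanged
        for i in slots:
            self.counters[i] += 1
        return True

    def delete(self, x: bytes) -> bool:
        if not self.qry(x):
            return False  # deletion fails, state unchanged
        for i in self._slots(x):
            self.counters[i] -= 1
        return True


# Toy index map (hypothetical) chosen so that x collides with a and b.
toy = {b"a": (0, 1), b"b": (2, 3), b"x": (1, 2)}
cf = CountingFilter(m=4, max_val=3, F=lambda x: toy[x])
cf.ins(b"a")
cf.ins(b"b")
false_positive = cf.qry(b"x")      # x was never inserted, yet tests positive
cf.delete(b"x")                    # succeeds because x is a positive
false_negative = not cf.qry(b"a")  # a was inserted and never deleted
```

In the toy run, deleting the false positive `x` decrements counters shared with `a` and `b`, which is exactly how a false negative arises in a filter that supports deletions.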
We now formally define Counting filters.
Definition 10
Let \(m{,} k{,} \textit{maxVal}\) be positive integers. We define an \((m{,} k{,} \textit{maxVal})\)-Counting filter to be the AMQ-PDS with algorithms defined in Fig. 2, with \(pp = (m{,} k{,} \textit{maxVal})\), and \(F: \mathfrak {D}\rightarrow [m]^k\).
We recall from the literature a bound on the NAI false positive probability for Counting filters. Due to their membership query algorithm only checking for non-zero counters, as in the case of Bloom filters, this bound is the same for both Counting and Bloom filters.
Lemma 1
([12, 13, Lemma 3.7]). Let \(\varPi \) be an \((m, k, \textit{maxVal})\)-Counting filter using a random function \(F: \mathfrak {D}\rightarrow [m]^k\). Then, for any n, \( P_{\varPi ,pp}(FP\,|\,{n}) \le \big [ 1 - e^{-\frac{(n+0.5)k}{m-1}} \big ]^k. \)
We now derive upper bounds on the maximal NAI* false positive probability and the maximal NAI* insertion failure probability for Counting filters.
Lemma 2
Let \(\varPi \) be an \((m, k, \textit{maxVal})\)-Counting filter using a random function \(F: \mathfrak {D}\rightarrow [m]^k\). Then, for any n, \( P_{\varPi ,pp}^*(FP\,|\,{n}) \le \big [ 1 - e^{-\frac{(n+0.5)k}{m-1}} \big ]^k. \)
Proof
(sketch). We construct an algorithm that inserts all \(X_1, \dots , X_n\) with \(\textit{maxVal}\) set to \(\infty \), and show that the false positive probability of the resulting state (which follows from Lemma 1) is an upper bound on the false positive probability of any n-NAI* state. For the full proof, see the full version.
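To get a feel for the magnitude of the bound shared by Lemmas 1 and 2, it can be evaluated numerically. The sketch below is only a calculator for the stated expression; the function name and the example parameters are illustrative choices, not values taken from the paper.

```python
import math


def counting_filter_fp_bound(n: int, m: int, k: int) -> float:
    """Evaluate [1 - exp(-(n + 0.5) * k / (m - 1))]^k.

    -expm1(-t) computes 1 - exp(-t) without cancellation for small t.
    """
    t = (n + 0.5) * k / (m - 1)
    return (-math.expm1(-t)) ** k


# Illustrative parameters: 100 insertions, 1024 counters, 4 hash functions.
example_fp = counting_filter_fp_bound(n=100, m=1024, k=4)
```

With these illustrative parameters the bound is roughly 1.1%, and it grows with n for fixed m and k.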
Lemma 3
Let \(\varPi \) be an \((m, k, \textit{maxVal})\)-Counting filter using a random function \(F: \mathfrak {D}\rightarrow [m]^k\). Then, for any n, \( P_{\varPi ,pp}^*(IF\,|\,{n}) \le m \cdot \big [ \frac{e \cdot n \cdot k}{\textit{maxVal}\cdot m} \big ]^{\textit{maxVal}}. \)
Proof
(sketch). We construct an algorithm that inserts all \(X_1, \dots , X_n\), using a modified insertion algorithm that always increments counters (i.e. the check in line 6 of the \(\textsf {ins}\) algorithm in Fig. 2 is skipped), and with \(\textit{maxVal}\) set to \(\infty \). Let the resulting state be denoted by \(\varDelta \). We show that the insertion failure probability of any n-NAI* state with \(\textit{maxVal}\) equal to some \(\textit{limit}\) can be upper bounded by the probability that any counter in \(\varDelta \) exceeds \(\textit{limit}\). For the full proof, see the full version.
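The bound of Lemma 3 can likewise be evaluated directly. The following sketch is a plain calculator for the stated expression; the function name and example parameters are illustrative assumptions. Since the expression can exceed 1 for heavily loaded filters, in which case it carries no information, the helper caps the result at 1.

```python
import math


def counting_filter_if_bound(n: int, m: int, k: int, max_val: int) -> float:
    """Evaluate m * [e * n * k / (maxVal * m)]^maxVal, capped at 1."""
    raw = m * (math.e * n * k / (max_val * m)) ** max_val
    return min(1.0, raw)


# Illustrative parameters: 100 insertions, 1024 counters, 4 hash functions,
# 4-bit counters (maxVal = 15).
example_if = counting_filter_if_bound(n=100, m=1024, k=4, max_val=15)
```

For these parameters the bound is far below any practical false positive rate, which matches the intuition that counter overflow is rare in lightly loaded filters.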
3.4 Cuckoo Filters
Cuckoo filters were proposed as an alternative to Bloom filters with improved performance and support for deletions [10]. A Cuckoo filter consists of a collection \((\sigma _i)_i\) of \(2^{\lambda _I}\) buckets, each indexed by \(i \in [2^{\lambda _I}]\) and containing s slots, together with a stash \(\sigma _{stash}\) containing one slot. It uses two hash functions \(H_I: \mathfrak {D}\rightarrow \{ 0, 1\}^{\lambda _I}\) and \(H_T: \mathfrak {D}\rightarrow \{0, 1\}^{\lambda _T}\). To insert an element x into (resp. delete it from) the filter, its tag is computed as \(H_T(x)\) and inserted into (resp. deleted from) its first or second bucket, whose indices are computed as \(i_1 = H_I(x)\) and \(i_2 = i_1 \oplus H_I(H_T(x))\), respectively. If both buckets are full, an eviction process begins. A membership query on x returns \(\top \) if \(H_T(x)\) is found in either of its corresponding buckets or the stash. As for Counting filters, membership queries can return both false positive and false negative responses.
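The index computation above has a useful property: given a tag and either of its bucket indices, the other index can be recovered without knowing x, which is what makes eviction possible. The sketch below illustrates this under stated assumptions: truncated SHA-256 with domain-separation labels stands in for \(H_T\) and \(H_I\), and the byte encoding of the tag is a choice made for the example; none of these names come from the paper.

```python
import hashlib


def _h(label: bytes, data: bytes, bits: int) -> int:
    """Hypothetical hash to `bits` bits: top bits of SHA-256(label || data)."""
    digest = hashlib.sha256(label + data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def tag_of(x: bytes, lam_t: int) -> int:
    return _h(b"T", x, lam_t)  # stands in for H_T


def index_of(data: bytes, lam_i: int) -> int:
    return _h(b"I", data, lam_i)  # stands in for H_I


def alt_bucket(i: int, tag: int, lam_i: int, lam_t: int) -> int:
    """Map one bucket index of a tag to the other: i XOR H_I(tag)."""
    tag_bytes = tag.to_bytes((lam_t + 7) // 8, "big")  # assumed encoding
    return i ^ index_of(tag_bytes, lam_i)


def buckets(x: bytes, lam_i: int, lam_t: int):
    """Return (tag, i1, i2) with i1 = H_I(x) and i2 = i1 XOR H_I(H_T(x))."""
    t = tag_of(x, lam_t)
    i1 = index_of(x, lam_i)
    return t, i1, alt_bucket(i1, t, lam_i, lam_t)


t, i1, i2 = buckets(b"example", lam_i=10, lam_t=12)
roundtrip_ok = alt_bucket(i2, t, 10, 12) == i1  # XOR is its own inverse
```

Because XOR with a fixed value is an involution, applying `alt_bucket` twice returns the starting index, so an evicted tag can always be moved to its other bucket.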
In [13], a variant of the standard Cuckoo filter called the PRF-wrapped Cuckoo filter was proposed, which was required for the proofs of adversarial correctness and privacy. In this variant, inputs to the \(\textsf {ins}, \textsf {del}\) and \(\textsf {qry}\) algorithms are simply preprocessed with a random function \(F: \mathfrak {D}\rightarrow \mathfrak {R}\), resulting in a function-decomposable filter that remains easy to implement, while satisfying the desired properties. For this reason, our work will also make use of PRF-wrapped Cuckoo filters, which we formally define below.
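The wrapping step described above is mechanically simple, as the following sketch suggests. It is an illustration under assumptions rather than the construction of Appendix A: HMAC-SHA256 is used here as a stand-in keyed function for \(F\), and `SetBackedFilter` is a hypothetical exact-membership placeholder for the inner filter, included only so that the example runs on its own.

```python
import hashlib
import hmac


def make_prf(key: bytes):
    """Return x -> HMAC-SHA256(key, x); an assumed stand-in for the keyed F."""
    def F(x: bytes) -> bytes:
        return hmac.new(key, x, hashlib.sha256).digest()
    return F


class SetBackedFilter:
    """Hypothetical placeholder for an inner filter (exact, no false positives)."""

    def __init__(self):
        self._items = set()

    def ins(self, x: bytes) -> bool:
        self._items.add(x)
        return True

    def delete(self, x: bytes) -> bool:
        if x in self._items:
            self._items.remove(x)
            return True
        return False

    def qry(self, x: bytes) -> bool:
        return x in self._items


class PRFWrapped:
    """Preprocess every input with F before calling the inner algorithms."""

    def __init__(self, inner, key: bytes):
        self._inner = inner
        self._F = make_prf(key)

    def ins(self, x: bytes) -> bool:
        return self._inner.ins(self._F(x))

    def delete(self, x: bytes) -> bool:
        return self._inner.delete(self._F(x))

    def qry(self, x: bytes) -> bool:
        return self._inner.qry(self._F(x))


wrapped = PRFWrapped(SetBackedFilter(), key=b"illustrative-key")
wrapped.ins(b"alice")
present = wrapped.qry(b"alice")
```

The inner filter only ever sees outputs of the keyed function, so a user without the key cannot predict how a chosen input will be placed.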
Definition 11
Let \(pp = (s, \lambda _I, \lambda _T, num)\) be a tuple of positive integers. We define an \((s, \lambda _I, \lambda _T, num)\)-PRF-wrapped Cuckoo filter to be the AMQ-PDS with algorithms defined in Appendix A, with \(pp = (s, \lambda _I, \lambda _T, num)\), making use of hash functions \(H_T: \mathfrak {D}\rightarrow \{0, 1 \}^{\lambda _T}\) and \(H_I: \mathfrak {D}\rightarrow \{0, 1\}^{\lambda _I}\).
Our next step is to derive upper bounds on the NAI* false positive probability and the NAI* insertion failure probability for PRF-wrapped Cuckoo filters.
Lemma 4
Let \(\varPi \) be an \((s, \lambda _I, \lambda _T, num)\)-PRF-wrapped Cuckoo filter using random functions \(F: \mathfrak {D}\rightarrow \mathfrak {R}\), \(H_T: \mathfrak {D}\rightarrow \{0, 1\}^{\lambda _T}\), and \(H_I: \mathfrak {D}\rightarrow \{0, 1\}^{\lambda _I}\). Then, for any n, \( P_{\varPi ,pp}^*(FP\,|\,{n}) \le 1 - \big ( 1 - 2^{-\lambda _T} \big )^{2\,s+1} + \frac{n}{|\mathfrak {R}|}. \)
Proof
(sketch). We demonstrate that, apart from the collision probability in the range of F between the queried element and those used to create the state, the false positive probability bound in [10] upper bounds the probability for any n-NAI* state. For the full proof, see the full version.
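The bound of Lemma 4 can be evaluated numerically as follows. The sketch is only a calculator for the stated expression; the function name, the assumption that \(|\mathfrak {R}| = 2^{\texttt{range\_bits}}\), and the example parameters are illustrative choices.

```python
import math


def cuckoo_fp_bound(n: int, s: int, lam_t: int, range_bits: int) -> float:
    """Evaluate 1 - (1 - 2^-lam_t)^(2s+1) + n / |R|, assuming |R| = 2^range_bits.

    log1p/expm1 keep precision when 2^-lam_t is tiny.
    """
    tag_term = -math.expm1((2 * s + 1) * math.log1p(-(2.0 ** -lam_t)))
    collision_term = n / 2.0 ** range_bits
    return tag_term + collision_term


# Illustrative parameters: 4 slots per bucket, 16-bit tags, 128-bit range of F.
example_cuckoo_fp = cuckoo_fp_bound(n=1000, s=4, lam_t=16, range_bits=128)
```

For these parameters the first term is close to \((2s+1) \cdot 2^{-\lambda _T}\), and the collision term \(n / |\mathfrak {R}|\) is negligible in comparison.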
Lemma 5
Let \(\varPi \) be an \((s, \lambda _I, \lambda _T, num)\)-PRF-wrapped Cuckoo filter using random functions \(F: \mathfrak {D}\rightarrow \mathfrak {R}\), \(H_T: \mathfrak {D}\rightarrow \{0, 1\}^{\lambda _T}\), and \(H_I: \mathfrak {D}\rightarrow \{0, 1\}^{\lambda _I}\). Then, for any n,
Proof
(sketch). We construct an algorithm that inserts all \(X_1, ..., X_n\), using a modified insertion algorithm where an element’s tag is added to both of its buckets (if they do not already contain it), and with s set to \(\infty \). Let the resulting state be denoted by \(\varDelta \). We show that the insertion failure probability of any n-NAI* state with s equal to some \(\textit{limit}\) can be upper bounded by the probability that the load of any bucket in \(\varDelta \) exceeds \(\textit{limit}\). For the full proof, see the full version.
3.5 Consistency Rules
In this work, we will consider AMQ-PDS that satisfy some properties that we refer to as consistency rules, specified below. These rules are satisfied by Counting and Cuckoo filters.
Definition 12
(AMQ-PDS consistency rules). Consider an AMQ-PDS \(\varPi \). We say \(\varPi \) has
-
Successful deletion of positives if for all \(x \in \mathfrak {D}, \sigma \in \varSigma \), \(\top \leftarrow \textsf {qry}(x, \sigma ) \implies (\top , \sigma ') \leftarrow \textsf {del}(x, \sigma )\).
-
Unsuccessful deletion of negatives if for all \(x \in \mathfrak {D}, \sigma \in \varSigma \), \(\bot \leftarrow \textsf {qry}(x, \sigma ) \implies (\bot , \sigma ) \leftarrow \textsf {del}(x, \sigma )\).
-
Unsuccessful operation invariance if for all \(x \in \mathfrak {D}, \sigma \in \varSigma \), \((\bot , \sigma ') \leftarrow \textsf {ins}(x, \sigma )\) \( \implies \sigma ' = \sigma \) and \((\bot , \sigma ') \leftarrow \textsf {del}(x, \sigma ) \implies \sigma ' = \sigma \).
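For small parameter choices, the three rules of Definition 12 can be checked exhaustively. The sketch below does so for a toy counting-style filter over immutable states; the checker `check_consistency`, the fixed index map `SLOTS`, and the toy algorithms are hypothetical constructions for illustration and are not the algorithms of Fig. 2.

```python
from itertools import product

SLOTS = {"a": (0, 1), "b": (1, 2)}  # hypothetical fixed index map
MAX_VAL = 2


def qry(x, sigma):
    return all(sigma[i] > 0 for i in SLOTS[x])


def ins(x, sigma):
    if any(sigma[i] >= MAX_VAL for i in SLOTS[x]):
        return False, sigma  # failure leaves the state unchanged
    return True, tuple(c + (i in SLOTS[x]) for i, c in enumerate(sigma))


def delete(x, sigma):
    if not qry(x, sigma):
        return False, sigma  # negatives cannot be deleted
    return True, tuple(c - (i in SLOTS[x]) for i, c in enumerate(sigma))


def check_consistency(ins_fn, del_fn, qry_fn, states, elements):
    """Return True iff all three consistency rules hold on the given states."""
    for sigma in states:
        for x in elements:
            positive = qry_fn(x, sigma)
            ok_del, after_del = del_fn(x, sigma)
            ok_ins, after_ins = ins_fn(x, sigma)
            if positive and not ok_del:
                return False  # successful deletion of positives violated
            if not positive and (ok_del or after_del != sigma):
                return False  # unsuccessful deletion of negatives violated
            if not ok_ins and after_ins != sigma:
                return False  # unsuccessful insertion changed the state
            if not ok_del and after_del != sigma:
                return False  # unsuccessful deletion changed the state
    return True


all_states = list(product(range(MAX_VAL + 1), repeat=3))
rules_hold = check_consistency(ins, delete, qry, all_states, SLOTS)
```

A deletion routine that decrements counters unconditionally would fail the second rule, since it would report success on elements that query as negative.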
In the insertion-only setting, AMQ-PDS satisfy an additional consistency rule: for all \(x \in \mathfrak {D}, \sigma \in \varSigma \), \((\top , \sigma ) \leftarrow {\textsf {ins}}(x, \sigma ) \implies \top \leftarrow {\textsf {qry}}(x, \sigma )\). In other words, a membership query on an inserted element will always return \(\top \), meaning that \(\sigma \) has no false negative elements. In a setting with both insertions and deletions, one might expect the same rule to hold as long as x has not yet been deleted from \(\sigma \). In Definition 13, we define \(\sigma \) having no false negatives more precisely.
Definition 13
(No false negatives). Let \(\varPi \) be an AMQ-PDS with public parameters pp satisfying reinsertion invariance, and let \(\sigma \leftarrow \textsf {setup}(pp)\). Let \(\{ z_i \}\) be the elements that are successfully inserted into \(\sigma \). Let \(L_i\) be the list of successful operations on \(z_i\), where each item in \(L_i\) is either \(\textsf {ins}(z_i, \sigma )\) or \(\textsf {del}(z_i, \sigma )\). We say \(\sigma \) has no false negatives if, for all \(z_i\), if the last item in \(L_i\) is \(\textsf {ins}(z_i, \sigma )\), then \(\top \leftarrow \textsf {qry}(z_i, \sigma )\).
Unfortunately, with deletions, we cannot say that \(\sigma \) contains no false negatives. They can arise as a result of inserting or deleting false positive elements, as we will see later. Therefore, Definition 13 is not satisfied by AMQ-PDS in general, and we will not require this from the AMQ-PDS we consider. Instead, we will analyse false negatives in our security proofs by using their relationship to false positives.
4 Adversarial Correctness
In this section, we analyse the correctness of AMQ-PDS under adversarial inputs. Our starting point is the simulation-based security definition for adversarial correctness in [13]. However, while their focus was on AMQ-PDS that only support insertions and membership queries, we are now interested in a more complex scenario with insertions, membership queries and deletions. As we will see, this increase in adversarial power requires tackling some new obstacles.
We derive bounds on the correctness of AMQ-PDS that satisfy function-decomposability, reinsertion invariance, and the consistency rules in Definition 12. Then, we apply our results to analyse Counting filters instantiated using a PRF, and PRF-wrapped Cuckoo filters. In both cases, we provide concrete guarantees on their adversarial correctness.
In the following, we consider an adversary \({\mathcal {A}}\) interacting with an AMQ-PDS \(\varPi \) through an API, which we model as three oracles: \({\textbf {Ins}}\), which inserts elements of its choice into \(\varPi \), \({\textbf {Del}}\), which deletes elements of its choice from \(\varPi \), and \({\textbf {Qry}}\), which responds to membership queries (i.e. whether x has been inserted into \(\varPi \)).
4.1 Notions of Correctness
We employ a simulation-based approach to analysing the adversarial correctness of AMQ-PDS, which proceeds as follows. The adversary \({\mathcal {A}}\) plays in either the “real” or “ideal” world. In the real world, \({\mathcal {A}}\) interacts with a keyed AMQ-PDS \(\varPi \), through oracles allowing it to make insertions, deletions and membership queries on elements of its choice. In the ideal world, \({\mathcal {A}}\) instead interacts with a simulator \(\mathcal {S}\), constructed such that it provides an NAI* view of \(\varPi \) to \({\mathcal {A}}\). (Note that this differs from the definition of adversarial correctness in [13], which required \(\mathcal {S}\) to provide an NAI view.)
\({\mathcal {A}}\) then produces some arbitrary output, which the distinguisher \(\mathcal {D}\) uses to compute which world \({\mathcal {A}}\) was operating in. Finally, we bound \(\mathcal {D}\)’s ability to distinguish between the two worlds. This allows us to quantify \({\mathcal {A}}\)’s probability of achieving any adversarial goal in the real world (through adaptive insertions, deletions and membership queries) by relating it to the ideal world, which we know how to analyse.
In Fig. 3, we define the \(\text {Real-or-Ideal}\) game.
Definition 14
Let \(\varPi \) be an AMQ-PDS with public parameters pp, and let \(R_K\) be a keyed function family. We say \(\varPi \) is \((q_\text {ins}, q_\text {qry}, q_\text {del},t_a,\) \(t_d, t_s, \varepsilon )\)-adversarially correct if, for all adversaries \({\mathcal {A}}\) running in time at most \(t_a\) and making \(q_\text {ins}, q_\text {qry}\), \(q_\text {del}\) queries to oracles \({\textbf {Ins}}{,}{\textbf {Qry}}{,}{\textbf {Del}}\) respectively in the \(\text {Real-or-Ideal}\) game (Fig. 3) with a simulator \(\mathcal {S}\) that provides an NAI* view of \(\varPi \) to \({\mathcal {A}}\) and runs in time at most \(t_s\), and for all distinguishers \(\mathcal {D}\) running in time at most \(t_d\), we have:
Remark 2
We discuss why Definition 14 captures adversarial correctness, by outlining how it can be used to analyse a specific adversarial goal. Consider an adversary \({\mathcal {A}}\) that, throughout its execution, makes \({\textbf {Ins}}\) and \({\textbf {Del}}\) queries on adversarially selected inputs \(x_1, ..., x_n\), interspersed with \({\textbf {Qry}}\) queries, and ending with a final membership query \({\textbf {Qry}}(x)\) on an element x not among \(x_1, ..., x_n\). Suppose the output of \({\mathcal {A}}\) is the result of that final query, and \(\mathcal {D}\)’s output is identical to that of \({\mathcal {A}}\). Then, \(\Pr [\textit{Real}({\mathcal {A}}, \mathcal {D})]\) is the adversarial false positive probability of \(\varPi \) produced by \({\mathcal {A}}\), for which we cannot directly compute an upper bound, since \({\mathcal {A}}\) makes adaptive queries. However, \(\Pr [\textit{Ideal}({\mathcal {A}}, \mathcal {D}, \mathcal {S})]\) is the NAI* false positive probability, for which we can derive upper bounds for our AMQ-PDS of interest. Then, if Definition 14 is satisfied, we can upper bound \(\Pr [\textit{Real}({\mathcal {A}}, \mathcal {D})]\) by \(\Pr [\textit{Ideal}({\mathcal {A}}, \mathcal {D}, \mathcal {S})] + \varepsilon \). Note that our definition covers any adversarial goal (see [13, Appendix C.2]).
In Fig. 4, we construct a simulator \(\mathcal {S}\) providing an NAI* view for function-decomposable AMQ-PDS supporting insertions, deletions and membership queries. We first observe that the state constructed by \(\mathcal {S}\) is always an NAI* state (Definition 7). Every insertion of an element \(z_i\) either executes \({\textsf {ins}}^{\text {Id}_\mathfrak {R}}(\cdot , \sigma )\) on a freshly sampled random string, or does not change the state if \(z_i\) is currently inserted. Moreover, only deletions of \(z_i\) that are currently inserted run \({\textsf {del}}^{\text {Id}_\mathfrak {R}}(X_i, \sigma )\). Further, by inspection, the runtime of \(\mathcal {S}\) is not significantly higher than that of the underlying AMQ-PDS.
Theorem 1
Let \(q_\text {ins}, q_\text {del}, q_\text {qry}\) be non-negative integers, and let \(t_a,\, t_d > 0\). Let \(F:\mathfrak {D}\rightarrow \mathfrak {R}\). Let \(\varPi \) be an AMQ-PDS with public parameters pp and oracle access to F, such that \(\varPi \) satisfies the consistency rules from Definition 12, F-decomposability (Definition 2), and reinsertion invariance (Definition 3). Let \(\alpha , \beta , \gamma \) be the number of calls to F required to insert, query, and delete an element, respectively, in \(\varPi \) using its \(\textsf {ins}, \textsf {qry}, \textsf {del}\) algorithms.
If \(R_K :\mathfrak {D}\rightarrow \mathfrak {R}\) is an \((\alpha q_\text {ins} + \beta q_\text {qry} + \gamma q_\text {del}, t_a + t_d, \varepsilon )\)-secure pseudorandom function with key \(K\), then \(\varPi \) is \((q_\text {ins}, q_\text {qry}, q_\text {del}, t_a, t_d, t_s, \varepsilon ')\)-adversarially correct with respect to the simulator in Fig. 4, where \(t_s \approx t_a\) and \(\varepsilon ' = \varepsilon + 2 P_{\varPi ,pp}^*(IF\,|\,{q_\text {ins}}) + (q_\text {ins} + 2 q_\text {qry} + q_\text {del}) \cdot P_{\varPi ,pp}^*(FP\,|\,{q_\text {ins}})\).
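To see how the terms of Theorem 1 combine, the expression for \(\varepsilon '\) and the PRF query budget can be computed directly. The sketch below takes the maximal NAI* probabilities as inputs rather than deriving them; the function names and the numeric values in the example are illustrative assumptions.

```python
def prf_query_budget(alpha: int, beta: int, gamma: int,
                     q_ins: int, q_qry: int, q_del: int) -> int:
    """Number of PRF calls the reduction must allow: alpha*q_ins + beta*q_qry + gamma*q_del."""
    return alpha * q_ins + beta * q_qry + gamma * q_del


def adversarial_correctness_eps(eps_prf: float, p_if: float, p_fp: float,
                                q_ins: int, q_qry: int, q_del: int) -> float:
    """eps' = eps + 2 * P*(IF | q_ins) + (q_ins + 2*q_qry + q_del) * P*(FP | q_ins)."""
    return eps_prf + 2 * p_if + (q_ins + 2 * q_qry + q_del) * p_fp


# Illustrative values: p_if and p_fp would come from bounds such as Lemmas 2-5,
# evaluated at n = q_ins.
eps_prime = adversarial_correctness_eps(
    eps_prf=2.0 ** -64, p_if=1e-14, p_fp=1e-4, q_ins=100, q_qry=1000, q_del=50
)
```

The false positive term is multiplied by the total number of queries, with membership queries counted twice, so in practice it tends to dominate \(\varepsilon '\) unless the filter is sized for a very small false positive probability.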
Proof
We define an intermediate game \(G^*\) in Fig. 5. Let \(\textit{Real}\) denote the \(d=0\) version of Real-or-\(G^*\), let \(G^*\) denote the \(d = 1\) version of Real-or-\(G^*\) (or equivalently, the \(d = 0\) version of \(G^*\)-or-Ideal), and let \(\textit{Ideal}\) denote the \(d = 1\) version of \(G^*\)-or-Ideal. Then,
Our proof proceeds by bounding the closeness of \(\textit{Real}, G^*\) in Lemma 6 in terms of the PRF advantage, and that of \(G^*, \textit{Ideal}\) in Lemma 7 in terms of the probability of some “bad” event. Then, we combine these lemmas to obtain our result.
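The decomposition underlying this strategy is the usual triangle inequality over the intermediate game. A sketch in the notation of Remark 2 is given below; writing the intermediate experiment as \(G^*({\mathcal {A}}, \mathcal {D})\) is an assumption about notation made for this illustration.

```latex
\[
\bigl|\Pr[\textit{Real}(\mathcal{A},\mathcal{D})] - \Pr[\textit{Ideal}(\mathcal{A},\mathcal{D},\mathcal{S})]\bigr|
\;\le\;
\bigl|\Pr[\textit{Real}(\mathcal{A},\mathcal{D})] - \Pr[G^*(\mathcal{A},\mathcal{D})]\bigr|
\;+\;
\bigl|\Pr[G^*(\mathcal{A},\mathcal{D})] - \Pr[\textit{Ideal}(\mathcal{A},\mathcal{D},\mathcal{S})]\bigr|
\]
```

The first term on the right is the quantity bounded in Lemma 6, and the second is the quantity bounded in Lemma 7.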
Lemma 6
The difference in probability of an arbitrary \(t_d\)-distinguisher \(\mathcal {D}\) outputting 1 in experiments of game \(\textit{Real}\)-or-\(G^*\) in Fig. 5 with a \((q_\text {ins}, q_\text {qry}, q_\text {del}, t_a)\)-AMQ-PDS adversary \({\mathcal {A}}\) is bounded by the maximal PRF advantage \(\varepsilon \) of an \((\alpha q_\text {ins} + \beta q_\text {qry} + \gamma q_\text {del}, t_a + t_d, \varepsilon )\)-PRF adversary attacking \(R_K\):
Proof
Consider the PRF adversary \({\mathcal {B}}\) (Fig. 1), instantiating the AMQ-PDS queried by \({\mathcal {A}}\) using its \({\textbf {RoR}}\) oracle, in relation to the Real-or-\(G^*\) game (Fig. 5). When \(b = 0\), \({\mathcal {B}}\) is running \(\textit{Real}\) for \({\mathcal {A}}\), and when \(b = 1\), \({\mathcal {B}}\) is instead running \(G^*\) for \({\mathcal {A}}\). Then, the advantage of \({\mathcal {B}}\) is \( \text {Adv}_{R}^{PRF} ({\mathcal {B}}) = \text {Adv}_{\varPi , \mathcal {A, S}}^{{\textit{Real}\text {-or-}G^*}} (\mathcal {D}). \) Since \(R_K\) is an \((\alpha q_\text {ins} + \beta q_\text {qry} + \gamma q_\text {del}, t_a + t_d, \varepsilon )\)-secure PRF, \( \text {Adv}_{\varPi , \mathcal {A, S}}^{{\textit{Real}\text {-or-}G^*}} (\mathcal {D}) \le \varepsilon . \)
Lemma 7
The difference in probability of an arbitrary \(t_d\)-distinguisher \(\mathcal {D}\) outputting 1 in experiments of game \(G^*\)-or-Ideal in Fig. 5 with a \((q_\text {ins}, q_\text {qry}, q_\text {del}, t_a)\)-AMQ-PDS adversary \({\mathcal {A}}\) is bounded as follows:
Proof
We wish to bound the probability of distinguishing between \(G^*\) and \(\textit{Ideal}\). Let \(\textbf{E}\) be the divergence event between \(G^*\) and \(\textit{Ideal}\), which occurs due to a mismatch in responses to \({\textbf {Qry}}, {\textbf {Del}}, {\textbf {Ins}}\) queries across the two games (see Fig. 4).
First, we observe that \(\textit{Ideal}\) cannot have false negative responses to membership queries, but \(G^*\) could. If \({\mathcal {A}}\) induces a false negative for some element x in \(G^*\) and then calls \({\textbf {Qry}}(x)\), the two games would diverge with probability one. False negatives lead to repercussions when comparing responses to all types of queries across the two games. Therefore, we will deal with them separately, by defining \(\textbf{E}_\text {FN}\) to be the event that a false negative occurs in \(G^*\) before any other query response mismatch. We then split the analysis of event \(\textbf{E}\) into two parts: (1) the query response mismatch occurs without a false negative occurring in \(G^*\) beforehand (i.e. \(\lnot \textbf{E}_\text {FN}\)), or (2) the query response mismatch occurs with a false negative occurring in \(G^*\) beforehand (i.e. \(\textbf{E}_\text {FN}\)). Then,
We will analyse \(\textbf{E}\wedge \lnot \textbf{E}_\text {FN}\) for each query type separately. Let \(a_i^{G}, b_i^{G}, c_i^{G}\) denote the responses to \({\mathcal {A}}\)’s i-th query, deletion, and insertion query in game \(G \in \{ G^*, \textit{Ideal}\}\). Then, the games diverge the first time that \(a_i^{G^*}, b_i^{G^*}\) or \(c_i^{G^*}\) does not match \(a_i^{\textit{Ideal}}, b_i^{\textit{Ideal}}\) or \(c_i^{\textit{Ideal}}\), respectively. We define
Hence,
We now proceed to bound the probability of each event in Eq. (5). In the following, we take the probability over the randomness used by \({\mathcal {A}}\) (which we refer to as \({\mathcal {A}}\)’s coins), and the randomness used by game \(G \in \{ G^*, \textit{Ideal}\}\) to answer \({\mathcal {A}}\)’s queries (which we refer to as G’s coins). We will use \(x_i, y_i\) and \(z_i\) to denote the input to \({\mathcal {A}}\)’s i-th query, deletion and insertion query, respectively.
Calculation of \({\Pr }\left[ \,\textbf{E}_\text {FN}\,\right] \). We start by analysing the probability of a false negative occurring in \(G^*\). Our key observation is that false negatives can only occur from inserting or deleting false positives.
Consider an element x that is a false positive due to insertions of \(\{ z_1, ..., z_\ell \}\), where \(\ell \ge 1\). By the consistency rule successful deletion of positives, the deletion of x will succeed, although it was never inserted. However, this may cause elements in \(\{ z_1, ..., z_\ell \}\) to become false negatives. Now, consider inserting this false positive x. By reinsertion invariance, the state will remain unchanged, but x will become a true positive. Then, deleting any element in \(\{ x, z_1, ..., z_\ell \}\) will succeed, but may cause other elements in \(\{ x, z_1, ..., z_\ell \}\) to become false negatives.
Recall that we are interested in analysing the probability of a false negative occurring in \(G^*\) before any other mismatch in query responses across the two games. Therefore, we do not need to consider deletions of false positives: such a deletion would itself result in a mismatch in \({\textbf {Del}}\) responses, since \(\textit{Ideal}\) does not allow deletions of false positives while \(G^*\) does. In the following, we therefore focus solely on false negatives caused by insertions of false positives. We write
Let \(\sigma ^*_{ i}\) denote the state of \(\varPi \) in game \(G^*\) just before the i-th \({\textbf {Ins}}\) query. Then, since no prior query response mismatch occurred and \(z_i\) is the first false positive inserted, \(\sigma ^*_i\) contains no false negatives up to this point. Then,
Let \(L_{i}\) be the list of successful operations on \(\sigma ^*\) in \(G^*\) up to the i-th \({\textbf {Ins}}\) query, where each item in \(L_i\) is either \(\textsf {ins}^F(\cdot ,\sigma ^*)\) or \(\textsf {del}^F(\cdot ,\sigma ^*)\) on \(z_1, ..., z_{i-1}\). By the consistency rule unsuccessful operation invariance, we do not need to consider unsuccessful operations when constructing \(\sigma ^*_i\). So,
Now, if no prior query response mismatch has occurred, \(\text {inserted}[z_i] = \bot \) implies that either \(z_i\) was never inserted into \(\sigma ^*_i\), or \(z_i\) was inserted but then deleted. In the latter scenario, since \(\sigma ^*_i\) contains no false negatives up to this point (as per Eq. (7)), \(z_i\) must be a positive at the time of its deletion. Then, by the consistency rule successful deletion of positives, its deletion will succeed, thus fully undoing the effect of its insertion on \(\sigma ^*_i\). Since F is a random function satisfying F-decomposability and \({\mathcal {A}}\) has no information about F, we write Eq. (7) as
Now, since F is a random function and \({\mathcal {A}}\) has no information about F, we can replace \(F(z_j)\) in every first insertion of an element \(z_j\) by a fresh uniformly random string from \(\mathfrak {R}\) (i.e. \(\sigma ^*_i\) satisfies insertion unpredictability). For repeated insertions of an element, we have two possibilities. If this element has not been deleted since its last insertion, the repeated insertion will not change the state, due to reinsertion invariance. However, if it has been deleted since its last insertion, it will change the state in the same way as its first insertion, since both use the same F. Therefore, we can rewrite the above by sampling \(|{ \{ z_1, ..., z_{i-1} \} }|\) random strings, and associating each string to a distinct \(z_j\), giving
We now argue that every \(\sigma ^*_i\) is an n-NAI* state, where n is upper bounded by \(q_\text {ins}\), by showing that it satisfies the requirements in Corollary 1. Firstly, observe that the construction of \(\sigma ^*_i\) in Eq. (8) enforces insertion unpredictability (Definition 6). Secondly, there are at most \(q_\text {ins}\) insertions in \(\sigma ^*_i\). Thirdly, since no query response mismatch has yet occurred, all deletions must be on elements for which the preceding successful operation was an insertion. Finally, since there are no false negatives up to this point and reinsertion invariance holds, any insertion on a currently inserted element will not change the state.
Let \(\mathcal {U}_{\varPi ,pp}^{\text {Id}_\mathfrak {R}}\) be the algorithm from Definition 8 that, given \(X_1, ..., X_n\), outputs an NAI* state created using \(\textsf {ins}^{\text {Id}_\mathfrak {R}}, \textsf {del}^{\text {Id}_\mathfrak {R}}\) on \(X_1, ..., X_{n}\) with the maximal false positive probability. Then, no matter how \(\sigma ^*_i\) is created, the state output by \(\mathcal {U}_{\varPi ,pp}^{\text {Id}_\mathfrak {R}}\) will result in an equal or higher false positive probability than that of \(\sigma ^*_i\). Since \(\ell \le q_{\textsf {ins}}\) and with more distinct insertions, \(\mathcal {U}_{\varPi ,pp}^{\text {Id}_\mathfrak {R}}\) may be able to create a state with even higher false positive probability,
Finally, applying Definition 8, we obtain
Calculation of \({\Pr }\left[ \,\textbf{E}_{{\textbf {Qry}}}\,\right] \). We first rewrite Eq. (2) using the union bound as
We start by inspecting the \({\textbf {Qry}}\) algorithms of \(G^*\) and \(\textit{Ideal}\) to see where they could diverge. In \(G^*\), the responses to \({\mathcal {A}}\)’s \({\textbf {Qry}}\) queries are always computed using the same function F, while in \(\textit{Ideal}\), a fresh random string is sampled each time a non-inserted element is queried.
Let \(\sigma _i\) denote the state of \(\varPi \) in game \(\textit{Ideal}\) just before the i-th \({\textbf {Qry}}\) query, and \(\sigma ^*_i\) denote the corresponding state in game \(G^*\). Since \({\textbf {Qry}}(x_i)\) yields the first query response mismatch, both \(G^*\) and \(\textit{Ideal}\) must contain the same set of inserted elements. As \(\textbf{E}_{FN}\) did not yet occur, \(\sigma ^*_i\) has no false negatives. Moreover, \({\textbf {Qry}}\) queries in \(\textit{Ideal}\) do not give false negative responses. This means that \({\textbf {Qry}}\) queries on elements that were inserted (and not yet deleted) will always return a positive response in both games. Therefore, in order for \(x_i\) to yield a mismatch in \({\textbf {Qry}}\) query responses between the games, we must have that \(x_i\) is not currently inserted in \(\textit{Ideal}\) (i.e. \(\text {inserted}[x_i] = \bot \) in line 4 of \({\textbf {QrySim}}\)). This gives
where, for simplicity, we will use \({\Pr }\left[ \,\textbf{E}_{{\textbf {Qry}}}^{\textit{Ideal}}\,\right] \) to denote the first term of Eq. (11), and \({\Pr }\left[ \,\textbf{E}_{{\textbf {Qry}}}^{G^*}\,\right] \) to denote the second term.
We start by bounding \({\Pr }\left[ \,\textbf{E}_{{\textbf {Qry}}}^{\textit{Ideal}}\,\right] \). In \(\textit{Ideal}\), a fresh random string is sampled each time a non-inserted element is queried, and so
We now argue that every \(\sigma _i\) is an n-NAI* state, with n being upper bounded by \(q_\text {ins}\), by showing that it satisfies the requirements in Definition 7. Firstly, from line 3 of \({\textbf {DelSim}}\), we observe that only deletions of currently inserted elements run \({\textsf {del}}^{\text {Id}_\mathfrak {R}}(\cdot , \sigma )\), possibly changing the state. Secondly, we note that in \({\textbf {InsSim}}\), every insertion either executes \({\textsf {ins}}^{\text {Id}_\mathfrak {R}}(\cdot , \sigma )\) on a freshly sampled random string, or does not change the state if it is on a currently inserted element. Therefore, \(\sigma _i\) is an NAI* state containing at most \(q_\text {ins}\) elements. Then, we can upper bound the false positive probability of \(\sigma _i\) by that of the NAI* state with the maximal false positive probability (Definition 8), giving \( {\Pr }\left[ \,\textbf{E}_{{\textbf {Qry}}}^{\textit{Ideal}}\,\right] \le P_{\varPi ,pp}^*(FP\,|\,{q_\text {ins}}). \)
We use a reasoning similar to calculating \(\textbf{E}_{FN}\) to compute \({\Pr }\left[ \,\textbf{E}_{{\textbf {Qry}}}^{G^*}\,\right] \), replacing \(z_i\) with \(x_i\). Under \(\lnot \textbf{E}_{FN}\), the state \(\sigma _i^*\) contains no false negatives. Therefore, we can apply Eq. (6) from the \(\textbf{E}_{FN}\) calculation to get
Substituting \( {\Pr }\left[ \,\textbf{E}_{{\textbf {Qry}}}^{\textit{Ideal}}\,\right] , {\Pr }\left[ \,\textbf{E}_{{\textbf {Qry}}}^{G^*}\,\right] \) in Eq. (12) gives
Calculation of \({\Pr }\left[ \,\textbf{E}_{{\textbf {Del}}}\,\right] \). We first rewrite Eq. (3) using the union bound as
We now examine the \({\textbf {Del}}\) algorithms of \(G^*\) and \(\textit{Ideal}\). In \(G^*\), the responses to \({\mathcal {A}}\)’s \({\textbf {Del}}\) queries are always computed using the same function F. In \(\textit{Ideal}\), deletions are only allowed on an element \(y_i\) if it is currently inserted in the filter, and use the same random string \(f[y_i]\) that was used for \(y_i\)’s insertion.
We note that in Eq. (15), we are only interested in the case where \({\textbf {Del}}(y_i)\) is the first query response mismatch. In this case, both \(G^*\) and \(\textit{Ideal}\) must contain the same set of inserted elements. As \(\textbf{E}_{FN}\) did not occur, every inserted element is a true positive in both games. We observe that in \(\textit{Ideal}\), true positives are always successfully deleted (see line 3 of \({\textbf {DelSim}}\)), while in \(G^*\), successful deletion of true positives is ensured by the consistency rule successful deletion of positives. However, by the same consistency rule, deletions of false positives also succeed in \(G^*\), while they do not in \(\textit{Ideal}\). Consequently, deletions in \(G^*\) succeed on at least the elements on which they succeed in \(\textit{Ideal}\). Thus, it never happens that a deletion succeeds in \(\textit{Ideal}\) but not in \(G^*\), and we can rewrite Eq. (15) as
Let \(\sigma ^*_{ i}\) denote the state of \(\varPi \) in game \(G^*\) just before the i-th \({\textbf {Del}}\) query. By the consistency rule unsuccessful deletion of negatives,
We compute this using reasoning similar to that in the \(\textbf{E}_{FN}\) calculation, replacing \(z_i\) with \(y_i\). Under \(\lnot \textbf{E}_{FN}\), the state \(\sigma _i^*\) contains no false negatives. Therefore, we can apply Eq. (6) from the \(\textbf{E}_{FN}\) calculation to get
Calculation of \({\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}\,\right] \). We first rewrite Eq. (4) as
Let us now compare the \({\textbf {Ins}}\) algorithms of \(G^*\) and \(\textit{Ideal}\). In \(G^*\), the responses to \({\mathcal {A}}\)’s \({\textbf {Ins}}\) queries are always computed using the same function F. On the other hand, in \(\textit{Ideal}\), a fresh random string is sampled at each insertion of an element \(z_i\) which is not already inserted.
Let \(\sigma _i\) denote the state of \(\varPi \) in game \(\textit{Ideal}\) just before the i-th \({\textbf {Ins}}\) query, and \(\sigma ^*_i\) denote the corresponding state in game \(G^*\). If \({\textbf {Ins}}(z_i)\) is the first mismatch, it must be that both \(G^*\) and \(\textit{Ideal}\) contain the same set of inserted elements up to this point. In \(\textit{Ideal}\), by inspecting \({\textbf {InsSim}}\) we observe that the insertion of any currently inserted element \(z_i\) will always succeed. In \(G^*\), since we are only considering the case where \(\textbf{E}_{FN}\) did not yet occur, \(\sigma ^*_i\) has no false negatives. This means that any element \(z_i\) that was inserted and not yet deleted will result in \(\top \leftarrow {\textsf {qry}}^F(z_i, \sigma ^*_i)\). Then, by reinsertion invariance, the insertion of \(z_i\) will succeed (but not change the state) in \(G^*\). Therefore, for the first query response mismatch it must be that \(z_i\) is not currently inserted in \(\textit{Ideal}\) (i.e. \(\text {inserted}[z_i] = \bot \) in line 4 of \({\textbf {InsSim}}\)). Then,
where, for simplicity, we will use \({\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}^{\textit{Ideal}}\,\right] \) to denote the first term of Eq. (18), and \({\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}^{G^*}\,\right] \) to denote the second term.
We start by computing \({\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}^{\textit{Ideal}}\,\right] \). In \(\textit{Ideal}\), a fresh random string is sampled each time a non-inserted element is inserted, and so we can write
Since every \(\sigma _i\) is an n-NAI* state, with n being upper bounded by \(q_\text {ins}\), we can upper bound the insertion failure probability of \(\sigma _i\) by that of the NAI* state with the maximal insertion failure probability (Definition 9), giving \( {\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}^{\textit{Ideal}}\,\right] \le P_{\varPi ,pp}^*(IF\,|\,{q_\text {ins}}).\)
We now compute \({\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}^{G^*}\,\right] \). We have that
Let \(L_i\) be the list of successful operations on \(\sigma ^*\) in \(G^*\) up to the i-th \({\textbf {Ins}}\) query, where each item in \(L_i\) is either \({\textsf {ins}}^F(\cdot , \sigma ^*)\) or \({\textsf {del}}^F(\cdot , \sigma ^*)\) on \(z_1, ..., z_{i-1}\). Recall that we do not need to consider unsuccessful operations when constructing \(\sigma ^*_i\), by the consistency rule unsuccessful operation invariance. Then,
by F-decomposability. Now, since F is a random function and \({\mathcal {A}}\) has no information about F, we can then proceed in a similar manner as in the \(\textbf{E}_{FN}\) calculation, with the caveat that we are now interested in any of the \(q_\text {ins}\) insertions failing:
As for Eq. (8), we conclude that \(\sigma ^*_i\) in Eq. (20) is an NAI* state. Then, we again upper bound the insertion failure probability of \(\sigma ^*_i\) by that of the NAI* state with the maximal insertion failure probability (Definition 9), giving \( {\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}^{G^*}\,\right] \le P_{\varPi ,pp}^*(IF\,|\,{q_\text {ins}}). \) Substituting \({\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}^{\textit{Ideal}}\,\right] , {\Pr }\left[ \,\textbf{E}_{{\textbf {Ins}}}^{G^*}\,\right] \) in Eq. (19), we obtain
Finally, substituting Eqs. (9, 14, 16, 21) in Eq. (5), we have
To prove Theorem 1, we then apply Lemmas 6 and 7 to Eq. (1) to obtain
Remark 3
We discuss why our results do not directly extend to a setting where \({\mathcal {A}}\) can access the internal state \(\sigma \). In Fig. 4, observe that, upon reinsertion of an element not currently in the filter, \(\textit{Ideal}\) always samples a fresh random string, while \(G^*\) inserts the same element again. This choice allowed us to obtain distinguishing bounds involving only the NAI* false positive and insertion failure probabilities. However, this difference is clearly detectable if \({\mathcal {A}}\) can view \(\sigma \) after reinsertion, leading to \(\textit{Ideal}\) and \(G^*\) being distinguished with a probability close to 1.
4.2 Guarantees for Counting and Cuckoo Filters
In this section, we will use Theorem 1 to give concrete correctness guarantees for Counting and Cuckoo filters.
Corollary 2
Let \(q_\text {ins}, q_\text {del}, q_\text {qry}\) be non-negative integers, and let \(t_a,\, t_d > 0\). Let \(F:\mathfrak {D}\rightarrow \mathfrak {R}\). Let \(\varPi \) be a Counting filter AMQ-PDS with public parameters pp and oracle access to F. If \(R_K\), for a uniformly sampled key \(K\), is a \((q_\text {ins} + q_\text {qry} + q_\text {del}, t_a + t_d, \varepsilon )\)-secure pseudorandom function and \(F = R_K\), then \(\varPi \) is \((q_\text {ins}, q_\text {qry}, q_\text {del}, t_a, t_s, t_d, \varepsilon ')\)-adversarially correct, where \(t_s \approx t_a\) and \( \varepsilon ' = \varepsilon + 2\,m \cdot \big [ \frac{ e \cdot q_\text {ins} \cdot k }{\textit{maxVal}\cdot m}\big ]^{\textit{maxVal}} + (q_\text {ins} + 2 q_\text {qry} + q_\text {del}) \cdot \big [ 1 - e^{-\frac{(q_\text {ins}+0.5)k}{m-1}} \big ]^k. \)
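The bound \(\varepsilon '\) in Corollary 2 is a closed-form expression, so it can be evaluated directly for candidate parameters. The following is a minimal sketch that transcribes the formula term by term; the function name and argument names are our own choices for illustration and do not appear in the paper.

```python
import math


def counting_filter_epsilon(eps_prf, m, k, max_val, q_ins, q_qry, q_del):
    """Evaluate epsilon' from Corollary 2 for a Counting filter.

    eps_prf is the PRF advantage (epsilon), m the number of counters,
    k the number of counters touched per element, max_val the counter cap.
    """
    if m <= 1 or k < 1 or max_val < 1:
        raise ValueError("need m > 1, k >= 1, max_val >= 1")
    # Insertion-failure term: 2 * m * [e * q_ins * k / (maxVal * m)]^maxVal
    if_term = 2 * m * (math.e * q_ins * k / (max_val * m)) ** max_val
    # False-positive factor: [1 - exp(-(q_ins + 0.5) * k / (m - 1))]^k
    fp = (1 - math.exp(-(q_ins + 0.5) * k / (m - 1))) ** k
    # The false-positive factor is weighted by the query counts.
    fp_term = (q_ins + 2 * q_qry + q_del) * fp
    # A result of 1 or more carries no information as a probability bound.
    return eps_prf + if_term + fp_term
```

For fixed query budgets, calling the function over a range of `m` shows how quickly the bound shrinks as the filter grows; a returned value of 1 or more means the bound is vacuous for those parameters.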
Proof
From the \(\textsf {ins}, \textsf {del}, \textsf {qry}\) algorithms in Fig. 2, we see that Counting filters with oracle access to a random function F are F-decomposable, reinsertion invariant, and satisfy the consistency rules in Definition 12. Further, each \(\textsf {ins}, \textsf {del}\) and \(\textsf {qry}\) call contains one call to the function F. Then, Theorem 1 holds with \(\alpha = \beta = \gamma = 1\). Using Lemmas 2 and 3, we obtain the result.
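To make the properties used in this proof concrete, here is a minimal sketch of a Counting filter whose operations each make exactly one call to a keyed function. It is not a transcription of Fig. 2: the class name, the use of HMAC-SHA-512 as a stand-in for \(R_K\), the way one output is cut into k indices, and the merging of duplicate indices are all assumptions made for illustration.

```python
import hashlib
import hmac


class PRFCountingFilter:
    """Toy Counting filter whose only source of hashing is a keyed PRF."""

    def __init__(self, key, m, k, max_val):
        if not (1 <= k <= 16 and m >= 1 and max_val >= 1):
            raise ValueError("need 1 <= k <= 16, m >= 1, max_val >= 1")
        self.key, self.m, self.k, self.max_val = key, m, k, max_val
        self.counters = [0] * m

    def _indices(self, x):
        # Single PRF call F(x): one HMAC-SHA-512 output, cut into 4-byte words.
        digest = hmac.new(self.key, x, hashlib.sha512).digest()
        words = (digest[4 * j:4 * j + 4] for j in range(self.k))
        # Assumption: duplicate indices are merged and modulo bias is ignored.
        return {int.from_bytes(w, "big") % self.m for w in words}

    def _positive(self, idx):
        return all(self.counters[i] > 0 for i in idx)

    def qry(self, x):
        return self._positive(self._indices(x))

    def ins(self, x):
        idx = self._indices(x)
        if self._positive(idx):
            return True  # reinsertion invariance: succeed, state unchanged
        if any(self.counters[i] >= self.max_val for i in idx):
            return False  # insertion failure, state unchanged
        for i in idx:
            self.counters[i] += 1
        return True

    def delete(self, x):
        idx = self._indices(x)
        if not self._positive(idx):
            return False  # negatives are never deleted
        for i in idx:
            self.counters[i] -= 1
        return True  # every positive, true or false, is deleted
```

In this sketch, `ins` on an element that already queries positive returns success without touching the counters, and `delete` succeeds exactly when the element queries positive, which mirrors the reinsertion invariance and deletion consistency rules invoked above. A deployment would additionally need to keep `key` secret from the users issuing queries.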
Remark 4
The adversarial correctness bound for Bloom filters in [13, Corollary 4.4] holds for insertion-only Counting filters.
Corollary 3
Let \(q_\text {ins}, q_\text {del}, q_\text {qry}\) be non-negative integers, and let \(t_a,\, t_d > 0\). Let \(F:\mathfrak {D}\rightarrow \mathfrak {R}\). Let \(\varPi \) be a PRF-wrapped Cuckoo filter AMQ-PDS with public parameters pp and oracle access to F. If \(R_K\), for a uniformly sampled key \(K\), is a \((q_\text {ins} + q_\text {qry} + q_\text {del}, t_a + t_d, \varepsilon )\)-secure pseudorandom function and \(F = R_K\), then \(\varPi \) is \((q_\text {ins}, q_\text {qry}, q_\text {del}, t_a, t_s, t_d, \varepsilon ')\)-adversarially correct, where \(t_s \approx t_a\) and
Proof
From the \(\textsf {ins}, \textsf {del}, \textsf {qry}\) algorithms in Appendix A, we see that PRF-wrapped Cuckoo filters with oracle access to a random function F are F-decomposable, reinsertion invariant, and satisfy the consistency rules in Definition 12. Further, each \(\textsf {ins}, \textsf {del}\) and \(\textsf {qry}\) call contains one call to the function F. Then, Theorem 1 holds with \(\alpha = \beta = \gamma = 1\). Using Lemmas 4 and 5, we obtain the result.
5 Secure Instances
In this section, we outline how our results can be used to secure AMQ-PDS in practice. Let us consider the example outlined in Remark 2, with the predicate
Since Theorem 1 holds for any predicate, the probability of an adversary \({\mathcal {A}}\) satisfying P in the real world is given by \( {\Pr }\left[ \,\mathcal {D}({\mathcal {A}}) {=} 1\,\right] \le \varepsilon + 2 P_{\varPi ,pp}^*(IF\,|\,{q_\text {ins}}) + (q_\text {ins} + 2 q_\text {qry} + q_\text {del} + 1) \cdot P_{\varPi ,pp}^*(FP\,|\,{q_\text {ins}}). \)
We illustrate the behaviour of this bound for the example of Counting filters. In Fig. 6, we plot an upper bound of the false positive probability against the size of the Counting filter in three settings: the non-adversarial setting, the insertion-only adversarial setting, and the setting with deletions studied in this work. By Remark 4, we can analyse the insertion-only setting using the results in [13]: \( {\Pr }\left[ \,\mathcal {D}({\mathcal {A}}) {=} 1\,\right] \le \varepsilon + (2 q_\text {qry} + 1) \cdot P_{\varPi ,pp}(FP\,|\,{q_\text {ins}}).\)
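The two bounds above differ only in how the underlying probabilities are weighted, which the following sketch makes explicit. The function and argument names are hypothetical; the probabilities `p_if` and `p_fp` must be supplied by the caller, and note that the bound with deletions takes the NAI* quantities \(P_{\varPi ,pp}^*\) while the insertion-only bound takes the NAI quantity \(P_{\varPi ,pp}\).

```python
def bound_with_deletions(eps, p_if, p_fp, q_ins, q_qry, q_del):
    """eps + 2 * P*(IF | q_ins) + (q_ins + 2 q_qry + q_del + 1) * P*(FP | q_ins)."""
    return eps + 2 * p_if + (q_ins + 2 * q_qry + q_del + 1) * p_fp


def bound_insertion_only(eps, p_fp, q_qry):
    """eps + (2 q_qry + 1) * P(FP | q_ins), the insertion-only bound."""
    return eps + (2 * q_qry + 1) * p_fp


# Illustrative inputs only; these are not values taken from the paper.
with_del = bound_with_deletions(0.0, 0.25, 0.125, q_ins=1, q_qry=1, q_del=1)
ins_only = bound_insertion_only(0.0, 0.125, q_qry=1)
```

If the same false positive probability were plugged into both functions, the gap between the two bounds would be exactly `2 * p_if + (q_ins + q_del) * p_fp`, which is the extra cost attributable to insertion failures and to the insertion and deletion queries.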
From Fig. 6, we observe that guaranteeing a specific false positive probability even in an adversarial setting with deletions requires roughly trebling the size of the filter, when compared to the honest (NAI) setting. Crucially, deletions do not incur a significant cost when compared to the insertion-only setting; the additional term of \(P_{\varPi ,pp}^*(IF\,|\,{q_\text {ins}})\) can be made very small with the choice of an appropriate \(\textit{maxVal}\). For Cuckoo filters, the same observation holds for the choice of \(\lambda _I\) and \(\lambda _T\). Hence, moving to the more complex scenario of allowing deletions does not hinder the practicality of our results.
References
[1] Bloom filters and cuckoo filters for cache summarization. https://blog.fleek.network/post/bloom-and-cuckoo-filters-for-cache-summarization/.
[2] RedisBloom: Probabilistic data structures for Redis. https://redis.com/modules/redis-bloom/.
[3] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In International Workshop on Randomization and Approximation Techniques in Computer Science, 2002. https://doi.org/10.1007/3-540-45726-7_1.
[4] Michael A. Bender, Martin Farach-Colton, Mayank Goswami, Rob Johnson, Samuel McCauley, and Shikha Singh. Bloom filters, adaptivity, and the dictionary problem. In FOCS, 2018. https://doi.org/10.1109/FOCS.2018.00026.
[5] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970. https://doi.org/10.1145/362686.362692.
[6] Andrei Z. Broder and Michael Mitzenmacher. Survey: Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485–509, 2003. https://doi.org/10.1080/15427951.2004.10129096.
[7] Yan-Cheng Chang and Michael Mitzenmacher. Privacy preserving keyword searches on remote encrypted data. In Applied Cryptography and Network Security, 2005. https://doi.org/10.1007/11496137_30.
[8] David Clayton, Christopher Patton, and Thomas Shrimpton. Probabilistic data structures in adversarial environments. In ACM SIGSAC CCS, 2019. https://doi.org/10.1145/3319535.3354235.
[9] Graham Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005. https://doi.org/10.1016/j.jalgor.2003.12.001.
[10] Bin Fan, Dave G. Andersen, Michael Kaminsky, and Michael D. Mitzenmacher. Cuckoo filter: Practically better than Bloom. In CoNEXT, 2014. https://doi.org/10.1145/2674005.2674994.
[11] Bin Fan, David G. Andersen, and Michael Kaminsky. Cuckoo filter reference implementation. https://github.com/efficient/cuckoofilter/blob/917583d6abef692dfa8e14453bd77d6e0b61eef3/src/cuckoofilter.h#L139, 2013.
[12] Li Fan, Pei Cao, J. Almeida, and A.Z. Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking, 8(3):281–293, 2000. https://doi.org/10.1109/90.851975.
[13] Mia Filić, Kenny Paterson, Anupama Unnikrishnan, and Fernando Virdia. Adversarial correctness and privacy for probabilistic data structures. In ACM SIGSAC CCS, 2022. https://doi.org/10.1145/3548606.3560621.
[14] Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In Conference on Analysis of Algorithms, 2007. https://doi.org/10.46298/dmtcs.3545.
[15] Sergio Galán, Pedro Reviriego, Stefan Walzer, Alfonso Sánchez-Macian, Shanshan Liu, and Fabrizio Lombardi. On the privacy of counting Bloom filters under a black-box attacker. IEEE Transactions on Dependable and Secure Computing, 20(5), 2023. https://doi.org/10.1109/TDSC.2022.3217115.
[16] Thomas Gerbet, Amrit Kumar, and Cédric Lauradoux. The power of evil choices in Bloom filters. In IEEE/IFIP Conference on Dependable Systems and Networks, 2015. https://doi.org/10.1109/DSN.2015.21.
[17] Junzhi Gong, Tong Yang, Haowei Zhang, Hao Li, Steve Uhlig, Shigang Chen, Lorna Uden, and Xiaoming Li. HeavyKeeper: An accurate algorithm for finding top-k elephant flows. In USENIX Annual Technical Conference, 2018. https://doi.org/10.1109/TNET.2019.2933868.
[18] Laura Hetz, Thomas Schneider, and Christian Weinert. Scaling mobile private contact discovery to billions of users. In ESORICS, 2023. https://doi.org/10.1007/978-3-031-50594-2_23.
[19] Stefan Heule, Marc Nunkesser, and Alexander Hall. HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm. In Conference on Extending Database Technology, 2013. https://doi.org/10.1145/2452376.2452456.
[20] Daniel Kales, Christian Rechberger, Thomas Schneider, Matthias Senker, and Christian Weinert. Mobile private contact discovery at scale. In USENIX Security, 2019.
[21] Tsvi Kopelowitz, Samuel McCauley, and Ely Porat. Support optimality and adaptive cuckoo filters. In Algorithms and Data Structures, 2021. https://doi.org/10.1007/978-3-030-83508-8_40.
[22] Anukool Lakhina, Mark Crovella, and Christophe Diot. Characterization of network-wide anomalies in traffic flows. In ACM SIGCOMM Conference on Internet Measurement, 2004. https://doi.org/10.1145/1028788.1028813.
[23] James Larisch, David Choffnes, Dave Levin, Bruce M Maggs, Alan Mislove, and Christo Wilson. CRLite: A scalable system for pushing all TLS revocations to all browsers. In IEEE S&P, 2017. https://doi.org/10.1109/SP.2017.17.
[24] Yehuda Lindell. How to simulate it – a tutorial on the simulation proof technique, 2017. https://doi.org/10.1007/978-3-319-57048-8_6.
[25] Linsheng Liu, Daniel S. Roche, Austin Theriault, and Arkady Yerukhimovich. Fighting fake news in encrypted messaging with the fuzzy anonymous complaint tally system (FACTS). In Network and Distributed Systems Security Symposium, 2022. https://doi.org/10.14722/ndss.2022.23109.
[26] Lailong Luo, Deke Guo, Richard T. B. Ma, Ori Rottenstreich, and Xueshan Luo. Optimizing Bloom filter: Challenges, solutions, and comparisons. IEEE Communications Surveys & Tutorials, 21(2):1912–1949, 2019. https://doi.org/10.1109/COMST.2018.2889329.
[27] Sam A. Markelon, Mia Filić, and Thomas Shrimpton. Compact frequency estimators in adversarial environments. In ACM SIGSAC CCS, 2023. https://doi.org/10.1145/3576915.3623216.
[28] Luca Melis, George Danezis, and Emiliano De Cristofaro. Efficient private statistics with succinct sketches. In Network and Distributed Systems Security Symposium, 2016. https://doi.org/10.14722/ndss.2016.23175.
[29] Páll Melsted and Jonathan K Pritchard. Efficient counting of k-mers in DNA sequences using a Bloom filter. BMC Bioinformatics, 12, 2011. https://doi.org/10.1186/1471-2105-12-333.
[30] Moni Naor and Noa Oved. Bet-or-pass: Adversarially robust Bloom filters. In TCC, 2022. https://doi.org/10.1007/978-3-031-22365-5_27.
[31] Moni Naor and Eylon Yogev. Bloom filters in adversarial environments. In CRYPTO, 2015. https://doi.org/10.1007/978-3-662-48000-7_28.
[32] Moni Naor and Eylon Yogev. Bloom filters in adversarial environments. ACM Transactions on Algorithms, 15(3):35:1–35:30, 2019. https://doi.org/10.1145/3306193.
[33] Kenneth G. Paterson and Mathilde Raynal. HyperLogLog: Exponentially bad in adversarial settings. In EuroS&P, 2022. https://doi.org/10.1109/EuroSP53844.2022.00018.
[34] Henning Perl, Yassene Mohammed, Michael Brenner, and Matthew Smith. Fast confidential search for bio-medical data using Bloom filters and homomorphic cryptography. IEEE Conference on E-Science, pages 1–8, 2012. https://doi.org/10.1109/eScience.2012.6404484.
[35] Xiaofeng Shi, Shouqian Shi, Minmei Wang, Jonne Kaunisto, and Chen Qian. On-device IoT certificate revocation checking with small memory and low latency. In ACM SIGSAC CCS, 2021. https://doi.org/10.1145/3460120.3484580.
[36] Dimitrios Sikeridis, Sean Huntley, David Ott, and Michael Devetsikiotis. Intermediate certificate suppression in post-quantum TLS: An approximate membership querying approach. In CoNEXT, 2022. https://doi.org/10.1145/3555050.3569127.
[37] Henrik Stranneheim, Max Käller, Tobias Allander, Björn Andersson, Lars Arvestad, and Joakim Lundeberg. Classification of DNA sequences using Bloom filters. Bioinformatics, 26(13):1595–1600, 2010. https://doi.org/10.1093/bioinformatics/btq230.
[38] Jeff Yan and Pook Leong Cho. Enhancing collaborative spam detection with Bloom filters. In Annual Computer Security Applications Conference, 2006. https://doi.org/10.1109/ACSAC.2006.26.
[39] Kevin Yeo. Cuckoo hashing in cryptography: Optimal parameters, robustness and applications. In CRYPTO, 2023. https://doi.org/10.1007/978-3-031-38551-3_7.
A Cuckoo Filters
In Fig. 7, we give the AMQ-PDS syntax instantiation for PRF-wrapped Cuckoo filters. Following the reference implementation [11] by the authors of [10], after we delete an element, we try to empty the stash by re-inserting the stashed element. We write the procedure evict separately for ease of understanding.
Copyright information
© 2025 International Association for Cryptologic Research
About this paper
Cite this paper
Filić, M., Kocher, K., Kummer, E., Unnikrishnan, A. (2025). Deletions and Dishonesty: Probabilistic Data Structures in Adversarial Settings. In: Chung, KM., Sasaki, Y. (eds) Advances in Cryptology – ASIACRYPT 2024. ASIACRYPT 2024. Lecture Notes in Computer Science, vol 15487. Springer, Singapore. https://doi.org/10.1007/978-981-96-0894-2_5
DOI: https://doi.org/10.1007/978-981-96-0894-2_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0893-5
Online ISBN: 978-981-96-0894-2
eBook Packages: Computer Science, Computer Science (R0)