1 Introduction

Searchable symmetric encryption (SSE) is a cryptographic primitive that encrypts data to protect its confidentiality while preserving its searchability. Dynamic SSE (DSSE) additionally provides data dynamics, allowing the client to update data over time without losing confidentiality or searchability. Because of this property, DSSE is in high demand in the encrypted cloud setting. However, many existing DSSE schemes [8, 14] suffer from file-injection attacks [7, 22], where the adversary can compromise the privacy of a client query by injecting a small number of new documents into the encrypted database. To resist this attack, Zhang et al. [22] highlighted the need for forward security, which was informally introduced by Stefanov et al. [19]. The formal definition of forward security for DSSE was given by Bost [5], who also proposed a concrete forward-secure DSSE scheme. Furthermore, Bost et al. [6] demonstrated the damage caused by leaking the content of deleted documents and proposed the corresponding security notion, backward security. Several backward-secure DSSE schemes were also presented in [6].

Nevertheless, the existing forward/backward-secure DSSE schemes only support single keyword queries, which are not expressive enough for data search services [12, 13]. To solve this problem, we aim to design forward/backward-secure DSSE schemes that support range queries. Our design starts from the regular binary tree in [13], which supports range queries. However, that binary tree cannot be applied directly to the dynamic setting, mainly because the keywords in [13] are labelled according to the corresponding tree levels, which change significantly in the dynamic setting. A naïve solution is to replace all old keywords with the associated new keywords, but this is not efficient. To address this problem, we explore new approaches.

Our Contributions. To achieve the above goal, we propose two new DSSE constructions supporting range queries. The first one is forward-secure but has a larger client overhead than [13]. The second one is a more efficient DSSE that achieves both forward and backward security at the same time. In more detail, our main contributions are as follows:

  • To make the binary tree suitable for range queries in the dynamic setting, we introduce a new binary tree data structure, and then present the first forward-secure DSSE supporting range queries by applying it to Bost’s scheme [5]. However, the forward security comes at the expense of a large storage overhead on the client side.

  • To reduce the large storage overhead, we further propose another DSSE scheme supporting range queries by leveraging the Paillier cryptosystem [17]. With its homomorphic property, this construction achieves not only forward security but also backward security. Notably, due to the limitations of the Paillier cryptosystem, it cannot support a large-scale database consisting of a large number of documents. Nevertheless, it is well suited to scenarios where the number of documents is moderate. The new approach may shed new light on designing more efficient and secure DSSE schemes.

  • We also provide a comparison with related works in Table 1 and detailed security analyses, which demonstrate that our constructions are not only forward (and backward)-secure but also comparably efficient.

Table 1. Comparison with existing DSSE schemes

1.1 Related Works

Song et al. [18] were the first to use symmetric encryption to facilitate keyword search over encrypted data. Later, Curtmola et al. [11] gave a formal definition of SSE and the corresponding security model in the static setting. To make SSE more scalable and expressive, Cash et al. [9] proposed a new scalable SSE supporting Boolean queries. Following this construction, many extensions have been proposed. Faber et al. [13] extended it to process a much richer collection of queries. For instance, they used a binary tree with keywords labelled according to the tree levels to support range queries. Zuo et al. [23] made another extension to support general Boolean queries. Cash et al.’s construction has also been extended to the multi-user setting [15, 20, 21]. However, the above schemes cannot support data updates. To solve this problem, some DSSE schemes have been proposed [8, 14].

However, designing a secure DSSE scheme is not easy. Cash et al. [7] pointed out that even a small leakage exploited by the adversary is enough to compromise the privacy of clients’ queries. A concrete attack, named the file-injection attack, was proposed by Zhang et al. [22]. In this attack, the adversary can infer the content of a client’s queries by injecting a small number of new documents into the encrypted database. This attack also highlights the need for forward security, which protects the security of newly added parts. Correspondingly, backward security protects the security of parts that were newly added and later deleted. These two security notions were first introduced by Stefanov et al. [19]. The formal definitions of forward/backward security for DSSE were given by Bost [5] and Bost et al. [6], respectively. In [5], Bost also proposed a concrete forward-secure DSSE scheme, but it does not support physical deletion. Later on, Kim et al. [16] proposed a forward-secure DSSE scheme supporting physical deletion. Meanwhile, Bost et al. [6] proposed a forward/backward-secure DSSE to reduce leakage during deletion. Unfortunately, all the existing forward/backward-secure DSSE schemes only support single keyword queries. Hence, forward/backward-secure DSSE schemes supporting more expressive queries, such as range queries, are highly desirable.

Apart from the binary tree technique, order preserving encryption (OPE) can also be used to support range queries. The concept of OPE was proposed by Agrawal et al. [1], and it allows the order of the plaintexts to be preserved in the ciphertexts. It is easy to see that this kind of encryption leads to the leakage described in [2, 3]. To reduce this leakage, Boneh et al. [4] proposed another concept named order revealing encryption (ORE), where the order of the ciphertexts is revealed by an algorithm rather than by comparing the ciphertexts directly (as in OPE). More efficient ORE schemes were proposed later [10]. However, ORE-based SSE still leaks much information about the underlying plaintexts. To avoid this, in this paper, we focus on how to use the binary tree structure to achieve range queries.

1.2 Organization

The remaining sections of this paper are organized as follows. In Sect. 2, we give the background information and building blocks that are used in this paper. In Sect. 3, we give the definition of DSSE and its security definition. After that in Sect. 4, we present a new binary tree and our DSSE schemes. Their security analyses are given in Sect. 5. Finally, Sect. 6 concludes this work.

2 Preliminaries

In this section, we describe cryptographic primitives (building blocks) that are used in this work.

2.1 Trapdoor Permutations

A trapdoor permutation (TDP) \(\varPi \) is a one-way permutation over a domain D such that (1) it is “easy” to compute \(\varPi \) for any value of the domain with the public key, and (2) it is “easy” to calculate the inverse \(\varPi ^{-1}\) for any value of a co-domain \(\mathcal {M}\) only if a matching secret key is known. More formally, \(\varPi \) consists of the following algorithms:

  • \(\texttt {TKeyGen}(1^{\lambda })\rightarrow (\texttt {TPK},\texttt {TSK})\): For a security parameter \(1^{\lambda }\), the algorithm returns a pair of cryptographic keys: a public key \(\texttt {TPK}\) and a secret key \(\texttt {TSK}\).

  • \(\varPi (\texttt {TPK},x)\rightarrow y\): For a pair: public key \(\texttt {TPK}\) and \(x\in D\), the algorithm outputs \(y\in \mathcal {M}\).

  • \(\varPi ^{-1}(\texttt {TSK},y)\rightarrow x\): For a pair: a secret key \(\texttt {TSK}\) and \(y\in \mathcal {M}\), the algorithm returns \(x\in D\).

One-wayness. We say \(\varPi \) is one-way if for any probabilistic polynomial time (PPT) adversary \(\mathcal {A}\), the advantage

$$\begin{aligned} \texttt {Adv}_{\varPi ,\mathcal {A}}^{\texttt {OW}}(1^{\lambda })=\Pr [ x\leftarrow \mathcal {A}(\texttt {TPK}, y)] \end{aligned}$$

is negligible, where \((\texttt {TPK},\texttt {TSK})\leftarrow \texttt {TKeyGen}(1^{\lambda })\), \(x\in D\) is chosen at random, and \(y\leftarrow \varPi (\texttt {TPK}, x)\).
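As a concrete illustration of this interface, the following minimal sketch instantiates the three algorithms with textbook RSA. The function names `tkeygen`, `perm` and `perm_inv`, the tiny primes and the exponent are assumptions chosen for readability; they are not parameters from this paper, and they are far too small to be one-way.

```python
import math


def tkeygen(p=61, q=53, e=17):
    """Toy TKeyGen: returns (TPK, TSK) for textbook RSA over Z_n (insecure sizes)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    assert math.gcd(e, phi) == 1      # e must be invertible modulo phi(n)
    d = pow(e, -1, phi)               # modular inverse, Python 3.8+
    return (n, e), (n, d)


def perm(tpk, x):
    """Pi(TPK, x): easy for anyone holding the public key."""
    n, e = tpk
    return pow(x, e, n)


def perm_inv(tsk, y):
    """Pi^{-1}(TSK, y): easy only with the trapdoor d."""
    n, d = tsk
    return pow(y, d, n)


tpk, tsk = tkeygen()
x = 1234
y = perm(tpk, x)
assert perm_inv(tsk, y) == x          # the inverse recovers x
```

With realistic key sizes, computing `perm_inv` without `tsk` is believed to be infeasible, which is the one-wayness property defined above.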

2.2 Paillier Cryptosystem

A Paillier cryptosystem \(\varSigma =(\texttt {KeyGen}, \texttt {Enc}, \texttt {Dec})\) is defined by following three algorithms:

  • \(\texttt {KeyGen}(1^{\lambda })\rightarrow (\texttt {PK,SK})\): It chooses at random two primes p and q of similar lengths and computes \(n=pq\) and \(\phi (n)=(p-1)(q-1)\). Next it sets \(g=n+1\), \(\beta =\phi (n)\) and \(\mu =\phi (n)^{-1}\) mod n. It returns \(\texttt {PK}=(n,g)\) and \(\texttt {SK}=(\beta , \mu )\).

  • \(\texttt {Enc}(\texttt {PK},m)\rightarrow c\): Let m be the message, where \(0\le m<n\). The algorithm selects an integer r at random from \(\mathbb {Z}_n^{*}\) and computes a ciphertext \(c=g^m\cdot r^n\) mod \(n^2\).

  • \(\texttt {Dec}(\texttt {SK},c)\rightarrow m\): The algorithm calculates \(m=L(c^{\beta }\) mod \(n^2)\cdot \mu \) mod n, where \(L(x)=\frac{x-1}{n}\).

Semantic Security. We say \(\varSigma \) is semantically secure if for any probabilistic polynomial time (PPT) adversary \(\mathcal {A}\), the advantage

$$\begin{aligned} \texttt {Adv}_{\varSigma ,\mathcal {A}}^{\texttt {IND-CPA}}(1^{\lambda })=|\Pr [\mathcal {A}(\texttt {Enc}(\texttt {PK}, m_0))=1]-\Pr [\mathcal {A}(\texttt {Enc}(\texttt {PK}, m_1))=1]| \end{aligned}$$

is negligible, where \((\texttt {PK}\), \(\texttt {SK})\leftarrow \texttt {KeyGen}(1^{\lambda })\), \(\mathcal {A}\) chooses \(m_0\), \(m_1\) and \(|m_0|=|m_1|\).

Homomorphic Addition. The Paillier cryptosystem is additively homomorphic, i.e.

$$\begin{aligned} \texttt {Dec}(\texttt {Enc}(m_1)\cdot \texttt {Enc}(m_2)\bmod {n^2})=m_1+m_2\bmod {n}. \end{aligned}$$

We need this property to achieve forward security of our DSSE.
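The definitions above can be exercised with a small sketch. It follows the KeyGen, Enc and Dec formulas of this section (with \(g=n+1\)); the concrete primes 1009 and 1013 and the function signatures are assumptions made for illustration and offer no security. Note that `dec` also takes the public key, because it needs the public modulus n.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy KeyGen: PK = (n, g) with g = n + 1, SK = (beta, mu)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    return (n, n + 1), (phi, pow(phi, -1, n))


def enc(pk, m):
    """c = g^m * r^n mod n^2 with r drawn from Z_n^*."""
    n, g = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n)
        if r > 0 and math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def dec(pk, sk, c):
    """m = L(c^beta mod n^2) * mu mod n, where L(x) = (x - 1) / n."""
    n, _ = pk
    beta, mu = sk
    l_value = (pow(c, beta, n * n) - 1) // n
    return (l_value * mu) % n


pk, sk = keygen()
n = pk[0]
c1, c2 = enc(pk, 123), enc(pk, 456)
# Multiplying ciphertexts adds the underlying plaintexts modulo n.
assert dec(pk, sk, (c1 * c2) % (n * n)) == (123 + 456) % n
```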

2.3 Notations

The list of notations used is given in Table 2.

Table 2. Notations (used in our constructions)

3 Dynamic Searchable Symmetric Encryption (DSSE)

We follow the database model given in [5]. A database is a collection of (index, keyword set) pairs denoted as DB \(=(ind_i,\mathbf W _i)_{i=1}^d\), where \(ind_i\in \{0,1\}^{\ell }\), \(\mathbf W _i\subseteq \{0,1\}^*\), and d is the number of documents in DB. The set of all keywords of the database DB is \(\mathbf W =\cup _{i=1}^d\mathbf W _i\). We denote by \(W=|\mathbf W |\) the total number of keywords and by \(N=\sum _{i=1}^d|\mathbf W _i|\) the number of document/keyword pairs. We denote by DB(w) the set of documents that contain a keyword w. To achieve a sublinear search time, we encrypt the file indices of DB(w) corresponding to the same keyword w (a.k.a. an inverted index; see Footnote 1).
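To make the notation concrete, the short sketch below builds DB(w) for a made-up three-document database and computes W and N. The document identifiers and keywords are hypothetical sample data, and the index is shown unencrypted.

```python
from collections import defaultdict

# Hypothetical sample database: (ind_i, W_i) pairs.
db = [
    ("ind1", {"w1", "w2"}),
    ("ind2", {"w2"}),
    ("ind3", {"w1", "w2", "w3"}),
]


def invert(database):
    """Build the inverted index: keyword w -> DB(w)."""
    index = defaultdict(set)
    for ind, keywords in database:
        for w in keywords:
            index[w].add(ind)
    return dict(index)


index = invert(db)
W = len(index)                              # number of distinct keywords
N = sum(len(keywords) for _, keywords in db)  # number of document/keyword pairs
print(W, N, sorted(index["w1"]))
```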

A DSSE scheme \(\varGamma \) consists of an algorithm Setup and two protocols Search and Update as described below.

  • (EDB, \(\sigma \)) \(\leftarrow \) Setup(DB, \(1^{\lambda }\)): For a security parameter \(1^{\lambda }\) and a database DB, the algorithm outputs an encrypted database EDB for the server and a secret state \(\sigma \) for the client.

  • (\(\mathcal {I}\), \(\perp \)) \(\leftarrow \) Search(q, \(\sigma \), EDB): The protocol is executed between a client (with her query q and state \(\sigma \)) and a server (with its EDB). At the end of the protocol, the client outputs a set of file indices \(\mathcal {I}\) and the server outputs nothing.

  • (\(\sigma '\), EDB\('\)) \(\leftarrow \) Update(\(\sigma \), op, in, EDB): The protocol runs between a client and a server. The client input is a state \(\sigma \), an operation \(op\in \{add,del\}\) she wants to perform and a collection of \(in=(ind, \mathbf w )\) pairs that are going to be modified, where add and del mean the addition and deletion of a document/keyword pair, respectively, ind is the file index and w is a set of keywords. The server input is EDB. Update returns an updated state \(\sigma '\) to the client and an updated encrypted database EDB\('\) to the server.

3.1 Security Definition

The security definition of DSSE is formulated using the following two games: \({\texttt {DSSEREAL}}_{\mathcal {A}}^{\varGamma }(1^{\lambda })\) and \({\texttt {DSSEIDEAL}}_{\mathcal {A},\mathcal {S}}^{\varGamma }(1^{\lambda })\). The \({\texttt {DSSEREAL}}_{\mathcal {A}}^{\varGamma }(1^{\lambda })\) is executed using DSSE. The \({\texttt {DSSEIDEAL}}_{\mathcal {A},\mathcal {S}}^{\varGamma }(1^{\lambda })\) is simulated using the leakage of DSSE. The leakage is parameterized by a function \(\mathcal {L}=(\mathcal {L}^{Stp}, \mathcal {L}^{Srch}, \mathcal {L}^{Updt})\), which describes what information is leaked to the adversary \(\mathcal {A}\). If the adversary \(\mathcal {A}\) cannot distinguish these two games, then we can say there is no other information leaked except the information that can be inferred from the leakage function \(\mathcal {L}\). More formally,

  • \({\texttt {DSSEREAL}}_{\mathcal {A}}^{\varGamma }(1^{\lambda })\): On input a database DB, which is chosen by the adversary \(\mathcal {A}\), it outputs EDB to the adversary \(\mathcal {A}\) by running Setup(DB, \(1^{\lambda }\)). \(\mathcal {A}\) can repeatedly perform a search query q (or an update query (op, in)). The game outputs the results generated by running Search(q) (or Update(op, in)) to the adversary \(\mathcal {A}\). Eventually, \(\mathcal {A}\) outputs a bit.

  • \({\texttt {DSSEIDEAL}}_{\mathcal {A},\mathcal {S}}^{\varGamma }(1^{\lambda })\): On input a database DB, which is chosen by the adversary \(\mathcal {A}\), it outputs EDB to the adversary \(\mathcal {A}\) by using a simulator \(\mathcal {S}(\mathcal {L}^{Stp}(1^{\lambda },\texttt {DB}))\). Then, it simulates the results for the search query q by using \(\mathcal {S}(\mathcal {L}^{Srch}(q))\) and uses \(\mathcal {S}(\mathcal {L}^{Updt}(op,in))\) to simulate the results for the update query (op, in). Eventually, \(\mathcal {A}\) outputs a bit.

Definition 1

A DSSE scheme \(\varGamma \) is \(\mathcal {L}\)-adaptively-secure if for every PPT adversary \(\mathcal {A}\), there exists an efficient simulator \(\mathcal {S}\) such that

$$\begin{aligned} |\Pr [\texttt {DSSEREAL}_{\mathcal {A}}^{\varGamma }(1^{\lambda })=1]-\Pr [\texttt {DSSEIDEAL}_{\mathcal {A},\mathcal {S}}^{\varGamma }(1^{\lambda })=1]|\le negl(1^{\lambda }). \end{aligned}$$

4 Constructions

In this section, we give two DSSE constructions. To process range queries, we deploy a new binary tree, which is modified from the binary tree in [13]. We first describe the binary tree used in our constructions.

4.1 Binary Tree for Range Queries

In a binary tree BT, every node has at most two children, named left and right. If a node has a child, then there is an edge that connects these two nodes. The node is the parent of its child. The root of a binary tree does not have a parent, and a leaf of a binary tree does not have any child. In this paper, the binary tree is stored in the form of linked structures. The first node of BT is the root of the binary tree. For example, the root node of the binary tree BT is BT, the left child of BT is BT.left, and the parent of BT’s left child is BT.left.parent, where BT = BT.left.parent.

In a complete binary tree CBT, every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible (the leaf level may not be full). A perfect binary tree PBT is a binary tree in which all internal nodes (not the leaves) have two children and all leaves have the same depth, i.e., the same level. Note that a PBT is a special CBT.

4.2 Binary Database

In this paper, we use a binary database BDB, which is generated from DB. In DB, keywords (the first row in Fig. 1(c)) are used to retrieve the file indices (every column in Fig. 1(c)). For simplicity, we map keywords in DB to the values in the range [0, \(m-1\)] for range queries (see Footnote 2), where m is the maximum number of values. If we want to search the range [0, 3], a naïve solution is to send every value in the range (0, 1, 2 and 3) to the server, which is not efficient. To reduce the number of keywords sent to the server, we use the binary tree shown in Fig. 1(a). For the range query [0, 3], we simply send the keyword \(n_3\) (the minimum set of nodes covering the values 0, 1, 2 and 3) to the server. In BDB, every node in the binary tree is a keyword of the binary database, and every node holds all the file indices of its descendants, as illustrated in Fig. 1(d).

As shown in Fig. 1(a), the keyword in BDB corresponding to node i (the black integer) is \(n_i\) (e.g. the keyword for node 0 is \(n_0\)). The blue integers are the keywords in DB and are mapped to the values in the range [0, 3]. These values are associated with the leaves of our binary tree. The words in red are the file indices in DB. Every node (keyword) contains all the file indices in its descendant leaves. For example, node \(n_1\) contains \(f_0,f_1,f_2,f_3\) and there is no file in node \(n_4\) (see Fig. 1(d)). For a range query [0, 2], we need to send the keywords \(n_1\) and \(n_4\) (the minimum set of keywords covering the range [0, 2]) to the server, and the resulting file indices are \(f_0, f_1, f_2\) and \(f_3\).
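The cover computation for a range can be sketched as follows. The sketch assumes that nodes are numbered in in-order fashion, which is consistent with the labels quoted above (root \(n_3\), leaves \(n_0, n_2, n_4, n_6\) for the values 0 to 3); under that assumption the node spanning leaf values lo to hi carries the label lo + hi. The function name `range_cover` and the restriction to a power-of-two number of values are simplifications of this sketch, not part of the paper's algorithms.

```python
def range_cover(a, b, m):
    """Labels of a minimal set of nodes covering values [a, b] in a tree with m leaves.

    Assumes in-order node labels on a perfect tree, so the node spanning
    leaf values lo..hi is labelled lo + hi.
    """
    assert m > 0 and (m & (m - 1)) == 0, "sketch assumes m is a power of two"
    assert 0 <= a <= b < m

    def cover(lo, hi):
        if a <= lo and hi <= b:          # subtree lies fully inside the query
            return [lo + hi]
        mid = (lo + hi) // 2
        labels = []
        if a <= mid:                     # query overlaps the left half
            labels += cover(lo, mid)
        if b > mid:                      # query overlaps the right half
            labels += cover(mid + 1, hi)
        return labels

    return cover(0, m - 1)


print(range_cover(0, 3, 4))   # [3]     -> keyword n_3
print(range_cover(0, 2, 4))   # [1, 4]  -> keywords n_1 and n_4
```

Both outputs agree with the two range queries discussed for Fig. 1(a).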

Bit String Representation.

We parse the file indices for every keyword in BDB (e.g. every column in Fig. 1(d)) into a bit string, which we will use later. Suppose there are \(y-1\) documents in our BDB; then we need y bits to represent the existence of these documents. The highest bit is the sign bit (0 means positive and 1 means negative). If \(f_i\) contains keyword \(n_j\), then the i-th bit of the bit string for \(n_j\) (every keyword has a bit string) is set to 1. Otherwise, it is set to 0. For an update, if we want to add a new file index \(f_i\) (which also contains keyword \(n_j\)) to keyword \(n_j\), we need a positive bit string, where the i-th bit is set to 1 and all other bits are set to 0. Next, we add this bit string to the existing bit string associated with \(n_j\) (see Footnote 3). Then, \(f_i\) is added to the bit string for \(n_j\). If we want to delete file index \(f_i\) from the bit string for \(n_j\), we need a negative bit string, where the most significant bit and the i-th bit are set to 1 and the remaining bits are set to 0. Then, we take the complement of this bit string (see Footnote 4). Next, we add the complement bit string as in the add operation. Finally, \(f_i\) is deleted from the bit string for \(n_j\).

For example, in Fig. 1(b), the bit string for \(n_0\) is 000001, and the bit string for \(n_4\) is 000000. Assume that we want to delete file index \(f_0\) from \(n_0\) and add it to \(n_4\). First we need to generate bit string 000001 and add it to the bit string (000000) for \(n_4\). Next we generate the complement bit string 111111 (the complement of 100001) and add it to 000001 for \(n_0\). Then, the result bit strings for \(n_0\) and \(n_4\) are 000000 and 000001, respectively. As a result, the file index \(f_0\) has been moved from \(n_0\) to \(n_4\).
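The arithmetic of this example can be replayed in a few lines. The sketch assumes that the complement is the two's complement over y bits and that additions wrap modulo \(2^y\); these are assumptions of the sketch, chosen because they reproduce the bit strings of the example. The helper names are hypothetical.

```python
Y = 6                      # y bits: 5 document bits plus the sign bit
MASK = (1 << Y) - 1        # keeps results within y bits


def add_string(i):
    """Positive bit string with only the i-th bit set."""
    return 1 << i


def del_string(i):
    """Two's complement of the magnitude 2^i, i.e. the 'complement bit string'."""
    return (-(1 << i)) & MASK


def apply(bit_string, delta):
    """Add an update string to a stored bit string, wrapping at y bits."""
    return (bit_string + delta) & MASK


n0, n4 = 0b000001, 0b000000
n4 = apply(n4, add_string(0))      # add f_0 to n_4
n0 = apply(n0, del_string(0))      # delete f_0 from n_0
print(format(del_string(0), "06b"), format(n0, "06b"), format(n4, "06b"))
# 111111 000000 000001
```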

Fig. 1. Architecture of our binary tree for range query (Color figure online)

Binary Tree Assignment and Update. Since we use the binary tree to support the data structure needed in our DSSE, we define the following operations that are necessary to manipulate it.

TCon(m): For an integer m, the operation builds a complete binary tree CBT. CBT has \(\lceil \log _2(m)\rceil + 1\) levels, where the root is on level 0 and the leaves are on level \(\lceil \log _2(m)\rceil \). All leaves are associated with the m consecutive integers from left to right.

TAssign(CBT): The operation takes a CBT as an input and outputs an assigned binary tree ABT, where nodes are labelled by appropriate integers. The operation applies TAssignSub recursively. Keywords then are assigned to the node integers.

TAssignSub(c, CBT): For an input pair: a counter c and CBT, the operation outputs an assigned binary tree. It is implemented as a recursive function. It starts from 0 and assigns to nodes incrementally. See Fig. 2 for an example.
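One possible reading of TAssign and TAssignSub is an in-order traversal that hands out consecutive integers starting from 0. This reading is an assumption of the sketch below; it is chosen because it reproduces the labels quoted for Fig. 1(a), where the root is \(n_3\) and the leaves are \(n_0, n_2, n_4, n_6\). The `Node` class and `build_perfect` helper are hypothetical stand-ins for TCon on a perfect tree.

```python
class Node:
    """Linked tree node with left, right and parent pointers."""

    def __init__(self, left=None, right=None):
        self.left, self.right, self.parent = left, right, None
        self.label = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def build_perfect(height):
    """Hypothetical helper: perfect binary tree with 2**height leaves."""
    if height == 0:
        return Node()
    return Node(build_perfect(height - 1), build_perfect(height - 1))


def t_assign_sub(c, node):
    """Label the subtree at `node` in in-order, starting from counter c.

    Returns the next unused counter value.
    """
    if node is None:
        return c
    c = t_assign_sub(c, node.left)
    node.label = c
    return t_assign_sub(c + 1, node.right)


def t_assign(root):
    t_assign_sub(0, root)
    return root


abt = t_assign(build_perfect(2))
print(abt.label, abt.left.label, abt.right.label)   # 3 1 5
```

A useful consequence of in-order numbering is that the labels in a left subtree do not depend on what is later attached to its right, so existing keywords keep their labels when the tree grows.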

TGetNodes(n, ABT): For an input pair: a node n and a tree ABT, the operation generates a collection of nodes in a path from the node n to the root node. This operation is needed for our update algorithm if a client wants to add a file to a leaf (a value in the range). The file is added to the leaf and its parent nodes.
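With the linked representation described in Sect. 4.1, TGetNodes amounts to following parent pointers until the root is reached. The sketch below rebuilds the seven-node tree with the labels quoted for Fig. 1(a); the `Node` class and the function name `t_get_nodes` are assumptions of this sketch.

```python
class Node:
    """Linked tree node carrying an integer label."""

    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right, self.parent = label, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def t_get_nodes(n):
    """Return the nodes on the path from node n up to the root (inclusive)."""
    path = []
    while n is not None:
        path.append(n)
        n = n.parent
    return path


# Tree with the labels quoted for Fig. 1(a): leaves 0, 2, 4, 6 and root 3.
n0, n2, n4, n6 = Node(0), Node(2), Node(4), Node(6)
n1, n5 = Node(1, n0, n2), Node(5, n4, n6)
n3 = Node(3, n1, n5)

# Adding a file to the leaf for value 2 (node n_4) touches n_4, n_5 and n_3.
print([node.label for node in t_get_nodes(n4)])   # [4, 5, 3]
```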

TUpdate(add, v, CBT): The operation takes a value v and a complete binary tree CBT and updates CBT so that the tree contains the value v. For simplicity, we assume that the current complete binary tree contains the values in the range \([0,v-1]\) (see Footnote 5). Depending on the value of v, the operation is executed according to the following cases:

  • \(v=0\): The current complete binary tree is null, so we simply create a node and associate the value \(v=0\) with it. The operation returns the node as CBT.

  • \(v>0\): If the current complete binary tree is a perfect binary tree PBT or it consists of a single node only, we need to create a virtual binary tree VBT, which is a copy of the current binary tree. Next, we merge the virtual perfect binary tree with the original one to obtain a larger perfect binary tree. Finally, we associate the value v with the least virtual leaf (the leftmost virtual leaf without a value) of the virtual binary tree and set this leaf and its parents as real. For example, in Fig. 2(a), \(v=4\); the nodes drawn with solid lines are real and the nodes drawn with dotted lines are virtual and can be added later. Otherwise, we directly associate the value v with the least virtual leaf and set this leaf and its parents as real (see Footnote 6). In Fig. 2(b), \(v=5\).

Fig. 2. Example of update operation
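The growth behaviour of TUpdate can be simulated on node labels alone. The sketch below is an interpretation, not the paper's algorithm: it assumes in-order labels (leaf for value v is labelled 2v, and a node's height equals the number of trailing 1 bits of its label), treats the tree as a perfect shell whose nodes are either real or virtual, and tracks only the set of real labels. The names `height`, `parent` and `t_update_add` are hypothetical.

```python
def height(x):
    """Height of the node labelled x: its number of trailing 1 bits."""
    h = 0
    while x & 1:
        x >>= 1
        h += 1
    return h


def parent(x):
    """Label of the parent of node x under in-order labelling."""
    h = height(x)
    return (x | (1 << h)) & ~(1 << (h + 1))


def t_update_add(v, real):
    """Make the leaf for value v and all its ancestors real; return the root label.

    `real` is the set of labels of real nodes; v must be the next unused value.
    """
    capacity = 1
    while capacity < v + 1:
        capacity *= 2          # doubling = merging with a virtual copy under a new root
    root = capacity - 1
    x = 2 * v                  # leaf label of value v
    real.add(x)
    while x != root:
        x = parent(x)
        real.add(x)
    return root


real = set()
for v in range(4):
    t_update_add(v, real)
print(sorted(real))                  # [0, 1, 2, 3, 4, 5, 6]
print(t_update_add(4, real))         # 7: a new root is created
print(sorted(real))                  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
```

In this simulation, adding v = 4 leaves the labels 0 to 6 untouched and only makes the new root and the path to the new leaf real, which is the property that avoids relabelling old keywords.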

Note that, for our range queries, we need to parse a normal database DB into its binary form BDB. First, we map the keywords of DB to integers in the range \([0, |W|-1]\), where |W| is the total number of keywords in DB. Next, we construct a binary tree as described above. The keywords are assigned to the nodes of the binary tree and are associated with the documents of their descendants. For example, in Fig. 1(a), the keywords are \(\{n_0, n_1, \cdots , n_6\}\) and \(\texttt {BDB}(n_0)=\{f_0\}\), \(\texttt {BDB}(n_1)=\{f_0,f_1,f_2,f_3\}\).

4.3 DSSE Range Queries - Construction A

In this section, we apply our new binary tree to Bost’s scheme [5] to support range queries. To perform a range query, the client in our scheme first determines a collection of keywords that covers the requested range. Then, she generates the search token corresponding to each node (in the cover) and sends them to the server, which can be done in a similar way as in [5]. Now we are ready to present the first DSSE scheme that supports range queries and is forward-secure. The scheme is described in Algorithm 2, where F is a cryptographically strong pseudorandom function (PRF), \(H_1\) and \(H_2\) are keyed hash functions and \(\varPi \) is a trapdoor permutation.

Setup(\(1^\lambda \)): For a security parameter \(1^\lambda \), the algorithm outputs (\(\texttt {TPK}, \texttt {TSK},\) \(K, \mathbf T , \mathbf N , m\)), where \(\texttt {TPK}\) and \(\texttt {TSK}\) are the public and secret keys of the trapdoor permutation, respectively, K is the secret key of the function F, T and N are maps and m is the maximum number of values in our range queries. The map N stores the keyword/(\(ST_c\), c) pairs (the current search token and the counter c; see Algorithm 2 for more details) and is kept by the client. The map T is the encrypted database EDB, which stores the encrypted indices and is kept by the server.

Search(\([a,b],\sigma ,\) m, EDB): The protocol is executed between a client and a server. The client asks for documents whose keywords are in the range [a, b], where \(0\le a \le b<m\). The current state of EDB is \(\sigma \) and the integer m describes the maximum number of values. Note that, knowing m, the client can easily construct the complete binary tree. The server returns a collection of file indices of the requested documents.

Update(\(add, v, ind, \sigma ,\) m, EDB): The protocol is performed jointly by a client and a server. The client wishes to add an integer v together with a file index ind to EDB. The state of EDB is \(\sigma \) and the number of values is m. There are the following three cases:

  • \(v<m\): The client simply adds ind to the leaf that contains the value v and to its parents (see lines 9–24 in Algorithm 2). This is a basic update, which is similar to the one in [5].

  • \(v=m\): The client first updates the complete binary tree to which she adds the value v. If a new root is added to the new complete binary tree, then the server needs to add all file indices of the old complete binary tree to the new one. Finally, the server needs to add ind to the leaf, which contains value v and its parents.

  • \(v>m\): The client uses Update as many times as needed. For simplicity, we only present the simple case \(v=m\), i.e., the newly added value v equals the maximum number of values of the current range \([0, m-1]\), in the description of Algorithm 2.
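For a single node keyword, the basic update and search can be pictured with the token-chain idea of [5]. The sketch below is written in the style of that scheme and is not a transcription of Algorithm 2: the class and function names, the use of HMAC-SHA256 for F, \(H_1\) and \(H_2\), the 4-byte integer file indices and the toy RSA parameters are all assumptions of this sketch. Each new search token is obtained by applying \(\varPi ^{-1}\) with the secret key, so the server, which only holds the public key, can walk from the current token back to older ones but cannot predict future ones.

```python
import hashlib
import hmac
import secrets

# Toy RSA trapdoor permutation from two Mersenne primes (illustration only).
P, Q, E = 2**31 - 1, 2**61 - 1, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))     # trapdoor TSK


def pi(x):                            # public direction, TPK = (N, E)
    return pow(x, E, N)


def pi_inv(y):                        # inverse direction, needs TSK
    return pow(y, D, N)


def keyed_hash(key, st, tag, size):   # stands in for H_1 (tag b"1") and H_2 (tag b"2")
    return hmac.new(key, tag + st.to_bytes(16, "big"), hashlib.sha256).digest()[:size]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


class Server:
    def __init__(self):
        self.T = {}                   # map T: update token -> masked index

    def search(self, token):
        k_w, st, c = token
        result = []
        for _ in range(c + 1):        # walk from ST_c back to ST_0 with the public key
            e = self.T[keyed_hash(k_w, st, b"1", 32)]
            result.append(int.from_bytes(xor(e, keyed_hash(k_w, st, b"2", 4)), "big"))
            st = pi(st)
        return result


class Client:
    def __init__(self):
        self.K = secrets.token_bytes(16)
        self.N = {}                   # map N: keyword -> (ST_c, c)

    def _k_w(self, w):                # F(K, w)
        return hmac.new(self.K, w.encode(), hashlib.sha256).digest()

    def update_add(self, w, ind, server):
        if w in self.N:
            st, c = self.N[w]
            st, c = pi_inv(st), c + 1 # next token requires the trapdoor
        else:
            st, c = 2 + secrets.randbelow(N - 3), 0
        self.N[w] = (st, c)
        k_w = self._k_w(w)
        ut = keyed_hash(k_w, st, b"1", 32)
        server.T[ut] = xor(ind.to_bytes(4, "big"), keyed_hash(k_w, st, b"2", 4))

    def search_token(self, w):
        st, c = self.N[w]
        return self._k_w(w), st, c


client, server = Client(), Server()
client.update_add("n1", 7, server)
client.update_add("n1", 9, server)
print(server.search(client.search_token("n1")))   # [9, 7]
```

In the range setting described above, the client would run such an update once for every node returned by TGetNodes and issue one search token per node in the cover.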

The DSSE supports range queries at the cost of large client storage, since the number of search tokens is linear in the number of all nodes of the current tree instead of only the leaves. In [5], the number of entries at the client is |W|, while it is roughly 2|W| in this construction. Moreover, the communication cost is heavy, since the server needs to return all file indices to the client when the binary tree is updated with a new root. To overcome these weaknesses, we give a new construction with lower client storage and communication costs in the following section.

4.4 DSSE Range Queries - Construction B

In this section, we give the second construction by leveraging the Paillier cryptosystem [17], which significantly reduces the client storage and communication costs compared with the first one. With the homomorphic addition property of the Paillier cryptosystem, we can add and delete file indices by parsing them into bit strings, as illustrated in Sect. 4.2. Next, we briefly describe our second DSSE, which not only supports range queries but also achieves both forward and backward security. The scheme is described in Algorithm 3.

Setup(\(1^\lambda \)): For a security parameter \(1^\lambda \), the algorithm returns (\(\texttt {PK}, \texttt {SK}, K,\) \(\mathbf T , m\)), where \(\texttt {PK}\) and \(\texttt {SK}\) are the public and secret keys of the Paillier cryptosystem, respectively, K is the secret key of a PRF F, m is the maximum number of values which can be used to reconstruct the binary tree and the encrypted database EDB is stored in a map T which is kept by the server.

Search(\([a,b],\sigma ,\) m, EDB): The protocol is executed between a client and a server. The client queries for documents whose keywords are in the range [a, b], where \(0\le a \le b<m\). \(\sigma \) is the state of EDB, and the integer m specifies the maximum number of values for our range queries. The server returns the encrypted file indices e to the client, who can decrypt e by using the secret key \(\texttt {SK}\) of the Paillier cryptosystem and obtain the file indices of the requested documents.

Update(\(op, v, ind, \sigma ,\) m, EDB): The protocol runs between a client and a server. The requested update is named by the parameter op. The integer v and the file index ind specify the tree nodes that need to be updated. The client holds the current state \(\sigma \) and the integer m, and the server holds EDB. If \(op=add\), the client generates a bit string as prescribed in Sect. 4.2. If \(op=del\), the client creates the complement bit string as given in Sect. 4.2. The bit string bs is encrypted using the Paillier cryptosystem, and the encrypted string is denoted by e. There are the following three cases:

  • \(v<m\): The client sends the encrypted bit string e, together with the leaf \(n_v\) containing the value v and its parents, to the server. Next, the server adds e to the existing encrypted bit strings corresponding to the nodes specified by the client. See lines 11–23 in Algorithm 3, which are similar to the update in Algorithm 2.

  • \(v=m\): The client first updates the complete binary tree to which she adds the value v. If a new root is added to the new complete binary tree, then the client retrieves the encrypted bit string of the root (before update). Next the client adds it to the new root by sending it with the new root to the server. Finally, the client adds e to the leaf that contains value v and its parents as in \(v<m\) case.

  • \(v>m\): The client uses Update as many times as needed. For simplicity, we only consider \(v=m\), where m is the number of values in the maximum range.

This construction achieves both forward and backward security. Moreover, the communication overhead between the client and the server is significantly reduced, because for each query the server returns a single ciphertext to the client. This comes at the cost of supporting only a small number of documents, since in the Paillier cryptosystem the length of the message is usually small and fixed (e.g. 1024 bits).

This construction can be applied to applications, where the number of documents is small and simultaneously the number of keywords can be large. The reason for this is the fact that for a given keyword, the number of documents which contain it is small. Consider a temperature forecast system that uses a database, which stores records from different sensors (IoT) located in different cities across Australia. In the application, the cities (sensors) can be considered as documents and temperature measurements can be considered as the keywords. For example, Sydney and Melbourne have the temperature of 18\(^{\circ }\)C. Adelaide and Wollongong have got 17\(^{\circ }\)C and 15\(^{\circ }\)C, respectively. If we query for cities, whose temperature measurements are in the range from 17 to 18\(^{\circ }\)C, then the outcome includes Adelaide, Sydney and Melbourne. Here, the number of cities (documents) is not large. The number of different temperature measurements (keywords) can be large depending on requested precision.
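The temperature example can be replayed end to end with a toy sketch. Several simplifications are assumed and are not taken from Algorithm 3: the Paillier primes are tiny, node labels follow the in-order convention (the node spanning leaf values lo to hi is labelled lo + hi) and are sent in the clear rather than hidden behind the PRF F, a deletion is encoded as the additive inverse modulo n in place of the complement bit string, and the server multiplies the ciphertexts of the cover nodes so that a single ciphertext is returned. The temperatures 15 to 18 are mapped to the values 0 to 3.

```python
import math
import secrets

# Toy Paillier parameters (illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)


def enc(m):
    while True:
        r = secrets.randbelow(N)
        if r > 0 and math.gcd(r, N) == 1:
            break
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def dec(c):
    return (pow(c, PHI, N2) - 1) // N * MU % N


def path_labels(v, m):
    """Labels from the root down to the leaf of value v (in-order labels)."""
    lo, hi, labels = 0, m - 1, []
    while True:
        labels.append(lo + hi)
        if lo == hi:
            return labels
        mid = (lo + hi) // 2
        if v <= mid:
            hi = mid
        else:
            lo = mid + 1


def cover(a, b, lo, hi):
    """Minimal set of node labels covering values [a, b]."""
    if a <= lo and hi <= b:
        return [lo + hi]
    mid = (lo + hi) // 2
    left = cover(a, b, lo, mid) if a <= mid else []
    right = cover(a, b, mid + 1, hi) if b > mid else []
    return left + right


CITIES = ["Sydney", "Melbourne", "Adelaide", "Wollongong"]   # documents, one bit each
M = 4                                                        # values 0..3 = 15..18 degrees
T = {}                                                       # server map: label -> ciphertext


def update(op, temp, city):
    bit = 1 << CITIES.index(city)
    e = enc(bit if op == "add" else -bit)        # client encrypts the bit string
    for label in path_labels(temp - 15, M):      # server adds e along the path
        T[label] = T.get(label, 1) * e % N2


def search(t_lo, t_hi):
    c = 1
    for label in cover(t_lo - 15, t_hi - 15, 0, M - 1):
        c = c * T.get(label, 1) % N2             # server aggregates homomorphically
    bits = dec(c)                                # client decrypts one ciphertext
    return {name for i, name in enumerate(CITIES) if bits >> i & 1}


update("add", 18, "Sydney")
update("add", 18, "Melbourne")
update("add", 17, "Adelaide")
update("add", 15, "Wollongong")
print(sorted(search(17, 18)))   # ['Adelaide', 'Melbourne', 'Sydney']
```

Summing the bit strings of different cover nodes yields their union here because every city has exactly one temperature, so the aggregated bits never collide.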

5 Security Analysis

In our constructions, we parse a range query into several keywords. Then, following [5], the leakage to the server is summarized as follows:

  • search pattern \(\texttt {sp}(w)\), the repetition of the query w.

  • history \(\texttt {Hist}(w)\), the history of keyword w. It includes all the updates made to \(\texttt {DB}(w)\).

  • contain pattern \(\texttt {cp}(w)\), the inclusion relation between the keyword w with previous queried keywords.

  • time \(\texttt {Time}(w)\), the number of updates made to \(\texttt {DB}(w)\) and when the update happened.

Note that the contain pattern \(\texttt {cp}(w)\) is an inherent leakage for range queries when the file indices are revealed to the server: if a query \(w'\) is a subrange of a query w, then the file index set for \(w'\) is also a subset of the file index set for w.

5.1 Forward Security and Backward Security

Forward security means that an update does not leak any information about keywords of updated documents matching a query we previously issued. A formal definition is given below:

Definition 2

([5]) A \(\mathcal {L}\)-adaptively-secure DSSE scheme \(\varGamma \) is forward-secure if the update leakage function \(\mathcal {L}^{Updt}\) can be written as

$$\mathcal {L}^{Updt}(op,in)=\mathcal {L}'(op,\{(ind_i,\mu _i)\})$$

where \(\{(ind_i,\mu _i)\}\) is the set of modified documents paired with the number \(\mu _i\) of modified keywords for the updated document \(ind_i\).

Backward security means that a search query on w does not leak the file indices that were previously added and later deleted. More formally, we use the level I definition of [6] with a modification that leaks less information. This definition leaks the encrypted documents currently matching w, when they were updated, and the total number of updates on w.

Definition 3

An \(\mathcal {L}\)-adaptively-secure DSSE scheme \(\varGamma \) is insertion pattern revealing backward-secure if the search and update leakage functions \(\mathcal {L}^{Srch}\), \(\mathcal {L}^{Updt}\) can be written as \(\mathcal {L}^{Updt}(op,w,ind)=\mathcal {L}'(op)\), \(\mathcal {L}^{Srch}(w)=\mathcal {L}''(\texttt {Time}(w))\).

5.2 Construction A

Since the first DSSE construction is based on [5], it inherits the security of the original design. The adaptive security of Construction A can be proven in the Random Oracle Model by modifying the security proof of [5]. Due to the page limitation, we give a proof sketch here and refer the reader to the full version [24] for the full proof.

Theorem 1

(Adaptive forward security of A). Let \(\mathcal {L}_{\varGamma _A}=(\mathcal {L}_{\varGamma _A}^{Srch}, \mathcal {L}_{\varGamma _A}^{Updt})\), where \(\mathcal {L}_{\varGamma _A}^{Srch}(n)=(\texttt {sp}(n),\texttt {Hist}(n),\texttt {cp}(n))\), \(\mathcal {L}_{\varGamma _A}^{Updt}(add,n,ind)=\perp \). Construction A is \(\mathcal {L}_{\varGamma _A}\)-adaptively forward-secure.

Proof

(Sketch) Compared with [5], this construction additionally leaks the contain pattern \(\texttt {cp}\) as described in Sect. 3.1. The other leakages are exactly the same as in [5], since the server executes one keyword search and updates one keyword/file-index pair at a time. Note that the server does not know the secret key of the trapdoor permutation, so it cannot learn anything about an updated pair even if the keyword has previously been searched by the client.\(\Box \)

5.3 Construction B

The adaptive security of the second DSSE construction relies on the semantic security of the Paillier cryptosystem. All file indices are encrypted using the public key of the Paillier cryptosystem. Without the secret key, the server cannot learn anything from the ciphertexts. Due to the page limitation, we give a proof sketch here and refer the reader to the full version [24] for the full proof.
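To make the role of Paillier encryption concrete, the following toy sketch shows the two standard properties relied on here: encryption is randomized, so equal plaintexts yield different ciphertexts, and ciphertexts can be combined additively without the secret key. It is an illustration under stated assumptions, not the paper's implementation: the primes are far too small to be secure, the generator is fixed to g = n + 1, the function names are hypothetical, and `pow(x, -1, n)` requires Python 3.8 or later. How Construction B encodes file indices as Paillier plaintexts is specified elsewhere in the paper and is not reproduced here.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key generation from two small (insecure) primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m, r=None):
    """Encrypt m in Z_n under public key n; r is the encryption randomness."""
    n2 = n * n
    if r is None:
        r = secrets.randbelow(n - 1) + 1
        while math.gcd(r, n) != 1:
            r = secrets.randbelow(n - 1) + 1
    # With g = n + 1, g^m = 1 + m*n (mod n^2).
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts m1 + m2 mod n."""
    return c1 * c2 % (n * n)

n, sk = keygen(1009, 1013)
c = add(n, encrypt(n, 5), encrypt(n, 7))
assert decrypt(n, sk, c) == 12
# Same plaintext, different randomness: the ciphertexts differ.
assert encrypt(n, 0, r=2) != encrypt(n, 0, r=3)
```

Adding an encryption of n - m subtracts m modulo n, so an addition and a subtraction produce ciphertexts of the same form.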

Theorem 2

(Adaptive forward security of B). Let \(\mathcal {L}_{\varGamma _B}=(\mathcal {L}_{\varGamma _B}^{Srch}, \mathcal {L}_{\varGamma _B}^{Updt})\), where \(\mathcal {L}_{\varGamma _B}^{Srch}(n)=(\texttt {sp}(n))\), \(\mathcal {L}_{\varGamma _B}^{Updt}(op,n,ind)=(\texttt {Time}(n))\). Construction B is \(\mathcal {L}_{\varGamma _B}\)-adaptively forward-secure.

Proof

(Sketch) In Construction B, an update leaks only the number of updates corresponding to the queried keyword n. Since all cryptographic operations are performed on the client side and no keys are revealed to the server, the server learns nothing from an update, provided that the Paillier cryptosystem is IND-CPA secure. We can simulate \(\texttt {DSSEREAL}\) as in Algorithm 3 and simulate \({\texttt {DSSEIDEAL}}\) by encrypting all-zero strings for the EDB. The adversary \(\mathcal {A}\) cannot distinguish a real ciphertext from an encryption of zeros, and therefore cannot distinguish \({\texttt {DSSEREAL}}\) from \({\texttt {DSSEIDEAL}}\). Hence, Construction B achieves forward security.\(\Box \)

Theorem 3

(Adaptive backward security of B). Let \(\mathcal {L}_{\varGamma _B}=(\mathcal {L}_{\varGamma _B}^{Srch}, \mathcal {L}_{\varGamma _B}^{Updt})\), where \(\mathcal {L}_{\varGamma _B}^{Srch}(n)=(\texttt {sp}(n), \texttt {Hist}(n))\), \(\mathcal {L}_{\varGamma _B}^{Updt}(op,n,ind)=(\texttt {Time}(n))\). Construction B is \(\mathcal {L}_{\varGamma _B}\)-adaptively backward-secure.

Proof

(Sketch) Construction B does not leak the type of an update (either add or del) on the encrypted file indices, since the update itself is encrypted. Moreover, it does not leak the file indices that were previously added and later deleted. Hence, Construction B is backward-secure. Since the leakage is the same as in Theorem 2, the simulation is also the same as in Theorem 2.\(\Box \)

6 Conclusion

In this paper, we give two secure DSSE schemes that support range queries. The first DSSE construction applies our binary tree to the scheme from [5] and is forward-secure. However, it incurs a large storage overhead at the client and a large communication cost between the client and the server. To address these problems, we propose the second DSSE construction with range queries, which uses the Paillier cryptosystem. It achieves both forward and backward security. Although the second DSSE construction cannot support a large number of documents, it can still be very useful in certain applications. In the future, we would like to construct more scalable DSSE schemes with more expressive queries.