Generic Multi-keyword Ranked Search on Encrypted Cloud Data

Kasra Kermanshahi, Shabnam; Liu, Joseph K.; Steinfeld, Ron; Nepal, Surya

doi:10.1007/978-3-030-29962-0_16


Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11736)

Included in the following conference series:

European Symposium on Research in Computer Security


Abstract

Although searchable encryption schemes allow secure search over encrypted data, they mostly support conventional Boolean keyword search and do not capture the relevance of the search results. This leads to a large post-processing overhead to find the best-matching documents and causes unnecessary communication cost between the servers and end-users. Such problems can be addressed efficiently using a ranked search system that retrieves the most relevant documents. However, existing state-of-the-art solutions in the context of Searchable Symmetric Encryption (SSE) suffer from either (a) security and privacy breaches due to the use of Order Preserving Encryption (OPE) or (b) impractical assumptions such as the use of two non-colluding servers. In this paper, we present a generic solution for multi-keyword ranked search over encrypted cloud data. The proposed solution can be applied over different symmetric searchable encryption schemes. To demonstrate the practicality of our technique, we leverage the Oblivious Cross Tags (OXT) protocol of Cash et al. (2013) due to its scalability and remarkable flexibility to support different settings. Our proposed scheme supports multi-keyword search on Boolean, ranked and limited range queries while keeping all of OXT’s properties intact. The key contribution of this paper is that our scheme is resilient against all common attacks that take advantage of OPE leakage while using only a single cloud server. Moreover, the results indicate that the proposed solution decreases the communication overhead drastically when the number of matching results is large.



1 Introduction

In contrast to Boolean queries, which rely on the appearance of the queried keywords in the database and return the matching documents, ranked search captures the most relevant documents for a query. In a potentially huge result space, ranked search systems minimize the post-processing of data for end-users. Moreover, they have a great impact on system usability and performance when dealing with the massive amounts of data stored in the cloud. Ranked search has been widely studied by the Information Retrieval (IR) and database communities. Top-$\mathscr {K}$ query processing techniques such as TA [13] and FA [12] are well-known examples of such systems.

However, such techniques do not preserve the privacy of the data stored on the database. That is, they require direct access to the relevance scores as well as various modes of access to data which makes them inapplicable in the context of encrypted data search.

Related Works. The first multi-keyword ranked search scheme was proposed by Cao et al. [6], where both documents and queries are represented as vectors of dictionary size. This scheme sorts documents using the score based on Inner Product Similarity (IPS), where a document score is simply the number of matches of queried keywords in each document, which is not accurate [3]. In general, for ranked search the following techniques are proposed in the literature.

Fully Homomorphic Encryption (FHE): Although FHE [14] supports arbitrary computations over encrypted data, its high performance overhead makes it unsuitable for practical database queries [21].
ORAM: Similar to FHE, ORAM (Oblivious Random-Access Machine) [16] is too computationally expensive to be used in practice [21]. Although some works have tried to improve ORAM efficiency [4, 10, 17, 27], its application in symmetric encryption for the execution of top-$\mathscr {K}$ queries is limited [24].
OPE: Order Preserving Encryption (OPE) [1] allows efficient range queries over encrypted data. However, OPE reveals the relative order of the elements in the database, and therefore does not meet the data owner’s privacy requirements.
Using two (or more) non-colluding servers: The authors of [3] justified the assumption of non-colluding servers by arguing that these parties are usually operated by different companies and hence also have commercial interests not to collude. This model avoids multiple rounds of user-to-server interaction. However, it requires server-to-server interaction instead. Moreover, this assumption is less appealing in practice than the traditional single-server model [25].

Motivations. Among different methods to support ranked query for searchable encryption, OPE is the most popular one due to its efficiency. However, the leakage associated with OPE makes it vulnerable to several attacks. Naveed et al. [22] presented two attacks against OPE as follows:

Sorting Attack: This attack decrypts the OPE-encrypted columns. That is, the adversary sorts the ciphertexts and the message space, and outputs a function that maps each ciphertext to the element of the message space with the same rank.

Cumulative Attack: OPE reveals the frequency of the data and its relative order at the same time, which helps an adversary to find out what fraction of the encrypted data is smaller than each ciphertext. This is known as the cumulative attack. This attack recovers plaintexts from OPE with high probability using auxiliary information^{Footnote 1}.

Table 1. Summary of comparison


Durak et al. [11] showed that the above attacks did not take advantage of the additional leakage that is present in OPE constructions. They discussed two additional types of attacks: inter-column correlation-based attacks and inter+intra-column correlation-based attacks. The former take advantage of the correlation between OPE columns where the adversary knows a bounding box for the plaintext. That is, the columns of data in a table are usually correlated because a row of a table usually corresponds to an individual record. The latter use both inter- and intra-column correlations.

Table 1 provides a summary of some related works that support ranked search over an encrypted database. They use either OPE or cryptographic primitives over two servers. The former suffers from serious leakage and the latter from usability issues in practice. Shen et al. [23] built an OPE-based scheme on top of the Oblivious Cross Tags (OXT) protocol of [7]. Although their scheme is efficient, it is vulnerable to the aforementioned attacks due to the OPE leakage. To avoid revealing the distribution of scores in OPE, Wang et al. [26] proposed one-to-many OPE. Their construction conceals the distribution of scores using probabilistic encryption. However, Li et al. [19] presented a differential attack on one-to-many OPE which reveals the distribution by exploiting the differences between ciphertexts. It is assumed that the attacker has some background information which helps to infer the encrypted keywords using differential attacks. On the other hand, in the two-server setting, if the two servers are located in the same place, this contradicts the assumption that they do not collude, and if they are located in different places, there is server-to-server communication overhead. This cancels out the major advantage of ranked queries, namely minimizing unnecessary network traffic. Therefore, an effective solution to support ranked queries over symmetric searchable encryption schemes is still a challenge. Our proposed approach in this paper aims to address this challenge.

Our Contributions. The key contribution of this paper is a generic solution for supporting effective multi-keyword ranked search over an encrypted database. We demonstrate the application of this solution through the proposed Multi-keyword Ranked Searchable Symmetric Encryption scheme (MRSSE). MRSSE is secure against all of the attacks related to OPE without relying on a two-server assumption, and hence overcomes the limitations of existing approaches. More precisely, MRSSE uses somewhat homomorphic encryption within our proposed filtering techniques instead of OPE to provide ranked search. In terms of functionality, MRSSE supports multi-keyword search over Boolean, ranked and limited range queries without adding any extra leakage. It reduces the communication overhead when the number of search results is large (we give examples in Sect. 5.2) while security is guaranteed. The effectiveness of MRSSE is proven via security and efficiency analyses. It is important to note that although our approach is generic, in this paper we leverage the OXT protocol^{Footnote 2} as an example to demonstrate the applicability of the proposed approach.

It is worth noting that when performing ranked search, the server learns the set of ciphertexts that satisfy the ranking condition; this is an inherent leakage in ranked search. Moreover, in most of the current solutions for ranked search, the relative order of importance of the documents (the ranking) is also leaked to the server. However, the solution proposed in this paper avoids this leakage, as the server always returns a fixed-size set of unsorted results (the actual results are padded with encryptions of 0) and the client performs the sorting locally.

Our Technique. We use various homomorphic encryption tools and techniques to efficiently filter the search results. We considered using BGV-type homomorphic encryption, but it resulted in a high depth for the equality check on integers (j and P)^{Footnote 3}. Hence, we reduced this depth by using unary encoding (we also have document scores encoded in unary, which are small). However, the conditional increment of the pointer “P” involves a multiplication. Therefore, we switched to Ring-GSW homomorphic encryption, which allows us to do repeated multiplications with low noise growth (refer to Sect. 3.1 for details).

2 Preliminaries

In this section, we present notations and definitions needed in our construction.

Cryptographic Primitives. The cryptographic primitives used in this paper are presented in detail in the appendix.

Notations. Notations frequently used in this paper are listed in Table 2.

Table 2. Notations and Terminologies


Scoring Approach. We use a common method of evaluating a relevance score, called $TF \times IDF$ (term frequency times inverse document frequency) [28]. However, the name should not be taken literally. The score is defined as $Score(w_j,id_i)=\dfrac{1}{|id_i|}.(ln(1+\dfrac{N}{f_{w_j}})).(1+ln ( f _{id_i,w_j}))$. This score consists of two main components: the term weight $tw=(ln(1+\dfrac{N}{f_{w_j}}))$ and the relative term frequency $r_{d,t}=(1+ln ( f _{id_i,w_j}))$. Here, $f_{w_j}$ is the number of documents that contain the keyword/term $w_j$, N is the total number of documents in the collection, $ f _{id_i,w_j}$ is the frequency of $w_j$ in the document $id_i$, and $|id_i|$ is a normalization factor that discounts the contribution of long documents and is obtained by counting the number of indexed terms in a document. Note that $Score(w_j,id_i)$ is set to zero if the keyword $w_j$ does not appear in the document with identifier $id_i$. Finally, the similarity measure is the sum of the products of the weights of the query terms (Q) and the corresponding document terms [28]: $Score(Q,id_i)=\sum \limits _ {w_j \in Q} Score(w_j,id_i)$.
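As a plaintext illustration of the scoring formula above, the following minimal Python sketch computes $Score(w_j,id_i)$ and $Score(Q,id_i)$ for a toy collection. The function names, the representation of a document as a list of terms, and taking $|id_i|$ to be the length of that list are assumptions made for this example only; they are not part of the scheme.

```python
import math
from collections import Counter

def keyword_score(word, doc_terms, collection):
    """Score(w, id) = (1/|id|) * ln(1 + N/f_w) * (1 + ln f_{id,w}); 0 if w is absent."""
    f_id_w = Counter(doc_terms)[word]               # frequency of w in this document
    if f_id_w == 0:
        return 0.0                                  # keyword does not appear in the document
    n_docs = len(collection)                        # N: total number of documents
    f_w = sum(1 for d in collection if word in d)   # number of documents containing w
    term_weight = math.log(1 + n_docs / f_w)
    rel_term_freq = 1 + math.log(f_id_w)
    # |id| is assumed here to be the number of terms in the document
    return term_weight * rel_term_freq / len(doc_terms)

def query_score(query, doc_terms, collection):
    """Score(Q, id): sum of the per-keyword scores over the query keywords."""
    return sum(keyword_score(w, doc_terms, collection) for w in query)

# Toy collection (hypothetical data)
docs = [["cloud", "search", "cloud"], ["search", "rank"], ["cloud"]]
for doc in docs:
    print(doc, query_score(["cloud", "rank"], doc, docs))
```

In the scheme itself these scores are computed by the data owner before encryption; the server never sees them in the clear.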

3 Our Threshold-Based Filtering Approach

We solve the problem of multi-keyword ranked search by introducing threshold-based filtering on the ciphertexts. This generic approach can be applied to symmetric searchable encryption schemes. Assume that the database DB is going to be outsourced to an honest-but-curious cloud server. The first step for the data owner is to generate the scored encrypted database. In this phase the data owner can apply any desired scoring technique (such as $TF \times IDF$). Whenever the data owner wants to search through the scored encrypted database, he needs to generate a search token and choose a threshold value. The server is then able to find the matching results, aggregate their scores using homomorphic operations, filter them according to the threshold, and return the most relevant results (those with scores higher than or equal to the threshold). Since the size of the output buffer, N, is independent of the total number of search results, n, we can achieve homomorphic computation and communication independent of n.

3.1 Homomorphic Operations

We define the following homomorphic operations which are used in the homomorphic search and homomorphic filter presented in Sects. 3.2 and 3.3, respectively.

Component-wise homomorphic Operations: We represent the encryption of an $\ell $-bit integer $s=s_{\ell -1}s_{\ell -2}...s_0$ using Ring-GSW (see appendix) as $\hat{s}=Enc(s_{\ell -1})Enc(s_{\ell -2})...Enc(s_0)$, where the plaintext space of the encryption is $(\mathbb {Z}_2, (+,.))$. We denote addition and multiplication over ciphertexts by $\boxplus $ and $\boxdot $, respectively. These extend to vectors of encrypted bits by applying the $\boxplus $ and $\boxdot $ operations component-wise. Thus, the multiplication of an encryption of a bit z with a vector of encrypted bits $\hat{s}=Enc(s_{\ell -1})Enc(s_{\ell -2})...Enc(s_0)$ is defined to be $(Enc(z) \boxdot Enc(s_{\ell -1}))(Enc(z) \boxdot Enc(s_{\ell -2}))...(Enc(z) \boxdot Enc(s_0))$.

Remark. All of the following algorithms are applying the same operation component-wise to each bit ciphertext of the score except $Convert_{GSW,RLWE}(.)$.
Greater-Than Comparison: For two encrypted unsigned $\ell $-bit integers $\hat{s_i}$ and $\hat{t}$, the comparison operation outputs 1 if $s_i \ge t$ and 0 otherwise. We adapt the greater-than comparison circuit of Cheon et al. [8] by defining a NOT operation. That is, NOT(b) outputs 1 if $b=0$ and 0 otherwise. This operation can be defined as $NOT(b)=1-b$. Thus, the comparison can be defined as $\hat{z_i}=1\boxplus (1\boxplus \hat{s}_{i,\ell -1})\boxdot \hat{t}_{\ell -1}\boxplus \sum \limits _{j=0}^{\ell -2}(1\boxplus \hat{s}_{i,j})\boxdot \hat{t}_j\boxdot d_{j+1}\boxdot \dots \boxdot d_{\ell -1}$. Here, $d_j=(1\boxplus \hat{s}_{i,j}\boxplus \hat{t}_j)$. The depth of this circuit is $log (\ell +1)$ and evaluating it requires $2\ell -2$ homomorphic multiplications.
Integer Addition ($\dotplus $): We use $\dotplus $ to denote the homomorphic integer addition operation. To add two $\nu $-bit integers x and y, we first make them $\ell $-bit by padding with zeros on the left (here, $\ell > \nu $). Then, the sum $\hat{s}=\hat{x} \dotplus \hat{y}$ can be computed efficiently using SIMD operations as introduced in [8]. With initial values $\hat{s_0}=\hat{x_0} \boxplus \hat{y_0}$ and $\hat{Carry_0}=\hat{x_0} \boxdot \hat{y_0}$, the bits $\hat{s_i}$ for $i \in [1,\ell -1]$ are written as $\hat{s_i}=\hat{x_i} \boxplus \hat{y_i}\boxplus \sum \limits _{j=0}^{i-1} t_{ij}$, where $t_{ij}=(\hat{x_j} \boxdot \hat{y_j})\Pi _{j+1 \leqslant k \leqslant i-1}(\hat{x_k}\boxplus \hat{y_k})$ for $j<i-1$ and $t_{i,i-1}=\hat{x}_{i-1} \boxdot \hat{y}_{i-1}$. The circuit has depth $log(\ell -2)+1$ and, using SIMD and parallelism, it can be evaluated with just $3\ell -5$ homomorphic multiplications.
Unary encoding: We represent the unary encoding of a number $p \in [0,N)$ (we assume $N<d -1$, where d is the ring dimension) as $p(x)=0x^0+0x^1+...+1x^p+...+0x^{N-1}$. The RLWE encryption $\hat{p}(x)$ of $p(x)$ is of the form $(c=\psi _0u+\varvec{t}g+p(x), c'=\psi _1u+\varvec{t}f)$ where $(\psi _0,\psi _1)$ is the public key, u, f, g are small random noises and $\varvec{t}$ is the plaintext modulus (here the plaintext space is $\{0,1\}$). Note that $\psi _0=(\psi _1.\varvec{s}+\varvec{t}\varvec{e})$ where $\varvec{s}$ is the secret. We may represent the n-dimensional encryption of $\varvec{p}$ using matrices as follows:
$$ \displaystyle \begin{aligned} \begin{bmatrix} c_{0} \\ \vdots \\ c_{n-1} \end{bmatrix} = \begin{bmatrix} \psi _{0,0} & -\psi _{0,n-1} & \dots & -\psi _{0,1} \\ \psi _{0,1} & \psi _{0,0} & \dots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \psi _{0,n-1} & \psi _{0,n-2} & \dots & \psi _{0,0} \end{bmatrix} \begin{bmatrix} u_{0} \\ \vdots \\ u_{n-1} \end{bmatrix} + \varvec{t} \begin{bmatrix} g_{0} \\ \vdots \\ g_{n-1} \end{bmatrix} + \begin{bmatrix} p_{0} \\ \vdots \\ p_{n-1} \end{bmatrix} \end{aligned} $$

$$ \displaystyle \begin{aligned} \begin{bmatrix} c'_{0} \\ \vdots \\ c'_{n-1} \end{bmatrix} = \begin{bmatrix} \psi _{1,0} & -\psi _{1,n-1} & \dots & -\psi _{1,1} \\ \psi _{1,1} & \psi _{1,0} & \dots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \psi _{1,n-1} & \psi _{1,n-2} & \dots & \psi _{1,0} \end{bmatrix} \begin{bmatrix} u_{0} \\ \vdots \\ u_{n-1} \end{bmatrix} + \varvec{t} \begin{bmatrix} f_{0} \\ \vdots \\ f_{n-1} \end{bmatrix} \end{aligned} $$
Increment by 1: The operation $Inc(\hat{p})$ increases the value of $p\in [0,N)$ by 1, unconditionally: $Inc(\hat{p})=x.\hat{p}(x)$. Here, $\hat{p}(x)$ is the unary encryption of p.
Conditional Increment by 1: The operation $Inc(\hat{p},\hat{z_i})$ increases the value of $\hat{p}(x)$ (which is the unary encoding of $p\in [0,N)$) by 1 if $z_i=1$. This function can be defined as $Inc(\hat{p},\hat{z_i})=\hat{z_i}(x).(x.\hat{p}(x))+(1-\hat{z_i}(x)).\hat{p}(x)$. Here, $\hat{z_i}$ encrypts a constant polynomial $z_i \in \{0,1\}$. This corresponds to line 10 of Filter in Algorithm 1. To efficiently compute the iterated homomorphic multiplications of the increment function in the loop (line 10 of Algorithm 1), we apply GSW encryption based on ring-LWE. More precisely, for $i=0,...,n-1$, we can rewrite the increment as $\hat{P}_{i+1}=\hat{z_i}.x\hat{P}_{i}+(1-\hat{z_i}).\hat{P}_{i}=(x-1)\hat{z_i}\hat{P}_{i}+\hat{P}_{i}$ which can be simplified to $\hat{P}_{i+1}=[(x-1)\hat{z_i}+1]\hat{P}_{i}$. Then, if we expand this down to $i=0$:
$$\begin{aligned} \hat{P}_{i+1}=\prod \limits _{j=0} ^i \hat{z}'_j \hat{P}_{0} \end{aligned}$$
where $\hat{z}'_j=(x-1)\hat{z_j}+1$. Note that there is no multiplicative depth in these computations due to the use of GSW encryption. One of the advantages of using GSW is the additive noise growth for iterative homomorphic multiplications [15]. Thus, the noise growth for $\hat{z_i}$ and $\hat{z}'_j$ is equivalent to i additions.
Equality ($f_j(.)$): To evaluate the equality of j and p in Filter(S, t) (line 18/19 of Algorithm 1), we define a function $f_j(p)= Enc_{LWE}(\langle \,\varvec{p} , \varvec{j}\,\rangle ) $. Here, $f_j(p)$ is equal to “$\hat{1}$” if $j=p$, and “$\hat{0}$” otherwise. Note that both p and j are unary encoded. Therefore, $f_j(p)={\varvec{j}}^T\varvec{p}$. Given the RLWE ciphertext $\hat{p}=(c,c')$ for the unary encoded $\varvec{p}$ and the plaintext $\varvec{j}$, we implement the $f_j(p)$ operation homomorphically as follows, which outputs an LWE ciphertext of $f_j(p)$:
$$\begin{aligned} \begin{array}{c} c_{LWE}={\varvec{j}}^T{\varvec{c}}={\varvec{j}}^T rot(\psi _0)\varvec{u}+\varvec{t}.{\varvec{j}}^T\varvec{g}+{\varvec{j}}^T \varvec{p}\\ c'_{LWE}={\hat{c'}}^T={\varvec{j}}^T rot(c')={\varvec{j}}^T rot(\psi _1)rot(u)+\varvec{t}{\varvec{j}}^T rot(f)\\ \end{array} \end{aligned}$$
By substituting $rot(\psi _1)rot(\varvec{s})+\varvec{t} rot(\varvec{e})$ for $rot(\psi _0)$, we can rewrite $c_{LWE}$ in terms of the secret as follows ($c'_{LWE}$ is unchanged):
$$\begin{aligned} \begin{array}{c} c_{LWE}=({\varvec{j}}^T rot(\psi _1)rot(u))\varvec{s}+\varvec{t}.({\varvec{j}}^T(rot(\varvec{e})\varvec{u}+{\varvec{g}}))+ {\varvec{j}}^T\varvec{p}\\ c'_{LWE}={\varvec{j}}^T rot(\psi _1)rot(u)+\varvec{t}.{\varvec{j}}^T rot(f) \\ \end{array} \end{aligned}$$
To do the decryption using the secret s, we proceed as follows:
$$\begin{aligned} \begin{array}{c} {\hat{c'}}^T\varvec{s}=({\varvec{j}}^T rot(\psi _1) rot(u))\varvec{s}+\varvec{t}{\varvec{j}}^T rot(f)\varvec{s} \\ {\varvec{j}}^T \varvec{c}-{\hat{c'}}^T\varvec{s}=\varvec{t}{\varvec{j}}^T (rot(\varvec{e})\varvec{u}+\varvec{g}-rot(f)\varvec{s})+{\varvec{j}}^T \varvec{p}\\ \end{array} \end{aligned}$$
By reducing this result modulo t, the message ${\varvec{j}}^T \varvec{p}$ can be obtained.
Convert GSW to RLWE: We define the function $Convert_{GSW,RLWE}(.)$ to convert a GSW ciphertext of the kth bit $s_{i,k}$ of a score into an RLWE ciphertext of $s_{i,k}$ with the message multiplied by $\frac{q}{2^{k+1}}$ for $k=0,...,6$. We define the convert function as $Convert_{GSW,RLWE}(C_{GSW},i)$, which picks the row $\ell -(r-i)$ of $C_{GSW}$ and outputs an RLWE ciphertext of the form $c_0=a.\varvec{t}+\varvec{e}+\frac{q}{2^r}.2^i.s_i$, $c_1=a$, which is an encryption of the plaintext $2^i.s_i$ with plaintext space modulo $2^r$. For the bits $s_0,...,s_{r-1}$ that we eventually want to pack into one ciphertext of the integer $s=\sum 2^i.s_i$, we apply $Convert_{GSW,RLWE}(C_{GSW},i)$ to the bit $s_i$ to encode it as $2^i.s_i$, and at the end we add the resulting ciphertexts over all i to get the ciphertext for $\sum 2^i.s_i=s$.
Convert RLWE to LWE: We define the function $Convert_{RLWE,LWE}(.)$ to convert an RLWE ciphertext of a bit into an LWE ciphertext. More precisely, in Algorithm 1 line 11, the generated value $y_i$ is a GSW ciphertext; however, the homomorphic operation in line 18 is done using LWE (similarly for $y'_i$). Therefore, we define the convert function as follows over the plaintext:
$$Convert_{RLWE,LWE}(y_i)= \displaystyle \begin{aligned} \left\langle {\begin{bmatrix} 1,0, \dots ,0 \end{bmatrix}} ,{\begin{bmatrix} y_{i} \\ 0\\ \vdots \\ 0 \end{bmatrix}}\right\rangle =y_i \end{aligned} $$
Here, $y_i(x)=y_ix^0+0x^1+...+0x^{n-1}$. Similarly, for the ciphertext $\hat{y}_i=(c,c')$, the conversion function $Convert_{RLWE,LWE}(\hat{y}_i)$ operates as follows (with $\varvec{j}=[1,0,\dots ,0]$ in the notation of the equality operation):
$$\begin{aligned} \begin{array}{c} c_{LWE}=\langle {\begin{bmatrix} 1,0, \dots ,0 \end{bmatrix}} ,\varvec{c}\rangle \\ c'_{LWE}=\langle {\begin{bmatrix} 1,0, \dots ,0 \end{bmatrix}} ,rot(c')\rangle \end{array} \end{aligned}$$
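The greater-than comparison and integer addition circuits of this section can be checked on plaintext bits, with $\boxplus $ and $\boxdot $ replaced by addition and multiplication in $\mathbb {Z}_2$. The following Python sketch is only such a plaintext check under that substitution: it involves no encryption, the helper names are hypothetical, and the carry terms use the generate-and-propagate form $t_{ij}=(x_j \boxdot y_j)\prod _{k=j+1}^{i-1}(x_k\boxplus y_k)$.

```python
def to_bits(value, ell):
    """Little-endian bit list of length ell: index j holds bit j."""
    return [(value >> j) & 1 for j in range(ell)]

def greater_or_equal(s, t):
    """z = 1 + [s < t] over Z_2, where d_j = 1 + s_j + t_j is 1 iff s_j == t_j."""
    ell = len(s)
    d = [(1 + s[j] + t[j]) % 2 for j in range(ell)]
    less = (1 + s[ell - 1]) * t[ell - 1]           # most significant bit decides
    for j in range(ell - 1):
        term = (1 + s[j]) * t[j]                   # s_j = 0 and t_j = 1 ...
        for k in range(j + 1, ell):
            term *= d[k]                           # ... with all higher bits equal
        less += term
    return (1 + less) % 2

def add_bits(x, y):
    """Sum bits s_i = x_i + y_i + sum_j t_ij over Z_2 (x, y zero-padded to ell bits)."""
    ell = len(x)
    out = []
    for i in range(ell):
        carry = 0
        for j in range(i):
            term = x[j] * y[j]                     # carry generated at position j
            for k in range(j + 1, i):
                term *= (x[k] + y[k]) % 2          # propagated through positions j+1..i-1
            carry += term
        out.append((x[i] + y[i] + carry) % 2)
    return out

# Example: compare 6 and 5, and add 3 + 5, using ell = 4 bits
print(greater_or_equal(to_bits(6, 4), to_bits(5, 4)))
print(add_bits(to_bits(3, 4), to_bits(5, 4)))
```

An exhaustive check over all small inputs is a cheap way to confirm the index ranges before moving the circuits to ciphertexts.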

3.2 Homomorphic Search Algorithm

The search algorithm requires only one homomorphic operation, an integer addition, when the overall score is computed using the aggregation function. For instance, the homomorphic aggregation of the scores in Server Search($Tok_\mathbf{q}, \mathsf {SEDB}$) of Algorithm 2 can be written as follows:

Here, $\hat{cscore}_{s}$ and $\hat{cscore}_{x_i}$ are the ciphertexts of the scores of the sterm and the xterms, respectively. $\hat{S}^{(GSW)}$ denotes the GSW ciphertext of the aggregated score. The details of this homomorphic integer addition operation are given in Sect. 3.1.

3.3 Homomorphic Filter Algorithm

Algorithm 1 shows how the filtering is performed on the ciphertexts. This algorithm reflects Filter(S, t) when homomorphic computations are used. Figure 1 demonstrates the functionality of Algorithm 1 (which is also used in plaintext form in Filter of Algorithm 2). First, it compares the overall scores (the output of the monotone aggregation function in the search algorithm) with the threshold value received from the user to set the $z_i$s. Whenever $s_i \ge t$, it sets $z_i=1$; otherwise it sets $z_i=0$. Whenever $z_i=1$, it copies the corresponding score to the buffer. Finally, it returns the scores stored in the buffer, which are equal to or greater than the threshold value.
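The behaviour described above can be simulated on plaintext values. The sketch below is an illustration only: Algorithm 1 is not reproduced in this section, so the order of the buffer write and the pointer update, the function name `filter_scores`, and the choice to drop matches beyond the buffer size are assumptions. The pointer is kept in unary, the equality test $f_j(P)$ is the inner product with the unit vector for slot j, and the pointer update is the conditional increment $z.(x.P)+(1-z).P$ of Sect. 3.1.

```python
def filter_scores(scores, threshold, buffer_size):
    """Plaintext simulation of threshold filtering into a fixed-size buffer."""
    pointer = [1] + [0] * (buffer_size - 1)   # unary encoding of slot 0
    buffer = [0] * buffer_size                # unused slots stay 0 (padding)
    for s in scores:
        z = 1 if s >= threshold else 0        # comparison bit z_i
        y = z * s                             # score is kept only if it passes
        for j in range(buffer_size):
            buffer[j] += pointer[j] * y       # pointer[j] plays the role of f_j(P)
        shifted = [0] + pointer[:-1]          # x * P(x): move the single 1 up by one slot
        # conditional increment: shifted pointer if z = 1, unchanged pointer if z = 0
        pointer = [z * a + (1 - z) * b for a, b in zip(shifted, pointer)]
    return buffer

print(filter_scores([5, 1, 7, 3, 9], threshold=4, buffer_size=4))
```

Every score touches every buffer slot regardless of the comparison result, which is what lets the same data-independent sequence of operations run over ciphertexts; the output size depends only on the buffer size, not on the number of matches.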

In Algorithm 1, we start homomorphic computations with GSW with the modulus q, where we have $\varvec{s}$ as the secret key (line 9 to 11). Then we do the conversion to LWE with the modulus q using the secret key $\varvec{s}$. The last multiplication (in line 18/19) involves three steps as defined in [5]; $\mathsf {Mult}$, $\mathsf {Scale}$, and $\mathsf {SwitchKey}$. After the first step, in the LWE with modulus q, the secret key becomes $\varvec{s}''$. Then, we do the modulus switching in the second step. We do the conversion to LWE with modulus $q'$ using the secret key $\varvec{s}''$. In the last step, we switch the key back to s while the modulus is still $q'$. Note that to support the above mentioned chain, we need to publish the public key of GSW as well as the key switching public key of LWE which is $\tau _{\varvec{s}'' \rightarrow \varvec{s}}$.

4 Our Multi-keyword Ranked Searchable Symmetric Encryption Scheme

In this section, we present our multi-keyword ranked searchable symmetric encryption scheme (MRSSE). By leveraging OXT [7] in a novel way, MRSSE can support multi-keyword ranked search as well as conjunctive and limited range queries. Let us consider a database $\mathsf {DB}$ consisting of $\mathscr {D}$ documents, where each keyword-document pair has a score which shows its relevance. MRSSE first encrypts the scored database using the $\mathsf {SEDB}$ Setup algorithm. In order to perform a ranked search over the encrypted database, the Search protocol must be run between the client and the server. Then, the server performs the Filter algorithm to narrow down the results by comparing them with the given threshold in order to return the most relevant documents to the client. Afterwards, the client can sort the received results using the Sort algorithm. Finally, the client retrieves the top-$\mathscr {K}$ most relevant documents using the Retrieve algorithm.

Our construction given in Algorithm 2 consists of the following algorithms.

$\mathsf {SEDB}$ Setup$(\lambda , \mathsf {DB})$: This algorithm is similar to the one in OXT [7] except that here the scores of keywords are also encrypted and inserted into XSet (the differences are highlighted in red). The score of each keyword-document pair is computed using the scoring approach introduced in Sect. 2.
Search: This protocol consists of two algorithms:
- Client Search(K, $q(\bar{w}=(w_1,...,w_L))$): This algorithm takes as input the PRF keys K and the search keywords $(w_1,...,w_L)$, then generates the search token and outputs it alongside the chosen threshold.
- Server Search($Tok_\mathbf{q}, \mathsf {SEDB}$): This algorithm takes as input the search token $Tok_\mathbf{q}$ and the scored database $\mathsf {SEDB}$. It finds the matches for the least frequent keyword in $\mathsf {TSet}$ and then tests the membership of the corresponding document IDs in $\mathsf {XSet}$ for the other searched keywords. Moreover, in parallel, this algorithm computes the aggregation function over the collected scores homomorphically. It is worth noting that when we perform ranking, we might not want to ignore the documents that do not contain the sterm together with all of the xterms. That is, there might be some documents which contain only a few of the xterms but with high scores. Thus, if we want to return only documents matching all queried keywords, we keep the $\mathsf {XSet}$ test as it is in the Server Search($Tok_\mathbf{q}, \mathsf {SEDB}$) algorithm; otherwise we can simply remove it.
Filter(S, t): Given the score set S and the threshold t, this algorithm compares the overall scores (the output of the monotone aggregation function in Server Search($Tok_\mathbf{q}, \mathsf {SEDB}$)) with the threshold value. It returns the scores which are equal to or greater than the considered threshold and the corresponding ciphertexts^{Footnote 4}.
Sort$(C_f, C'_f)$: This algorithm takes as input two lists with corresponding elements (score/ciphertext). It first decrypts the score list and then uses any well-known sorting algorithm to sort the received list of candidates. Finally, it outputs the ordered set of score-ciphertext pairs.
Retrieve$(C_s,\mathscr {K})$: To retrieve the top-$\mathscr {K}$ documents, this algorithm first computes the decryption key and then decrypts the first $\mathscr {K}$ documents from the ordered set of the score-ciphertext pairs $C_s$.

For the sake of readability, in this section, the homomorphic computations are not considered in Search protocol and Filter(S, t) algorithm. That is, we show only operations performed on plaintext values. Section 3 presented the actual algorithms with the required homomorphic computations on ciphertexts.

4.1 Modes of Operation

To minimize the communication overhead, we define two modes for our scheme: trivial and filtered. The former refers to the condition where the number of results in Search is smaller than the breaking point^{Footnote 5}. In this mode, neither the Filter(S, t) nor the Sort$(C_f, C'_f)$ algorithm is required. Otherwise, the server operates in the filtered mode by running the Filter(S, t) algorithm to narrow down the results to the most relevant ones. This leads to lower communication overhead and network traffic. These two modes are illustrated in Fig. 2, where we discuss the communication overhead of our scheme. That is, the communication overhead in trivial mode (black line) is linear in the number of matching results, whereas it is constant in the filtered mode (red line), i.e., independent of the number of matching results.
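The mode selection can be summarized by the following minimal Python sketch. It only restates the rule above; the function names and all numeric parameters (`result_size`, `buffer_cost`, and the value of the breaking point) are hypothetical placeholders, not measurements from the paper.

```python
def choose_mode(num_matches, breaking_point):
    """Trivial mode below the breaking point, filtered mode otherwise."""
    return "trivial" if num_matches < breaking_point else "filtered"

def communication_cost(num_matches, breaking_point, result_size, buffer_cost):
    """Cost is linear in the number of matches in trivial mode, constant in filtered mode."""
    if choose_mode(num_matches, breaking_point) == "trivial":
        return num_matches * result_size   # every matching result is sent
    return buffer_cost                     # fixed-size filtered buffer is sent

# Hypothetical parameters, for illustration only
for n in (10, 99, 100, 10_000):
    print(n, choose_mode(n, 100), communication_cost(n, 100, 2, 200))
```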

5 Evaluation

In this section we discuss the cost evaluation of MRSSE from computation and communication complexity viewpoints (refer to the full version for the security analysis).

5.1 Computation Complexity

The costs that our design added on the top of OXT to support ranked queries are related to the homomorphic computations in Server Search($Tok_\mathbf{q}, \mathsf {SEDB}$) and Filter(S, t) algorithm. In Server Search($Tok_\mathbf{q}, \mathsf {SEDB}$), we need to compute the score aggregation function using the integer addition operation ($\dotplus $) in a loop that costs $L\times \dotplus $. The evaluation of this circuit in the loop has multiplicative depth of $logL(log(\ell -2)+1)$. Filter(S, t) algorithm requires more homomorphic computations which are performed over $(\mathbb {Z}_N, (+,.))$ as follows:

$n \times $ greater-than comparison for the loop in line 9 of Algorithm 1,
$2\times n \times \boxdot $ for the loop in line 11 of Algorithm 1,
$ 2\times n \times N \times (\ell \boxplus + f_j(\hat{P})+(\ell +1) \boxdot )$ for line 18 and line 19 of Algorithm 1,

where the computation cost of $f_j(\hat{P})$ is negligible. Finally, the overall multiplicative depth of Filter(S, t) would be $Depth(Filter)=2+(log \ell +1)$. Therefore, the overall multiplicative depth would be:

$Depth(Search) + Depth(Filter) = logL(log(\ell -2)+1) +(2+(log \ell +1))$
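To get a feel for the depth formula above, the following Python sketch evaluates it for given L and $\ell $. It assumes base-2 logarithms and no rounding, neither of which is stated explicitly in the text; the function names are hypothetical.

```python
import math

def search_depth(num_keywords, ell):
    """log L * (log(ell - 2) + 1), with base-2 logarithms assumed."""
    return math.log2(num_keywords) * (math.log2(ell - 2) + 1)

def filter_depth(ell):
    """2 + (log ell + 1), with base-2 logarithms assumed."""
    return 2 + (math.log2(ell) + 1)

def overall_depth(num_keywords, ell):
    return search_depth(num_keywords, ell) + filter_depth(ell)

# L = 4 keywords and ell = 4-bit scores: 2 * (1 + 1) + (2 + (2 + 1))
print(overall_depth(4, 4))
```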

5.2 Communication Complexity

In order to determine the communication complexity, we should set the parameters in such a way that the following conditions hold. We denote by $\Vert .\Vert _\infty $ the standard infinity norm for scalars and vectors.

Aggregation noise growth. Let $D_{Agg}$ be the depth of the aggregation function in $Search_{mr}$ algorithm. The noise growth of this function is at most $(\eta d+1)^{\frac{1}{2}D_{Agg}}.\sigma (CScore)$ where $\sigma (CScore)=\sqrt{m.d} \sigma (\varvec{e})$ is the noise for the fresh ciphertext from GSW encryption. By assuming $\sigma (\varvec{e})=2$, the noise growth of aggregation function is at most $(\eta d+1)^{\frac{1}{2}D_{Agg}}.2\sqrt{m.d}$.
$\mathsf {NAND}$ gate homomorphic noise growth. We use $\mathsf {NAND}$ operation to restrict the message space to $\{ 0,1 \}$ in order to avoid blowup of error. The noise of $\mathsf {NAND}$ of ciphertexts $C_1,C_2$ for the message $\mu \in \{ 0,1 \}$ is:
$$\begin{aligned} \begin{array}{c} |noise(\mathsf {NAND}(C_1,C_2))|=\mu . noise (C_1)+\Vert C_1. noise (C_2)\Vert _\infty \\ |noise(\mathsf {NAND}(C_1,C_2))|\le \Vert noise (C_1)\Vert _\infty +\Vert C_1. noise (C_2)\Vert _\infty \\ |noise(\mathsf {NAND}(C_1,C_2))|\le (\eta .d+1)\times max(\Vert noise (C_1)\Vert _\infty , \Vert noise (C_2)\Vert _\infty )\\ \end{array} \end{aligned}$$
We analyze the growth of the standard deviation for each coordinate of the noise under the independence heuristic (which assumes that the coordinates of the noise vector are independent Gaussian samples), similar to the assumption in [9]:

$ Var (noise (\mathsf {NAND}(C_1,C_2)))\le (\eta .d+1)\times max( Var (noise(C_1)), Var (noise(C_2)))$

$\sigma =sd(noise (\mathsf {NAND}(C_1,C_2)))\le \sqrt{(\eta .d+1)}\times max(sd(noise(C_1)),sd(noise(C_2)))$

$\sigma _{\mathsf {NAND}}(out)\le \sqrt{(\eta .d+1)}\sigma _{\mathsf {NAND}}(in)$.

Therefore, for the depth $D-D'$, the noise growth is $\sigma _{\mathsf {NAND}}(out)\le {(\eta .d+1)}^{\frac{D-D'}{2}}\sigma _{\mathsf {NAND}}(in)$.
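The depth-dependent bound above can be evaluated numerically with a small helper. This is only a sketch: the function name and the sample values in the usage note are illustrative assumptions, not parameters taken from the paper.

```python
def nand_noise_sd(sigma_in: float, eta: int, d: int, depth: int) -> float:
    """Upper bound on the noise standard deviation after `depth` NAND levels.

    Each NAND level multiplies the standard deviation by at most
    sqrt(eta * d + 1), so `depth` levels give (eta * d + 1) ** (depth / 2).
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return (eta * d + 1) ** (depth / 2) * sigma_in
```

For instance, with the hypothetical values `eta=3`, `d=5` and an input standard deviation of 2, two NAND levels give a bound of `nand_noise_sd(2.0, 3, 5, 2)`, that is, 16 times the input noise.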
Homomorphic noise growth for conditional increment. The conditional increment of the encrypted pointer $\hat{P}$ can be implemented using $\mathsf {Mult}$ operation: $\hat{P}_{i+1}=\hat{P}_i\hat{z}'_i=\mathsf {Mult}(\hat{P}_i,\hat{z}'_i)=\mathsf {Flatten}(\hat{P}_i\hat{z}'_i)$. Here, $\hat{z}'_i=\mathsf {Flatten}((x-1).I.\hat{z}_i+I)$, which has the noise $\Vert noise(\hat{z}'_i)\Vert =noise(\hat{z}_i).\Vert (x-1)\Vert $

$sd(\hat{z}'_i)=\Vert (x-1)\Vert .sd(\hat{z}_i)$

Therefore, the conditional increment has the noise growth as follows:

$noise(\hat{P}_{i+1})=\hat{z}'_i.noise(\hat{P}_i)+\hat{P}_i.noise(\hat{z}'_i)$

$ Var (\hat{P}_{i+1}) \le \Vert \hat{z}'_i\Vert ^2. Var (\hat{P}_i)+\eta d Var (\hat{z}'_i)$

Here, $\Vert \hat{z}'_i\Vert =1$ because $\hat{z}'_i \in \{ x^0,x'\} $, and $ Var (\hat{z}'_i) \le 2\times Var (\hat{z}_i)$. Therefore,

$ Var (\hat{P}_{i+1}) \le Var (\hat{P}_{i})+2\eta d Var (\hat{z}_i)$

$ \forall i=0,..., n-1$

$ Var (\hat{P}_{i}) \le Var (\hat{P}_{n})\le Var (\hat{P}_{0})+(n-1)\times 2\eta d(max Var (\hat{z}_i))$

$ \forall i $ $sd (\hat{P}_{i}) \le \sqrt{2 n\eta d}\times max~sd(\hat{z}_i)$
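The recurrence for the pointer variance and its closed-form bound can be compared with a few lines of code. The sketch below assumes that $Var(\hat{P}_0)$ is negligible and iterates over n increments; the function names are ours and the numbers in the test are arbitrary.

```python
import math
from typing import Sequence

def pointer_variance(var_p0: float, var_z: Sequence[float], eta: int, d: int) -> float:
    """Iterate Var(P_{i+1}) <= Var(P_i) + 2 * eta * d * Var(z_i) over all z_i."""
    var_p = var_p0
    for vz in var_z:
        var_p += 2 * eta * d * vz
    return var_p

def pointer_sd_bound(n: int, eta: int, d: int, max_sd_z: float) -> float:
    """Closed-form bound sd(P_i) <= sqrt(2 * n * eta * d) * max sd(z_i)."""
    return math.sqrt(2 * n * eta * d) * max_sd_z
```

When every $\hat{z}_i$ has the same variance and the initial pointer noise is zero, the iterated value and the closed-form bound coincide, which is a quick sanity check on the derivation.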
Overall noise growth condition. We denote the depth of homomorphic operation by . The overall noise growth condition for Ring-GSW is where $\sigma _{in}$ indicates the input noise to the Filter(S, t) algorithm. This noise is generated by the aggregation function in Server Search($Tok_\mathbf{q}, \mathsf {SEDB}$). Note that here, , $\sqrt{2n\eta d}$, and $\sqrt{\eta d+1}$ correspond to lines 9, 10, and 11 of Algorithm 1, respectively. More accurately, the noise for $\hat{P}_{i}$ is at most whereas, the noise for $\hat{y}_{i}$ is at most . When we perform $f_j(\hat{P})$, $\hat{P}$ is converted to LWE. Hence, we switch from modulus q to $q'$. Therefore, for the last multiplication in the Filter(S, t) algorithm, $\sigma _{f_j(\hat{P})\boxdot \hat{y}_i} ^{(LWE),q'}$ is at most $\sigma _{f_j(\hat{P})} ^ {(LWE),q}$ if the condition $2 \times (\sigma _{\hat{P}}^{(LWE),q})\gamma _{\mathbb {Z}} \le \varDelta $ is satisfied. Here, $\varDelta =\frac{q}{q'}$ is the ratio between q and $q'$. The noise in each $\hat{c}_j$ (with modulus $q'$) is at most $\sqrt{n} \times \sigma _{f_j(\hat{P})} ^ {(LWE),q'}$. That is, each $\hat{c}_j$ is computed as a sum of n intermediate ciphertexts; thus, the standard deviation is multiplied by $\sqrt{n}$.

The BGV-type LWE multiplication consists of three steps: $\mathsf {Mult}$, $\mathsf {Scale}$, and $\mathsf {SwitchKey}$ [5]. After the first step, the noise has length at most $\gamma _{\mathbb {Z}} B^2$ (here B is a bound on the noise length). Then, we apply the $\mathsf {Scale}$ function (as defined in Lemma 4 in [5]), after which the noise length is at most $(q'/q)\gamma _{\mathbb {Z}} B^2+\eta _{Scale}$ where $\eta _{Scale} \le (\tau /2)\sqrt{d}\gamma _{\mathbb {Z}}h$. After the last step ($\mathsf {SwitchKey}$, as defined in Lemma 9 in [5]), the noise becomes at most $(q'/q)\gamma _{\mathbb {Z}} B^2+\eta _{Scale}+\eta _{SwitchKey}$ where $\eta _{SwitchKey} \le 2\gamma _{\mathbb {Z}}\binom{d+1}{2} {(log q')}^2$. At the end, we want the noise to be smaller than $\frac{1}{2}(q'/\tau )$ in order to decrypt correctly. Note that in BGV-type LWE the actual bound on the noise length is used, whereas in our case we use the standard deviation; the relation is nevertheless similar. In fact, the inputs to the last multiplication have noise with standard deviations $\sigma _{\hat{P}_{i}}$ and $\sigma _{\hat{y}_{i}}$. We assume heuristically that these are independent Gaussian noises. Therefore, if the ratio between the threshold $q'/\tau $ and the standard deviation is more than $\sqrt{2ln(2/\epsilon )}\le \frac{q'}{2\tau }$, then the probability that the noise crosses this threshold is less than $\epsilon $.
Decryption correctness condition. For decryption to be correct, the condition $\frac{\sigma _{\hat{P}_{i}}.\sigma _{\hat{y}_{i}}}{\varDelta }+\eta _{Scale}+\eta _{SwitchKey} \le \frac{q'}{\tau \sqrt{2ln(2/\epsilon )}}$ must hold at the end. Here, $\epsilon =2^{-20}$. For security, RLWE must provide $2^\lambda $ security for the GSW keys. Thus, we need to show that RLWE with noise $\sigma (\varvec{e})=2$, dimension d and modulus q is hard. The condition $\alpha =2/q$ ensures this assumption.
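For concreteness, the tail factor in the condition above can be evaluated for the stated failure probability $\epsilon =2^{-20}$, reading $ln$ as the natural logarithm (our interpretation of the notation):

$$\begin{aligned} \sqrt{2ln(2/\epsilon )}=\sqrt{2ln(2^{21})}=\sqrt{42\,ln\,2}\approx 5.40 \end{aligned}$$

Under this reading, the right-hand side of the correctness condition is roughly $q'/(5.4\,\tau )$.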

Parameter Setting: We can determine the parameters based on the above-mentioned conditions and the security of RLWE. The dimension d is chosen based on the complexity of the best lattice attack against the underlying Ring-LWE problem in dimension d with modulus q and noise $\sigma $. Once $\alpha $, q and d are determined, we can find the cost of the best lattice attack, $T_{attack}$. That is, we set the security parameter $\lambda $ so that $T_{attack} \ge 2^\lambda $. Table 4 provides a few examples of parameter settings for the security parameter $\lambda =80$. We used an attack estimator for the Learning with Errors problem [2, 20]. The outputs of the Filter(S, t) algorithm are LWE ciphertexts encrypted with the modulus $q'$. For the kth bit of each score $s_i$, the LWE ciphertext has the form $(a_{i,k}, b_{i,k}=a_{i,k}.\varvec{s}+\varvec{e}+\frac{q'}{2^i}.s_{i,k})$. Each ciphertext is $(d+1) \times logq'$ bits; for a 7-bit score and a 64-bit document identifier, $71\times N\times (d+1) \times logq'$ bits are needed to deliver the N filtered results to the client. This can be reduced by packing the ciphertexts of every $\log \tau $ bits into one ciphertext by homomorphic addition of the ciphertexts.

This results in the following overall communication cost:

$$Communication ~cost= \frac{71N}{\log \tau }\times (d+1) \times (logq')$$

Although the ciphertexts of the symmetric cipher in OXT are relatively short, the overall communication overhead is directly proportional to the number of matching results. This matters in cloud storage, where the result space is potentially huge. The following example illustrates the issue. Assume that the scores are 7 bits and the document identifiers are 64 bits, which results in about 200 bits of ciphertext per result in OXT, assuming AES in CBC mode is used. Thus, the communication cost is $CC_{OXT}=200n$ bits (where n is the number of matching results for all queried keywords). For a total of $n=10^6$ matching results, the communication cost of OXT is 25MB. However, this can be reduced to about 6.5MB by running our scheme in the filtered mode. More precisely, the server can compute the breaking point $n_{b}=\frac{\frac{71N}{\log \tau }\times (d+1) \times (logq')}{200}$, which in this case is about 263055 matching results (for $N=2^6$, $d=4750$, $\log \tau =32 $), and then compare it with the total number of results $n=10^6$. Since the breaking point is smaller than n, the server runs the filtered mode, which has a communication cost of $CC_{our}=\frac{71N}{\log \tau }\times (d+1) \times (logq')=6.55$MB. Table 3 shows the communication cost improvement when the filtration approach is used versus the trivial mode that returns all of the matching documents. It is apparent from this table that the proposed approach has a significant impact on the communication cost.
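The mode selection described in this example can be sketched as a short calculation. The value $logq'=78$ used in the usage note is our assumption (the text does not state it outside Table 4); with it, the sketch gives figures close to the ones quoted above. All function names are illustrative.

```python
import math

def oxt_cost_bits(n_results: int, bits_per_result: int = 200) -> int:
    """Trivial (OXT-like) mode: every matching result is returned."""
    return bits_per_result * n_results

def filtered_cost_bits(buffer_size: int, d: int, log_q_prime: int,
                       log_tau: int, bits_per_entry: int = 71) -> int:
    """Filtered mode: (71 * N / log tau) packed ciphertexts of (d + 1) * log q' bits."""
    packed_ciphertexts = math.ceil(bits_per_entry * buffer_size / log_tau)
    return packed_ciphertexts * (d + 1) * log_q_prime

def breaking_point(buffer_size: int, d: int, log_q_prime: int,
                   log_tau: int, bits_per_result: int = 200) -> float:
    """Number of matching results at which both modes cost the same."""
    return filtered_cost_bits(buffer_size, d, log_q_prime, log_tau) / bits_per_result

def choose_mode(n_results: int, buffer_size: int, d: int,
                log_q_prime: int, log_tau: int) -> str:
    """Run the filtered mode only when it is cheaper than returning everything."""
    n_b = breaking_point(buffer_size, d, log_q_prime, log_tau)
    return "filtered" if n_results > n_b else "trivial"
```

Calling `choose_mode(10**6, 64, 4750, 78, 32)` selects the filtered mode, while a query with only a thousand matches stays in the trivial mode under the same assumed parameters.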

Table 3. Communication cost improvement

Table 4. Parameter settings

Figure 2 illustrates the overall communication cost of our scheme. Note that the trivial mode introduced in Sect. 4.1 acts the same way as OXT. It is apparent from this figure that our scheme (the green line) reduces the communication cost significantly by filtering the results to the most relevant ones.

6 Conclusion

We have presented a generic solution for efficient multi-keyword ranked searchable symmetric encryption. The proposed threshold-based filtering solution enables the honest-but-curious server to refine the encrypted search results and return only the most relevant ones to the user. The proposed scheme supports multi-keyword ranked search as well as Boolean and limited range queries. Our scheme resists all attacks associated with OPE leakage. In comparison with conventional searchable symmetric encryption schemes, our solution significantly decreases the communication overhead between the client and the server without introducing any additional leakage to the server.

Notes

1.
Auxiliary information is publicly available information such as application details, public statistics, and prior versions of the database (possibly obtained through a prior data breach).
2.
For detailed explanations of OXT refer to the appendix.
3.
$f_j(.)$ appears in lines 18 and 19 of Algorithm 1, where j is the counter for the candidate list and P is the pointer of the output buffer.
4.
We assume that the size of the output buffer (here, N) is large enough to contain all of the results whose score is equal to or higher than the considered threshold. If not, our protocol returns the last N results above the threshold, since the pointer P is incremented modulo N.
5.
The breaking point is the point at which the number of filtered results is the same as the number of unfiltered results; refer to Sect. 5.2.
6.
In a single-writer/single-reader setting like OXT and our scheme, the data owner and the client/user are the same entity.

References

1. Agrawal, R., Kiernan, J., Srikant, R., Xu, Y.: Order preserving encryption for numeric data. In: Proceedings of the 2004 ACM SIGMOD, pp. 563–574. ACM (2004)
2. Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Cryptol. 9(3), 169–203 (2015)
3. Baldimtsi, F., Ohrimenko, O.: Sorting and searching behind the curtain. In: Böhme, R., Okamoto, T. (eds.) FC 2015. LNCS, vol. 8975, pp. 127–146. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-47854-7_8
4. Boneh, D., Mazieres, D., Popa, R.A.: Remote oblivious storage: making oblivious ram practical (2011)
5. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theor. (TOCT) 6(3), 13 (2014)
6. Cao, N., Wang, C., Li, M., Ren, K., Lou, W.: Privacy-preserving multi-keyword ranked search over encrypted cloud data. In: INFOCOM, 2011 Proceedings IEEE, pp. 829–837. IEEE (2011)
7. Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Roşu, M.-C., Steiner, M.: Highly-scalable searchable symmetric encryption with support for Boolean queries. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 353–373. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_20
8. Cheon, J.H., Kim, M., Kim, M.: Optimized search-and-compute circuits and their application to query evaluation on encrypted data. IEEE Trans. Inf. Forensics Secur. 11(1), 188–199 (2016)
9. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: TFHE: fast fully homomorphic encryption over the torus. J. Cryptol., 1–58 (2018)
10. Damgård, I., Meldgaard, S., Nielsen, J.B.: Perfectly secure oblivious RAM without random oracles. In: Ishai, Y. (ed.) TCC 2011. LNCS, vol. 6597, pp. 144–163. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19571-6_10
11. Durak, F.B., DuBuisson, T.M., Cash, D.: What else is revealed by order-revealing encryption? In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1155–1166. ACM (2016)
12. Fagin, R.: Combining fuzzy information from multiple systems. J. Comput. Syst. Sci. 58(1), 83–99 (1999)
13. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)
14. Gentry, C.: A fully homomorphic encryption scheme. Stanford University (2009)
15. Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_5
16. Goldreich, O., Ostrovsky, R.: Software protection and simulation on oblivious RAMs. J. ACM (JACM) 43(3), 431–473 (1996)
17. Goodrich, M.T., Mitzenmacher, M.: Privacy-preserving access of outsourced data via oblivious RAM simulation. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011. LNCS, vol. 6756, pp. 576–587. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22012-8_46
18. Jiang, X., Yu, J., Yan, J., Hao, R.: Enabling efficient and verifiable multi-keyword ranked search over encrypted cloud data. Inf. Sci. 403, 22–41 (2017)
19. Li, K., Zhang, W., Yang, C., Yu, N.: Security analysis on one-to-many order preserving encryption-based cloud data search. IEEE Trans. Inf. Forensics Secur. 10(9), 1918–1926 (2015)
20. Albrecht, M.R., Player, R., Scott, S.: Security estimates for the learning with errors problem (2015). https://bitbucket.org/malb/lwe-estimator
21. Meng, X., Zhu, H., Kollios, G.: Top-k query processing on encrypted databases with strong security guarantees. arXiv preprint arXiv:1510.05175 (2015)
22. Naveed, M., Kamara, S., Wright, C.V.: Inference attacks on property-preserving encrypted databases. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 644–655. ACM (2015)
23. Shen, Y., Zhang, P.: Ranked searchable symmetric encryption supporting conjunctive queries. In: Liu, J.K., Samarati, P. (eds.) ISPEC 2017. LNCS, vol. 10701, pp. 350–360. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-72359-4_20
24. Vaidya, J., Clifton, C.: Privacy-preserving top-k queries. In: 21st International Conference on Data Engineering, ICDE 2005, Proceedings, pp. 545–546. IEEE (2005)
25. Wang, B., Li, M., Wang, H.: Geometric range search on encrypted spatial data. IEEE Trans. Inf. Forensics Secur. 11(4), 704–719 (2016)
26. Wang, C., Cao, N., Ren, K., Lou, W.: Enabling secure and efficient ranked keyword search over outsourced cloud data. IEEE Trans. Parallel Distrib. Syst. 23(8), 1467–1479 (2012)
27. Williams, P., Sion, R.: Single round access privacy on outsourced storage. In: Proceedings of the 2012 ACM conference on Computer and Communications Security, pp. 293–304. ACM (2012)
28. Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, Burlington (1999)

Acknowledgement

The work of Ron Steinfeld and Joseph K. Liu was also supported in part by ARC Discovery Project grant DP180102199.

Author information

Authors and Affiliations

Monash University, Melbourne, VIC, 3800, Australia
Shabnam Kasra Kermanshahi, Joseph K. Liu & Ron Steinfeld
CSIRO Data 61, Melbourne, VIC, 3008, Australia
Shabnam Kasra Kermanshahi & Surya Nepal

Corresponding authors

Correspondence to Shabnam Kasra Kermanshahi, Joseph K. Liu, Ron Steinfeld or Surya Nepal.

Editor information

Editors and Affiliations

NEC Corporation, Kawasaki, Japan
Kazue Sako
University of Surrey, Guildford, UK
Steve Schneider
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Peter Y. A. Ryan

Appendices

A Review of OXT

As discussed earlier, our scheme MRSSE leverages the protocol by Cash et al. [7] called Oblivious Cross Tags (OXT) to demonstrate its applicability in Searchable Symmetric Encryption (SSE). In this section, we provide a brief review of OXT to explain how it works. OXT is the first SSE scheme that goes beyond a single-keyword search. This scalable scheme supports boolean queries over the encrypted database in sublinear time. OXT consists of an algorithm EDBSetup and a protocol Search as follows.

EDBSetup($\lambda ,\mathsf {DB}$): Given a security parameter $\lambda $ and a database $\mathsf {DB}=(id_{i}, \text {W}_{i})_{i=1}^{\mathscr {D}}$, this algorithm generates the encrypted database $\mathsf {EDB}$, which is given to the server, and a secret key for the user (see Note 6).

Note that $\mathsf {EDB}$ consists of two data structures, $\mathsf {TSet}$ and $\mathsf {XSet}$. The former allows one to associate a list of fixed-size data tuples with each keyword in the database, and later to issue keyword-related tokens to retrieve these lists [7]. The latter contains elements computed from each keyword-document pair, called Xtags.

The protocol Search, which runs between the user and the server, consists of the following algorithms:

TokenGeneration($(q(\bar{w})=(w_1,\ldots ,w_n),~\mathsf {EDB})$): If a user wants to make a query $q(\bar{w})$ over $\mathsf {EDB}$, the search tokens are required. This algorithm generates the search tokens $Tok_\mathbf{q}$ based on the given query.

Search($Tok_\mathbf{q}, \mathsf {EDB}$): This algorithm takes as input the search token $Tok_\mathbf{q}=(\mathsf {stag}, $ $\mathsf {xtoken}[1],$ $ \mathsf {xtoken}[2], \cdots )$ and outputs the encrypted search result(s) ERes.

DecResult (ERes, K): This algorithm takes the encrypted search result ERes and the utilized secret key as inputs and outputs the corresponding document identifier(s) id(s).

B Cryptographic Primitives

Pseudorandom Function (PRF): A function $F:\{0,1\}^\lambda \times X \rightarrow Y$, where X and Y are sets, is a pseudorandom function if for all efficient adversaries A, $Adv^{prf}_{F,A} (\lambda )$ is negligible [7]:

$$\begin{aligned} Adv^{prf}_{F,A} (\lambda )=Pr[A^{F(K,.)}(1^\lambda )=1]-Pr[A^{f(.)}(1^\lambda )=1] \end{aligned}$$

Here, the probability is over the randomness of A, $K \xleftarrow {\$}\{0,1\}^\lambda $, and $f\xleftarrow {\$}Fun(X,Y)$.
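As an illustration of how such a keyed function is used in practice, the sketch below instantiates F with HMAC-SHA256. This choice is our assumption (the paper does not name a concrete PRF), as are the 128-bit key length and the function names.

```python
import hashlib
import hmac
import secrets

LAMBDA_BYTES = 16  # security parameter lambda = 128 bits (assumed for illustration)

def prf_keygen() -> bytes:
    """Sample K uniformly from {0,1}^lambda."""
    return secrets.token_bytes(LAMBDA_BYTES)

def prf(key: bytes, x: bytes) -> bytes:
    """Evaluate F(K, x); HMAC-SHA256 is assumed to behave as a PRF."""
    return hmac.new(key, x, hashlib.sha256).digest()
```

The function is deterministic for a fixed key, so the same keyword always maps to the same 32-byte tag, while outputs under a fresh key should look unrelated.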

Homomorphic Encryption from Ring Learning with Errors (RLWE): The Ring-LWE encryption scheme is associated with a number of parameters [8]:

$\lambda $: security parameter
$R=\mathbb {Z}[x]/(x^d+1)$: polynomial ring of degree d
$R_q=R/qR$: ring mod q for an integer q (ciphertexts are pairs of $R_q$ elements)
$R_\tau =R/\tau R$: message space; for $\tau =2$, $R_\tau $ can be represented by polynomials $p(x)=\sum \limits _{i<mu}p_ix^i$ for $p_i \in \mathbb {Z}_\tau $.
$\chi $: a distribution of polynomials over $R_q$ with ‘small’ coefficients (with standard deviation $\sigma $).

Ring-LWE encryption can be described by the following algorithms:

KeyGen(): This algorithm samples $t \leftarrow \chi $, $e \leftarrow \chi $ and defines the secret key $sk=\varvec{s} \leftarrow (1,-t)$ and computes the public key $pk=(a,b)$. Here, $a\xleftarrow {\$}R_q$ and $b=a t+ e$.

Enc(): Given the public key and a message $m \in R_\tau $, the encryption algorithm chooses a small polynomial $v \leftarrow \chi $ and two polynomials $e_0$ and $e_1$ and computes the ciphertext $\varvec{c}=(c_0,c_1)$ where $(c_0,c_1)=(m.q/\tau ,0)+(bv+ e_0,av+ e_1)$ (note that $c_0=a't+(e_0+ ev) +m.q/\tau $ and $c_1=a'+e_1$, where $a'=av$).

Dec(): Given a ciphertext $\varvec{c}=(c_0,c_1)$, this algorithm computes $m'= \varvec{c} \cdot \varvec{s}$ and rounds each coefficient of $m'$ to the nearest multiple of $q / \tau $.

Eval(): Given two ciphertexts, this algorithm outputs the ciphertext obtained through the considered operation over the given ciphertexts (for detailed operations refer to Sect. 3).
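The KeyGen, Enc and Dec steps above can be traced with a toy implementation. The sketch below is for illustration only and is not secure: the parameters ($d=16$, $q=2^{15}$, $\tau =2$) and the use of ternary coefficients in place of the distribution $\chi $ are our assumptions, chosen so that decryption always succeeds.

```python
import random

D = 16             # ring degree d (toy value, assumed)
Q = 1 << 15        # ciphertext modulus q (toy value, assumed)
TAU = 2            # plaintext modulus tau
DELTA = Q // TAU   # scaling factor q / tau

def poly_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def poly_sub(a, b):
    return [(x - y) % Q for x, y in zip(a, b)]

def poly_mul(a, b):
    """Multiply in Z_q[x]/(x^d + 1): x^d wraps around to -1."""
    out = [0] * D
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < D:
                out[k] = (out[k] + ai * bj) % Q
            else:
                out[k - D] = (out[k - D] - ai * bj) % Q
    return out

def small(rng):
    """Stand-in for chi: coefficients drawn from {-1, 0, 1}."""
    return [rng.choice((-1, 0, 1)) for _ in range(D)]

def keygen(rng):
    t, e = small(rng), small(rng)
    a = [rng.randrange(Q) for _ in range(D)]
    b = poly_add(poly_mul(a, t), e)          # b = a*t + e
    return t, (a, b)                          # sk is s = (1, -t)

def encrypt(pk, m, rng):
    a, b = pk
    v, e0, e1 = small(rng), small(rng), small(rng)
    scaled = [DELTA * mi % Q for mi in m]     # m * q / tau
    c0 = poly_add(poly_add(poly_mul(b, v), e0), scaled)
    c1 = poly_add(poly_mul(a, v), e1)
    return c0, c1

def decrypt(t, c):
    c0, c1 = c
    noisy = poly_sub(c0, poly_mul(c1, t))     # c . s = c0 - c1*t
    # Round each coefficient to the nearest multiple of q / tau.
    return [((x + DELTA // 2) // DELTA) % TAU for x in noisy]
```

With these toy parameters the accumulated noise stays far below $q/(2\tau )$, so `decrypt(t, encrypt(pk, m, rng))` returns the original bit vector `m` for any key pair produced by `keygen`.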

Homomorphic Encryption from Learning with Errors (LWE). In addition to Ring-LWE, we also use Regev’s LWE-based encryption scheme, denoted by $Enc_{LWE}$. In Regev’s encryption, the public key consists of a matrix $\mathbb {A}\xleftarrow {\$}\mathbb {Z}_q ^{n \times m}$ and an LWE sample $\varvec{a}\in \mathbb {Z}_q ^{m}$ (q, m, and n are integers). Let $\varvec{a}=\mathbb {A}^T\varvec{s}+\varvec{e}$ where $\varvec{s}\xleftarrow {\$}\mathbb {Z}_q ^n$ is the secret and $\varvec{e} \in \mathbb {Z}^m$ is the error. To encrypt a message bit $\mu \in \{0,1\}$, one chooses a uniformly random vector $\varvec{y} \in \{0,1\}^m$ and computes $c=\mathbb {A}\varvec{y} \mod q$, $c'=<\varvec{a},\varvec{y}>+\mu .\lfloor q/2 \rfloor \mod q$. To decrypt, one computes $c'-\varvec{s}^Tc \mod q$ to remove the common part and recovers the message by rounding the result to the nearest multiple of $\lfloor q/2 \rfloor $.
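A toy version of the Regev-style encryption described above is sketched below. It is not secure and only checks the algebra: the dimensions ($n=8$, $m=32$), the modulus $q=2^{12}$ and the ternary error distribution are assumptions made for illustration.

```python
import random

N_DIM = 8        # secret dimension n (toy value, assumed)
M_SAMPLES = 32   # number of LWE samples m (toy value, assumed)
Q = 1 << 12      # modulus q (toy value, assumed)

def keygen(rng):
    s = [rng.randrange(Q) for _ in range(N_DIM)]
    # A is an n x m matrix with uniform entries in Z_q.
    A = [[rng.randrange(Q) for _ in range(M_SAMPLES)] for _ in range(N_DIM)]
    e = [rng.choice((-1, 0, 1)) for _ in range(M_SAMPLES)]
    # a = A^T s + e (one noisy inner product per column of A).
    a = [(sum(A[i][j] * s[i] for i in range(N_DIM)) + e[j]) % Q
         for j in range(M_SAMPLES)]
    return s, (A, a)

def encrypt(pk, bit, rng):
    A, a = pk
    y = [rng.randrange(2) for _ in range(M_SAMPLES)]
    c = [sum(A[i][j] * y[j] for j in range(M_SAMPLES)) % Q for i in range(N_DIM)]
    c_prime = (sum(aj * yj for aj, yj in zip(a, y)) + bit * (Q // 2)) % Q
    return c, c_prime

def decrypt(s, ciphertext):
    c, c_prime = ciphertext
    # c' - s^T c = <e, y> + bit * floor(q/2) (mod q).
    noisy = (c_prime - sum(si * ci for si, ci in zip(s, c))) % Q
    # Round to the nearest multiple of floor(q/2) and read off the bit.
    return ((noisy + Q // 4) // (Q // 2)) % 2
```

Because the error term $<\varvec{e},\varvec{y}>$ is at most 32 in absolute value here, well below $q/4$, decryption recovers the encrypted bit for every choice of randomness.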

Homomorphic Encryption from Ring-GSW: Let $\mathbf {G}={[\mathbf {I},2\mathbf {I},4\mathbf {I}...2^{l-1}\mathbf {I}]}^t \in R_q ^{2l \times 2}$ be the gadget matrix. Homomorphic Encryption from Ring-GSW can be described by the following algorithms:

KeyGen(): This algorithm samples $\varvec{t} \leftarrow \mathbb {Z}_q$, defines the secret key $sk=\varvec{s} \leftarrow (1,-t)$ and sets $\varvec{v}=\mathbf {G} \varvec{s}$. To define the public key $pk=A$, this algorithm first generates an $m\times 1$ matrix $B \xleftarrow {\$}R_q ^{m \times 1}$ (where $R_q=\mathbb {Z}_q(x)/(x^d+1)$) and a vector $\varvec{e} \leftarrow \chi ^m$; it sets $\varvec{b}=B.\varvec{t}+\varvec{e}$, and A to be the 2-column matrix consisting of $\varvec{b}$ followed by the single column of B (so that $A.\varvec{s}=\varvec{e}$).

Enc(): To encrypt the message $\mu \in \{0,1\}$, this algorithm computes a ciphertext in the form of $C=\mathsf {Flatten(C')}$ where $C'=\mu . I_\eta +\mathsf {BitDecomp}(RA)$. Here, $\eta =2\times l$ and $\mathsf {BitDecomp}(a)=(a_0,...,a_{l-1}) \in R_q ^l$ where each $a_i$ is an element of R that when represented as a polynomial of degree $d-1$ has coefficients that are all in $\{ 0, 1 \}$.

Let $C''=\mathsf {BitDecomp}^{-1}(C')$, thus $C = \mathsf {Flatten}(C') = \mathsf {BitDecomp}(C'')$. $\mathsf {Flatten}$ ensures that the coefficients of C are small; therefore C has the proper form of a ciphertext that permits our homomorphic operations [15].

Dec(): This algorithm computes $C.\varvec{v}=\mu . \varvec{v} + \mathsf {BitDecomp}(C'').\varvec{v}=\mu .\varvec{v}+C''.\varvec{s}=\mu .\varvec{v}+e'$ where $e'$ is a small noise of C.

The other utilised homomorphic operations are $\mathsf {NAND}$ and $\mathsf {Mult}$ as follows.

$$\begin{aligned} \mathsf {NAND}(C_1,C_2)=\mathsf {Flatten}(I_\eta -C_1.C_2) \end{aligned}$$

$$\begin{aligned} \mathsf {Mult}(C_1,C_2)=\mathsf {Flatten}(C_1.C_2) \end{aligned}$$

About this paper

Cite this paper

Kasra Kermanshahi, S., Liu, J.K., Steinfeld, R., Nepal, S. (2019). Generic Multi-keyword Ranked Search on Encrypted Cloud Data. In: Sako, K., Schneider, S., Ryan, P. (eds) Computer Security – ESORICS 2019. ESORICS 2019. Lecture Notes in Computer Science(), vol 11736. Springer, Cham. https://doi.org/10.1007/978-3-030-29962-0_16

DOI: https://doi.org/10.1007/978-3-030-29962-0_16
Published: 15 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29961-3
Online ISBN: 978-3-030-29962-0
eBook Packages: Computer Science, Computer Science (R0)
