
1 Introduction

The problem of securely outsourcing computation to semi-trusted or malicious servers has become increasingly significant with the fast development of wireless communication, where resource-constrained mobile devices cannot afford computationally intensive tasks [1]. In outsourced computing, clients can take advantage of abundant computational resources in a “pay-per-use” manner instead of depending on their own infrastructure. While this brings considerable efficiency benefits, security and privacy issues have significantly impeded its wide adoption. The cloud is generally assumed to work in either a semi-honest or a malicious environment: in the former, the cloud faithfully executes the protocol but tries to extract secret information from its interactions with the clients; in the latter, the cloud can behave arbitrarily to disrupt the protocol run. Therefore, a series of cryptographic techniques have been devised to address the issue of secure outsourced computation [2, 3].

Outsourced pattern matching, a valuable kind of outsourced computation, generally refers to delegating to the cloud the task of finding all the positions where the receiver’s pattern P of size m matches the sender’s text T of size n; it is required to run in the encrypted domain to guarantee user privacy. Therefore, in addition to the traditional security requirements such as data confidentiality and authentication, outsourced pattern matching has its own security and privacy requirements: (1) pattern privacy means that the pattern queried by the receiver should be well protected against collusion between the cloud and a malicious sender; (2) text privacy means that the text held by the sender should be well protected against collusion between the cloud and a malicious receiver. For example, in e-healthcare cloud computing systems, secure outsourced genome fragment matching makes it possible to estimate the probability that a person suffers from specific diseases, and to take effective preventive measures against their deterioration, without exposing either the template (the specific genome fragment indicating the disease) or the text (the person’s genome sequence). Physicians can also seek the best clinical decisions by outsourcing the search for other patients with the same genome fragments associated with the specific disease and analyzing the effect of their prescriptions. In searchable encryption, secure outsourced pattern matching makes it possible to query encrypted files containing a substring of particular keywords without revealing either the substring (pattern privacy) or the keyword (text privacy).

Vergnaud proposed a secure generalized pattern matching protocol by exploiting ElGamal’s homomorphic encryption [15]. However, its high computational complexity cannot be tolerated by resource-constrained mobile devices, and directly applying ElGamal’s public key homomorphic encryption [34] to data deviates from the principle of hybrid encryption, in which public key encryption is used to encrypt a short symmetric data encryption key that is in turn used to encrypt the data themselves. Fully homomorphic encryption (FHE) can be straightforwardly applied to secure outsourced computation [520, 2633, 35, 36]. Unfortunately, Lauter et al. pointed out that, despite great effort on designing lightweight FHE, its high complexity is still ill-suited to computationally weak mobile devices [4]. More seriously, public key FHE must be performed on each character of both text and pattern (i.e. the complexity is \(O(n+m)\)), which makes outsourced pattern matching impractical owing to the intolerably high computational overhead on the resource-constrained users’ ends. Recently, Faust et al. proposed an outsourced pattern matching protocol that is simulation-based secure against semi-honest and malicious adversaries in the random oracle model [5]. It does not exploit the traditional technique of FHE, but is based on an instance of the subset sum problem that is solvable in polynomial time. Unfortunately, this protocol relies on the assumption that private channels are well established, since it cannot resist the collusion attack against pattern privacy launched between the cloud, which knows the exact matching positions, and the sender, who holds the text and the size of the pattern.

Based on the observations above, we briefly sketch the key motivations and solutions behind our proposed, more efficient privacy-preserving outsourced pattern matching protocol PPOPM, which saves computational cost in the following three respects. First, public key FHE is replaced by a newly devised lightweight combination of any one-way trapdoor permutation \(f(\cdot )\), used to encode a private key, and a keyed symmetric fully homomorphic mapping on data \(m_i\) in \(\mathbb {Z}_{N}(N=pq)\), defined as \(h_{fhom,U}(m_{i})=q^{-1}qm_{i,p}^{p}+p^{-1}pm_{i,q}^{q}\), where \(m_{i}\in \mathbb {Z}_{N}, m_{i,l}\equiv m_{i}\ mod\ l\ (l\in \{p,q\})\) and p, q are large primes of 512 bits. This design also satisfies the principle of hybrid encryption. Second, although the complexity of public key encryption (by which we mean any public key encryption based on the specific one-way trapdoor permutation used in our construction) can hardly be reduced, the overall computational cost of secure outsourced pattern matching can still be reduced by invoking the associated public key encryption only once in the optimized case. Last but not least, the secret key \(sk_f\) and the randomnesses \(r,r''\) privately kept by the authorized receivers ensure that they are the only entities who can successfully decipher and deblind the authentic matching positions, so that pattern privacy is well protected. In this paper, we propose a more efficient privacy-preserving outsourced pattern matching protocol PPOPM without exploiting public key (fully) homomorphic encryption. The main contributions are as follows.

Firstly, without adopting public key (fully) homomorphic encryption, a secure outsourced discrete Fourier transform OFFT is proposed by exploiting any one-way trapdoor permutation only once, where both the input privacy and the coefficient privacy are well protected. It is the building block of a new secure outsourced polynomial multiplication protocol OPMUL, which is further used to design our final construction PPOPM.

Then, a more efficient privacy preserving outsourced pattern matching protocol PPOPM is designed through polynomial multiplication OPMUL realized by our newly-devised OFFT. Pattern privacy is guaranteed against the collusion attack launched between the sender and the cloud; while text privacy is well protected against the collusion attack between the receiver and the cloud.

Finally, the proposed PPOPM is formally proved secure in the universally composable (UC) model against semi-honest adversaries in the random oracle model. The time complexities at the cloud and at the resource-constrained users (both sender and receiver) are O(n log m) and O(1), respectively. Extensive performance evaluations demonstrate the efficiency advantage of our proposed PPOPM over the state of the art in both computational and communication cost.

The remainder of this paper is organized as follows. The problem statement and the formal security model are presented in Sect. 2. A secure outsourced discrete Fourier transform OFFT and a more efficient privacy preserving outsourced pattern matching protocol PPOPM are proposed in Sects. 3 and 4, respectively. Section 5 gives the formal security proof, followed by the performance evaluation in Sect. 6. Finally, we conclude our paper in Sect. 7.

2 Problem Statement and Security Model

2.1 Problem Statement

The problem of pattern matching aims to find all the positions where the pattern P of size m matches the text T of size n. Both the text T and the pattern P are sequences of characters from a finite character set U. The problem can be further categorized into two classes, namely pattern matching with perfect precision and pattern matching with wildcards. In the first case, the problem is to find all occurrences of a pattern \(P\in U^{m}\) in a text \(T\in U^{n}\); in the latter case, the aim is to find all occurrences of a pattern \(P\in (U\cup \{*\})^{m}\) in a text \(T\in (U\cup \{*\})^{n}\), where a wildcard character \(*\notin U\) can match any character in U.
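As a plaintext reference point, the two problem variants above can be sketched in a few lines of Python. This is only an unencrypted, quadratic-time baseline for checking results and is not part of the proposed protocol; the names `WILDCARD`, `matches_at` and `find_matches` are hypothetical.

```python
WILDCARD = "*"  # assumed not to belong to the character set U


def matches_at(text, pattern, u):
    """True if pattern matches text at alignment u; a wildcard on either side matches anything."""
    window = text[u:u + len(pattern)]
    return all(p == WILDCARD or t == WILDCARD or p == t
               for p, t in zip(pattern, window))


def find_matches(text, pattern):
    """All positions u in [0, n-m] where the pattern occurs in the text."""
    n, m = len(text), len(pattern)
    return [u for u in range(n - m + 1) if matches_at(text, pattern, u)]
```

Without wildcards in either string, `find_matches` reduces to exact matching, i.e. the first problem class.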

Fig. 1. Network architecture of outsourced pattern matching

The pattern matching task is delegated to the cloud because both the sender holding text T and the receiver holding pattern P are assumed to be resource-constrained in wireless mobile communications, such as e-health systems and VANETs, and cannot afford this computationally intensive task. To protect both text and pattern privacy, the cloud is required to perform the pattern matching operation in the encrypted domain. Figure 1 shows the network architecture of privacy preserving outsourced pattern matching, which involves three entities: the sender, the cloud and the receiver. The process of outsourced pattern matching consists of the following steps. In step (1), the sender (we use sender and owner interchangeably in the following) partitions, encrypts and outsources the preprocessed text T to the cloud. In steps (2) and (3), the receiver asks the sender for a query token TOK permitting him to launch a pattern matching operation w.r.t. text T. In steps (4) and (5), the receiver encrypts the pattern by exploiting the token TOK issued by the sender, and outsources it to the cloud. Finally, the cloud performs the outsourced pattern matching in the encrypted domain and returns the result to the receiver. The preprocessing step needs to run only once, and its cost is amortized over multiple receivers querying the text with different patterns.

2.2 Definitions and Security Model

In this section, we mainly focus on the honest-but-curious (semi-trusted) security model. Specifically, the goal of the receiver is to learn the positions where pattern P matches in the text without disclosing anything about the pattern, even to a collusion between the cloud and the sender. On the other hand, the text T must also be well protected, even against a collusion between the cloud and the receiver.

We formalize the security of our proposed PPOPM in the universally composable (UC) model using the ideal/real paradigm. The functionality \(F_{PPOPM}\), implemented via three two-party protocols \(PTL=(PTL_{Pre},PTL_{Qry},PTL_{Opm})\), is described as follows.

Functionality \((F_{PPOPM})\): Let \(n,m,\lambda \in \mathbb {Z}_{N}\). The functionality \(F_{PPOPM}\) initially sets an empty table Tab and performs the following operations. It interacts with the sender, the receiver, the cloud (server) and the adversary Sim.

(i) Upon receiving a message \((Txt,T,m)\) from the sender, it sends \((Pre,|T|,m)\) to the cloud and Sim, and records \((Txt,T)\), where |T| denotes the size of text T.

(ii) Upon receiving a message \((Qry,P_{i})\) from the receiver, where \(i\in [1,poly(\lambda )]\) (i.e. the receiver is allowed to launch a number of queries that is polynomial in the security parameter \(\lambda \)), it first checks whether a message \((Txt,\cdot )\) has been recorded and whether \(|P_{i}|=m\), where \(|P_{i}|\) denotes the size of pattern \(P_{i}\). If both checks pass, it further checks whether there exists an entry \((P_{i},\cdot )\) in table Tab. If not, it picks a new identifier \(ID\in \{0,1\}^{*}\) and adds the entry \((P_{i},ID)\) to table Tab. Then, it sends \((Qry,Rec)\) to the sender and Sim.

(iii-a) Upon receiving a message \((Apr,Rec)\) from the sender, it reads the entry \((P_{i},ID)\) from Tab and sends the message \((Qry,Rec,(Cov_{1,u}^{Enc},Cov_{2,u}^{Enc},Cov_{3,u}^{Enc}),ID)\) to the cloud, where \(u\in [0,n-m]\) ranges over the \(n-m+1\) possible positions at which pattern \(P_{i}\) may occur in text T. Otherwise, it sends \(\perp \) to the receiver and aborts.

(iii-b) Upon receiving a message \((Apr,Rec)\) from Sim, it reads the entry \((P_{i},ID)\) from Tab and sends the message \((Qry,P_{i},(Cov_{1,u}^{Enc},Cov_{2,u}^{Enc},Cov_{3,u}^{Enc}),ID)\) to the receiver. Otherwise, it sends \(\perp \) to the receiver.

Note that, in contrast to the existing work [5], even if the server is corrupted, the positions where the pattern occurs in the text are not disclosed, since the cloud obtains only \((Cov_{1,u}^{Enc},Cov_{2,u}^{Enc},Cov_{3,u}^{Enc})\), which encode the encrypted matching positions and can be successfully deciphered only by the authorized receivers. Therefore, both text privacy and pattern privacy can be well protected against collusion attacks. Formally, let \(\mathbf {IDEAL}_{F_{PPOPM},Sim(z)}(\lambda ,p,(T,(P_{1},\cdots ,P_{poly(\lambda )})))\) (respectively, \(\mathbf {REAL}_{PTL,Adv(z)}(\lambda ,p,(T,(P_{1},\cdots ,P_{poly(\lambda )})))\)) denote the output of an ideal adversary Sim (respectively, adversary Adv), the cloud, the sender and the receiver in the ideal execution of \(F_{PPOPM}\) (respectively, the real execution of protocol \(PTL=(PTL_{Pre},PTL_{Qry},PTL_{Opm})\)) upon the inputs \((\lambda ,p,(T,(P_{1},\cdots ,P_{poly(\lambda )})))\) and the auxiliary input z for Sim (Adv). We then obtain the following security definition of our proposed privacy preserving outsourced pattern matching protocol PPOPM.

Definition 1

We say protocol \(PTL=(PTL_{Pre},PTL_{Qry},PTL_{Opm})\) securely implements the functionality \(F_{PPOPM}\) if and only if for any probabilistic polynomial time (PPT) real adversary Adv, there exists a PPT ideal adversary (simulator) Sim such that for any inputs \((\lambda ,T,(P_{1},\cdots ,P_{poly(\lambda )}))\) and auxiliary input z, we have

$$\begin{aligned} \mathbf {IDEAL}_{F_{PPOPM},Sim(z)}(\lambda ,p,(T,(P_{1},\cdots ,P_{poly(\lambda )})))\approx _{c}\mathbf {REAL}_{PTL,Adv(z)}(\lambda ,p,(T,(P_{1},\cdots ,P_{poly(\lambda )}))), \end{aligned}$$

where \(\approx _{c}\) denotes computational indistinguishability. The proposed PPOPM implements the ideal functionality \(F_{PPOPM}\) in the random oracle model. The functions \(H_0,H_1\), which behave as truly random functions, are public and can be queried by all parties a number of times polynomial in the security parameter \(\lambda \).

3 Efficient Privacy-Preserving Outsourced Discrete Fourier Transform OFFT

In this section, an efficient privacy preserving outsourced discrete Fourier transform protocol OFFT that does not exploit FHE is proposed to protect both the input privacy and the coefficient privacy. The discrete Fourier transform refers to the following mapping between two vectors

$$\begin{aligned} \varvec{a}=(a_0,a_1,\cdots ,a_{n-1})\rightarrow \varvec{\hat{a}}=(\hat{a}_0,\hat{a}_1,\cdots ,\hat{a}_{n-1}) \end{aligned}$$
(1)

such that \(\hat{a}_{l}=\sum _{i=0}^{n-1}a_iw_{n}^{il}\ (l=0,1,\cdots ,n-1)\). The proposed OFFT is composed of the following four algorithms: Setup, Encrypt, Evaluate and Decrypt, which are detailed as follows.

Setup(\(1^{\lambda }\)): On input \(1^{\lambda }\) where \(\lambda \) is the security parameter, it runs a trapdoor permutation generator denoted as a probabilistic polynomial time (PPT) algorithm \(\mathcal {G}\) and outputs a tuple of permutations \((f,f^{-1}):\{0,1\}^{2\lambda }\rightarrow \{0,1\}^{2\lambda }\) with a pair of corresponding keys \((pk_{f},sk_{f})\). It also outputs two hash functions \(H_{0},H_{1}:\{0,1\}^{*}\rightarrow \{0,1\}^{2\lambda }\). The public parameters are \(PPR=(pk_{f},H_{0},H_{1})\) and the secret key is \(sk_{f}\) assigned to the receiver.

Encrypt(\(pk_f,w_n,m_{i}(i=0,1,\cdots ,n-1)\)): Let \(w_{n}\) be an n-th complex root of 1, that is, \(w_{n}=e^{j\frac{2\pi }{n}}\) where \(j^{2}=-1\). The inputs are \(m_{i}=m_{i,1}+jm_{i,2}\) with \(m_i=(m_{i,1},m_{i,2})\in (\{0,1\}^{2\lambda })^{2}(i=0,1,\cdots ,n-1)\), and the coefficient of the discrete Fourier transform is \(w_{n}=w_{n,1}+jw_{n,2}\) with \(w_{n}=(w_{n,1},w_{n,2})\in (\{0,1\}^{2\lambda })^{2}\). For cryptographic convenience, both \(w_{n,1},w_{n,2}\) are enlarged by an appropriate scaling factor so that they lie in \(\{0,1\}^{2\lambda }\) (see [20] for how to choose a proper scaling factor and recover the corresponding result). The data owner randomly selects two big primes p, q with \(|p|=|q|=\lambda \), which are kept secret by the owner, and computes \(N=pq\), which is publicized for the receiver. Then, the owner randomly selects \(r^{'}\in _{R}\{0,1\}^{\lambda }\) and computes \(C_{1,1}=f(p\parallel r^{'})\), where \(\parallel \) denotes padding p with the random string \(r^{'}\) to the length of \(2\lambda \). For each \(m_{i}\) and \(w_{n}\), it computes \(m_{i,p}\equiv m_{i}\ mod\ p, m_{i,q}\equiv m_{i}\ mod\ q, w_{n,p}\equiv w_{n}\ mod\ p, w_{n,q}\equiv w_{n}\ mod\ q\). The modular operation on complex numbers is defined as the modular operation applied separately to the real part and the imaginary part. Then, it randomly selects \(U_{i}^{mul},U_{k}^{mul}\in _{R}\mathbb {Z}_{N}(k=0,1,\cdots ,il;l=0,1,\cdots ,n-1)\) and computes

$$\begin{aligned}&C_{2,i}=(q^{-1}qm_{i,p}^{p}+p^{-1}pm_{i,q}^{q})U_{i}^{mul}\ mod\ N,\nonumber \\&C_{3,k}=(q^{-1}qw_{n,p}^{p}+p^{-1}pw_{n,q}^{q})U_{k}^{mul}\ mod\ N, \end{aligned}$$
(2)

where \(q^{-1},p^{-1}\) respectively denote the inverses of \(q\ (mod p)\) and \(p\ (mod q)\). Finally, the owner publicizes \(U_{i,T}^{mul}=U_{i}^{mul}\prod _{k=0}^{il}U_{k}^{mul}\), computes \(C_{ram}^{mul}=H_{0}(p\parallel \bigcup _{i=0}^{n-1}C_{2,i}\parallel \bigcup _{k=0}^{il}C_{3,k})\) and sends \(C_{u,i}=(C_{1,1},C_{2,i},C_{3,k},C_{ram}^{mul})\) to the cloud.

Evaluate(\(C_{2,i},C_{3,k}(i,l=0,1,\cdots ,n-1;k=0,1,\cdots ,il),C_{ram}^{mul}\)): The cloud evaluates the discrete Fourier transform in the encrypted domain. Firstly, the cloud performs the addition and multiplication aggregation operations

$$\begin{aligned} C_{i,T}^{mul}&=C_{2,i}\prod _{k=0}^{il}C_{3,k}(U_{i,T}^{mul})^{-1}\ mod\ N\nonumber \\&=(q^{-1}qm_{i,p}^{p}+p^{-1}pm_{i,q}^{q})(q^{-1}qw_{n,p}^{p}+p^{-1}pw_{n,q}^{q})^{il}\ mod\ N\nonumber \\&=q^{-1}q(m_{i,p}w_{n,p}^{il})^{p}+p^{-1}p(m_{i,q}w_{n,q}^{il})^{q}\ mod\ N,\nonumber \\ C_{\hat{a}_{l}}&=\sum _{i=0}^{n-1}C_{i,T}^{mul}\ mod\ N\nonumber \\&=q^{-1}q\sum _{i=0}^{n-1}(m_{i,p}w_{n,p}^{il})^{p}+p^{-1}p\sum _{i=0}^{n-1}(m_{i,q}w_{n,q}^{il})^{q}\ mod\ N\nonumber \\&=q^{-1}q(\sum _{i=0}^{n-1}m_{i,p}w_{n,p}^{il})^{p}+p^{-1}p(\sum _{i=0}^{n-1}m_{i,q}w_{n,q}^{il})^{q}\ mod\ N\nonumber \\ C_{3}&=H_{1}(C_{\hat{a}_{l}}\parallel C_{ram}^{mul}) \end{aligned}$$
(3)

and sends \(C_{A}=(C_{1,1},C_{2,i},C_{3,k},C_{\hat{a}_{l}},C_{ram}^{mul},C_{3})\) to the receiver.

Decrypt(\(sk_f,C_{1,1},C_{\hat{a}_{l}},C_{ram}^{mul},C_{3}\)): The receiver firstly computes \(p\parallel r^{'}=f^{-1}(C_{1,1})\) by using her/his secret key \(sk_{f}\), derives p by removing the last \(\lambda \) bits for random padding, and checks whether both \(C_{ram}^{mul}=H_{0}(p\parallel \bigcup _{i=0}^{n-1}C_{2,i}\parallel \bigcup _{k=0}^{il}C_{3,k})\) and \(C_{3}=H_{1}(C_{\hat{a}_{l}}\parallel C_{ram}^{mul})\) hold. If not, this algorithm outputs \(\perp \); otherwise, the receiver continues to compute \(q=Np^{-1}\) and

$$\begin{aligned} C_{\hat{a}_{l}}\ mod\ p&=q^{-1}q(\sum _{i=0}^{n-1}m_{i,p}w_{n,p}^{il})^{p}+p^{-1}p(\sum _{i=0}^{n-1}m_{i,q}w_{n,q}^{il})^{q}\ mod\ p\nonumber \\&=q^{-1}q(\sum _{i=0}^{n-1}m_{i,p}w_{n,p}^{il})^{p}\ mod\ p=M_{T,p}\ mod\ p,\nonumber \\ C_{\hat{a}_{l}}\ mod\ q&=q^{-1}q(\sum _{i=0}^{n-1}m_{i,p}w_{n,p}^{il})^{p}+p^{-1}p(\sum _{i=0}^{n-1}m_{i,q}w_{n,q}^{il})^{q}\ mod\ q\nonumber \\&=p^{-1}p(\sum _{i=0}^{n-1}m_{i,q}w_{n,q}^{il})^{q}\ mod\ q=M_{T,q}\ mod\ q. \end{aligned}$$
(4)

where

$$\begin{aligned} M_{T,p}=\sum _{i=0}^{n-1}m_{i,p}w_{n,p}^{il}\ mod\ p, M_{T,q}=\sum _{i=0}^{n-1}m_{i,q}w_{n,q}^{il}\ mod\ q. \end{aligned}$$
(5)

Note that the fully homomorphic property is preserved under both moduli p and q, as presented above. Then, the receiver can recover the fully homomorphic result \(M_{T}\), namely the result of the discrete Fourier transform \(\hat{a}_{l}\), by applying the Chinese Remainder Theorem (CRT) to Eq. (5) as follows,

$$\begin{aligned} \hat{a}_{l}=M_{T}=M_{p}^{'}qM_{T,p}+M_{q}^{'}pM_{T,q}\ mod\ N \end{aligned}$$
(6)

where \(M_{p}^{'},M_{q}^{'}\) respectively satisfy \(M_{p}^{'}q\equiv 1\ mod\ p, M_{q}^{'}p\equiv 1\ mod\ q\), and can be efficiently computed since the greatest common divisor of p and q is \((p,q)=1\).
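The arithmetic behind Eqs. (2) to (6) can be checked on toy numbers. The following Python sketch is an illustration under strong simplifying assumptions: it uses tiny primes (far too small for any security), handles integer inputs only, and omits the trapdoor permutation f, the blinding factors \(U^{mul}\) and the hash checks. The function names `h_fhom` and `crt_recover` are hypothetical.

```python
def h_fhom(m, p, q):
    """Keyed mapping q^{-1} q m_p^p + p^{-1} p m_q^q mod N, without blinding."""
    N = p * q
    e_p = pow(q, -1, p) * q % N   # q^{-1} q: congruent to 1 mod p and 0 mod q
    e_q = pow(p, -1, q) * p % N   # p^{-1} p: congruent to 0 mod p and 1 mod q
    m_p, m_q = m % p, m % q
    return (e_p * pow(m_p, p, N) + e_q * pow(m_q, q, N)) % N


def crt_recover(c, p, q):
    """Receiver side: reduce modulo p and q, then recombine as in Eq. (6)."""
    N = p * q
    M_p, M_q = c % p, c % q
    return (pow(q, -1, p) * q * M_p + pow(p, -1, q) * p * M_q) % N


# Toy parameters chosen only for illustration.
P_TOY, Q_TOY = 1009, 1013
N_TOY = P_TOY * Q_TOY
```

With these helpers, products and sums of mapped values agree with the mapped products and sums modulo N, which is the property the Evaluate algorithm relies on. Modular inverses via `pow(x, -1, n)` require Python 3.8 or later.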

4 The Proposed PPOPM

4.1 Efficient Privacy Preserving Outsourced Polynomial Multiplication

An efficient privacy preserving outsourced polynomial multiplication scheme OPMUL based on our newly devised OFFT is proposed, serving as the cornerstone of our secure outsourced pattern matching. Both the inputs and the coefficients of the polynomial are well protected against the cloud, and only the authorized receiver can successfully decipher the multiplication result. Without loss of generality, it is assumed that there exist two polynomials \(P(x),Q(x)\in \mathbb {Z}_{N}[x]\) of degree \(n-1\), where n is assumed to be a power of 2,

$$\begin{aligned} P(x)=a_{n-1}x^{n-1}+\cdots +a_{1}x+a_{0}, Q(x)=b_{n-1}x^{n-1}+\cdots +b_{1}x+b_{0}. \end{aligned}$$
(7)

Note that it can be easily achieved since if not the case, the higher order terms with zero coefficients can be added as \(a_{k}=b_{k}=0(k=n,n+1,\cdots ,2^{\lceil log_{2}n\rceil }-1)\) where \(\lceil x\rceil \) denotes the minimum integer not smaller than x. The goal is to compute \(MUL(x)=P(x)Q(x)=c_{2n-2}x^{2n-2}+\cdots +c_{1}x+c_{0}\). The details are presented as follows.

Setup. The setup algorithm is the same as OFFT.Setup.

Encrypt. The Encrypt algorithm is the same as OFFT.Encrypt with the exception that \(l=0,1,\cdots ,2n-1\) and \(m_{i}\) is replaced by the coefficients of polynomials \(P(x),Q(x)\in \mathbb {Z}_{N}[x]\), namely \(\varvec{a}=(a_{0},\cdots ,a_{n-1})\) and \(\varvec{b}=(b_{0},\cdots ,b_{n-1})\). Note also that for polynomial evaluation and multiplication applications, only a special case of our newly designed privacy-preserving outsourced discrete Fourier transform is exploited, in which we focus only on the real part of \(m_{i}\).

Evaluate. The data owner securely outsources the evaluation of polynomials P(x) and Q(x) at 2n points \(w_{2n}^{0},w_{2n}^{1},\cdots ,w_{2n}^{2n-1}\) by exploiting the algorithm OFFT.Evaluate, where \(w_{2n}\) is a 2n-th complex root of 1, namely \(w_{2n}=e^{j\frac{\pi }{n}}\) and \(j^{2}=-1\). We use polynomial P(x) for example to briefly describe how to efficiently evaluate polynomials in the encrypted domain and the case of Q(x) is the same. It is noted that the polynomial evaluation on the corresponding encrypted points of \(w_{2n}^{l}(l=0,1,\cdots ,2n-1)\) can be performed by using the algorithm OFFT.Evaluate in our newly-designed privacy preserving outsourced discrete fourier transformation OFFT in Sec. 3 for \(\hat{a}_{l}=\sum _{i=0}^{n-1}a_{i}w_{2n}^{il}\) with the only exception that \(l=0,1,\cdots ,2n-1\).

On the other hand, it is observed that polynomial P(x) can be divided into two polynomials \(P_{0}(x)\) and \(P_{1}(x)\) of the same degree \(\frac{n}{2}-1\),

$$\begin{aligned} P_{0}(x)=a_{n-2}x^{\frac{n}{2}-1}+\cdots +a_{2}x+a_{0}, P_{1}(x)=a_{n-1}x^{\frac{n}{2}-1}+\cdots +a_{3}x+a_{1}. \end{aligned}$$
(8)

such that \(P(x)=P_{0}(x^2)+xP_{1}(x^2)\). Therefore, the problem of P(x) evaluation can be divided into two steps: firstly evaluate \(P_{0}(x)\) and \(P_{1}(x)\) at \((w_{2n}^{0})^2,(w_{2n}^{1})^2,\cdots ,(w_{2n}^{2n-1})^2\) and then integrate the final result accordingly. Note that \((w_{2n}^{0})^2,(w_{2n}^{1})^2,\cdots ,(w_{2n}^{2n-1})^2\) only comprise n complex roots of unity, namely \(w_{2n}^{0},w_{2n}^{2},\cdots ,w_{2n}^{2n-2}\), therefore the subproblems of privacy preserving outsourced computing \(P_{0}(x)\) and \(P_{1}(x)\) possess exactly the same form of the original problem of evaluating P(x) with the half size and we can solve it by recursively exploiting our newly-designed privacy preserving outsourced discrete fourier transform OFFT in Sect. 3 described in Algorithm 1.

figure a

Then, the cloud can obtain the encrypted results of evaluating polynomial Q(x) at these 2n points \(w_{2n}^{0},w_{2n}^{1},\cdots ,w_{2n}^{2n-1}\) in the same way. Let \(C_{\hat{a}_{l}}^{s}(l\in \{0,1,\cdots ,2n-1\};s\in \{P,Q\})\) be the encrypted results of polynomials P(x) and Q(x) at the points \(w_{2n}^{l}\). The cloud can then evaluate the encrypted product \(MUL(x)=P(x)Q(x)\) at these points through pairwise multiplication

$$\begin{aligned} \varvec{C_{\hat{a}}}^{PQ}=C_{\hat{a}_{l}}^{PQ}=C_{\hat{a}_{l}}^{P}C_{\hat{a}_{l}}^{Q} (l=0,1,\cdots ,2n-1). \end{aligned}$$
(9)

Finally, the cloud interpolates polynomial MUL(x) from the encrypted product values by exploiting the inverse Fourier transform to obtain the encrypted coefficients \(\varvec{c}^{Enc}=(OFFT.Encrypt(c_{0}),OFFT.Encrypt(c_{1}),\cdots ,OFFT.Encrypt(c_{2n-1}))\). Specifically,

$$\begin{aligned} \varvec{c}^{Enc}=(V_{2n}^{Enc})^{-1}\varvec{C}_{\hat{a}_{l}}^{PQ}, \end{aligned}$$
(10)

where the matrix \(V_{2n}^{Enc}\) is given by

$$\begin{aligned} V_{2n}^{Enc}=\left( \begin{array}{ccccc} OFFT.Enc(1)&{} OFFT.Enc(1)&{} OFFT.Enc(1)&{} \cdots &{} OFFT.Enc(1)\\ OFFT.Enc(1)&{} OFFT.Enc(w_{2n})&{} OFFT.Enc(w_{2n}^{2})&{} \cdots &{} OFFT.Enc(w_{2n}^{2n-1})\\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ OFFT.Enc(1)&{} OFFT.Enc(w_{2n}^{2n-1})&{} OFFT.Enc(w_{2n}^{2(2n-1)})&{} \cdots &{} OFFT.Enc(w_{2n}^{(2n-1)^{2}})\\ \end{array} \right) . \end{aligned}$$
(11)

Decrypt. The Decrypt algorithm is the same as OFFT.Decrypt with the exception that \(C_{\hat{a}_{l}}\) is replaced by each element \(OFFT.Encrypt(c_{l})(l=0,1,\cdots ,2n-1)\) of \(\varvec{c}^{Enc}\).
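The evaluate, multiply pointwise, and interpolate pipeline of OPMUL follows the standard FFT-based polynomial multiplication. The sketch below shows that pipeline in the clear, with no encryption at all, using the even/odd split \(P(x)=P_{0}(x^2)+xP_{1}(x^2)\) and the root convention \(w_{n}=e^{j\frac{2\pi }{n}}\) from Sect. 3. It is a plaintext illustration for integer coefficients, not the paper's Algorithm 1, and the names `fft` and `poly_mul_fft` are hypothetical.

```python
import cmath


def fft(coeffs, invert=False):
    """Recursive radix-2 transform; len(coeffs) must be a power of 2."""
    n = len(coeffs)
    if n == 1:
        return list(coeffs)
    even = fft(coeffs[0::2], invert)   # values of P_0 at the squared points
    odd = fft(coeffs[1::2], invert)    # values of P_1 at the squared points
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)   # w_n^k (or its inverse)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def poly_mul_fft(a, b):
    """Product of two integer polynomials given lowest-degree-first."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:           # pad with zero coefficients to a power of 2
        size *= 2
    fa = fft([complex(x) for x in a] + [0j] * (size - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (size - len(b)))
    prod = [x * y for x, y in zip(fa, fb)]            # pairwise products, cf. Eq. (9)
    c = fft(prod, invert=True)                        # interpolation, cf. Eq. (10)
    return [round((v / size).real) for v in c[:result_len]]
```

Floating-point rounding limits this sketch to moderately sized integer coefficients; the protocol itself works on scaled values modulo N instead.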

4.2 Efficient Privacy Preserving Outsourced Pattern Matching

In this subsection, an efficient privacy-preserving outsourced pattern matching protocol PPOPM is proposed based on the proposed OPMUL. The pattern matching problem can be reduced to computing the sum of squared differences between the text T of size n and the pattern P of size m for each possible alignment. Specifically, in the case of no wildcards, there exists an exact match at position \(u\in [0,n-m]\) if and only if the following judging equation

$$\begin{aligned} f_{nw}=\varSigma _{v=0}^{m-1}(p_{v}-t_{u+v})^{2} =\varSigma _{v=0}^{m-1}(p_{v}^{2}-2p_{v}t_{u+v}+t_{u+v}^{2})=0 \end{aligned}$$
(12)

holds; while in the case of wildcards, the judging equation is replaced by

$$\begin{aligned} f_{w}=\varSigma _{v=0}^{m-1}p_{v}t_{u+v}(p_{v}-t_{u+v})^{2} =\varSigma _{v=0}^{m-1}(p_{v}^3t_{u+v}-2p_{v}^2t_{u+v}^2+p_{v}t_{u+v}^3)=0, \end{aligned}$$
(13)

filling wildcard characters with “0”s instead. Therefore, according to the convolution theorem, the cloud is required to calculate one convolution and two aligned multiplications in the no-wildcard case, and three convolutions in the wildcard case, in the encrypted domain, with one of the vectors re-aligned in reversed order, by exploiting our proposed privacy-preserving outsourced polynomial multiplication OPMUL. Moreover, to achieve pattern privacy against the collusion between the sender and the cloud, the receiver is required to hide the authentic pattern with the randomnesses \(r,r''\) in the respective cases. As a result, the sender cannot correctly decipher the sum of aligned multiplications and convolutions to judge whether there exists a precise match at position \(u\in [0,n-m]\), and hence cannot derive the receiver’s queried pattern. The details of the proposed PPOPM are described as follows.

Setup. The Setup algorithm is invoked between the sender and the cloud, which is the same as OPMUL.Setup with the following exceptions. Given the text T and integer m, the sender firstly partitions text T into \(\frac{n}{m}\) overlapping substrings of size 2m. The first substring starts at the beginning of the text and each subsequent substring has an overlap of size m with the previous one. The processed text \(T^{'}\) can be represented as

$$\begin{aligned} T^{'}=(B_{1},\cdots ,B_{s}) =((t_{0},\cdots ,t_{2m-1}),(t_{m},\cdots ,t_{3m-1}),\cdots ,(t_{(s-1)m},\cdots ,t_{n-1})), \end{aligned}$$
(14)

where \(s=\lceil \frac{n}{m}\rceil -1\) and the following privacy-preserving outsourced pattern matching is performed on each block \(B_{d}(d=1,\cdots ,s)\). Then, the sender encrypts each element \(t_{u}(u\in [(d-1)m,(d+1)m-1])\) for the first \(s-1\) blocks and \(t_{u}(u\in [(s-1)m,n-1])\) for the last block to generate \(T^{',Enc}\) by exploiting the algorithm OPMUL.Encrypt, with the only exception that the ciphertext \(C_{1,1}\) is delayed to be computed and delivered as the token in the Query phase.
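The partitioning of Eq. (14) can be sketched as follows. This is a plaintext illustration only (the encryption of each \(t_{u}\) is omitted), it assumes \(n>m\), and the function name `partition_text` is hypothetical.

```python
def partition_text(text, m):
    """Split text into s = ceil(n/m) - 1 blocks of size 2m overlapping by m, as in Eq. (14)."""
    n = len(text)
    s = -(-n // m) - 1                 # integer form of ceil(n/m) - 1
    blocks = []
    for d in range(1, s + 1):
        start = (d - 1) * m
        # The first s-1 blocks have size 2m; the last block runs to the end of the text.
        end = (d + 1) * m if d < s else n
        blocks.append(text[start:end])
    return blocks
```

Because consecutive blocks overlap by m characters, every window of length m in the text lies entirely inside at least one block, so matching block by block does not miss an occurrence.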

Query. In the query phase, the sender is required to transmit a query token \(TOK=f(p\parallel r^{'})\) to all the authorized receivers holding pattern P. Note that the one-way trapdoor permutation f can be flexibly implemented by identity-based encryption or attribute-based encryption according to different security and privacy requirements. Then, the authorized receiver decrypts TOK using the secret key \(sk_{f}\) and obtains the private key p by removing the last \(\lambda \) bits of random padding; p is afterwards used to encrypt the queried pattern P.

Outsourced Pattern Matching. Firstly, the receiver randomly selects \(r\in _{R}\mathbb {Z}_{N}\), computes \(p_{v}^{'}=rp_{v}(v\in [0,m-1])\) and encrypts \(p_{v}^{'}\) by exploiting OPMUL.Encrypt.

In the case of no wildcards, for each position \(u\in [(d-1)m,(d+1)m-1]\) in block \(B_{d}\) (without loss of generality, we take the first \(s-1\) blocks to explain the outsourced pattern matching process; the case of the last block \(B_{s}\) is the same), the cloud computes

$$\begin{aligned} Cov_{1,u}=\sum _{v=0}^{m-1}r^{2}p_{v}^{2},Cov_{2,u}=\sum _{v=0}^{m-1}rp_{v}t_{u+v},Cov_{3,u}=\sum _{v=0}^{m-1}t_{u+v}^{2} \end{aligned}$$
(15)

in the encrypted domain by exploiting the algorithm OPMUL.Evaluate. Without loss of generality, we take the computation of \(Cov_{2,u}\) as an example. By re-arranging the vector \(p_{v}\) in reversed order, \(Cov_{2,u}\) can be rewritten in its standard convolution representation \(Cov_{2,u}^{std}=\sum _{v=0}^{m-1}t_{u+v}p_{m-1-v}\), which yields the same value. If we let \(P(x)=t_{u+m-1}x^{u+m-1}+\cdots +t_{u+1}x^{u+1}+t_{u}x^{u}\) and \(Q(x)=p_{0}x^{m-1}+\cdots +p_{m-2}x+p_{m-1}\), the calculation of \(Cov_{2,u}^{std}\) amounts to computing the coefficient of the term \(x^{u+m-1}\) in the product of P(x) and Q(x). For brevity, this is equivalent to computing the coefficient of the term \(x^{m-1}\) in the product of \(P^{'}(x)\) and Q(x), where \(P^{'}(x)=\frac{P(x)}{x^{u}}=t_{u+m-1}x^{m-1}+\cdots +t_{u+1}x+t_{u}\). The privacy preserving outsourced evaluation of \(Cov_{1,u}\) and \(Cov_{3,u}\) in the encrypted domain can be executed in the same way. Finally, the cloud sends the corresponding encrypted results \(Cov_{1,u}^{Enc},Cov_{2,u}^{Enc},Cov_{3,u}^{Enc}\) to the receiver. The case of wildcards is the same with the exception of outsourcing the convolution computations

(16)

in the encrypted domain instead, where \(r''\) is the blinding factor selected by the receiver.

After receiving the encrypted results from the cloud, the receiver performs the decryption by exploiting the algorithm OPMUL.Decrypt. In the case of no-wildcards, the receiver continues to obtain the authentic judging polynomial evaluation by de-blinding

$$\begin{aligned} f_{nw}=(r^{-1})^{2}Cov_{1,u}-2r^{-1}Cov_{2,u}+Cov_{3,u}. \end{aligned}$$
(17)

Finally, the receiver can judge whether an exact match occurs at position \(u\in [(d-1)m,(d+1)m-1]\) in block \(B_{d}\) by checking whether \(f_{nw}=0\) (or whether \(f_{nw}<k\), where k is a predefined threshold, for approximate pattern matching). The case of wildcards is similar, with the exception of checking whether \(f_{w}=0\) to decide an exact match.
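The blinding and de-blinding of Eqs. (15) and (17) can be traced in the clear for the no-wildcard case. The sketch below is an illustration under stated assumptions: the OPMUL encryption layer is omitted, characters are mapped to integers with `ord`, the modulus N is a toy value, and all function names are hypothetical. `Cov_2` is read off as the coefficient of \(x^{m-1}\) in \(P^{'}(x)Q(x)\), as described above.

```python
import math
import random


def poly_mul(a, b):
    """Schoolbook product of coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def blinded_sums(text, pattern, r, N, u):
    """Cov_1, Cov_2, Cov_3 of Eq. (15) at alignment u, computed without encryption."""
    m = len(pattern)
    window = text[u:u + m]                      # coefficients of P'(x)
    blinded = [r * p % N for p in pattern]      # p'_v = r * p_v
    cov1 = sum(b * b for b in blinded) % N
    # Q(x) holds the blinded pattern in reversed order; take the x^(m-1) coefficient.
    cov2 = poly_mul(window, blinded[::-1])[m - 1] % N
    cov3 = sum(t * t for t in window) % N
    return cov1, cov2, cov3


def deblind(cov1, cov2, cov3, r, N):
    """Receiver side, Eq. (17): f_nw = r^-2 Cov_1 - 2 r^-1 Cov_2 + Cov_3 mod N."""
    r_inv = pow(r, -1, N)
    return (r_inv * r_inv * cov1 - 2 * r_inv * cov2 + cov3) % N


def match_positions(text, pattern, N, seed=0):
    """Positions u where the de-blinded judging value is zero."""
    rng = random.Random(seed)
    r = rng.randrange(2, N)
    while math.gcd(r, N) != 1:                  # r must be invertible modulo N
        r = rng.randrange(2, N)
    t = [ord(c) for c in text]
    p = [ord(c) for c in pattern]
    return [u for u in range(len(t) - len(p) + 1)
            if deblind(*blinded_sums(t, p, r, N, u), r, N) == 0]
```

The zero test is reliable here only because the true sum of squared differences stays below N for short ASCII patterns; with larger alphabets or patterns, N must be chosen large enough to avoid wrap-around.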

5 Security Proof

In this section, we give the formal security proof of our proposed privacy-preserving outsourced pattern matching protocol PPOPM in the universally composable (UC) model.

Intuitively, the text privacy can be well protected against the collusion between the cloud and the receiver, since in the algorithm Setup of our proposed PPOPM, the sender encrypts each character \(t_{u}\) of text T using not only the knowledge of N-factoring, namely the secret keys p and q, but also a multiplicative blinding factor \(U_{u}^{mul}\). Consequently, the cloud and even the authorized receivers possessing the knowledge of N-factoring still cannot obtain the original text T, since the randomness \(U_{u}^{mul}\) is unavailable to them.

On the other hand, the pattern privacy can also be well protected against the collusion between the cloud and the sender, since neither the sender nor the cloud can successfully obtain the exact matching positions. The reason is that even if the sender possesses the knowledge of N-factoring, it is still not able to calculate the correct judging polynomial evaluations involving the pattern characters \(p_{v}\), since each \(p_{v}\) is not only encrypted by the secret keys p and q derived from the token TOK, but also blinded by a randomness r selected by the receiver when being outsourced to the cloud in the algorithm Query of our proposed PPOPM. In fact, the sender is only able to decipher \(Cov_{1,u},Cov_{2,u},Cov_{3,u}\), but cannot recover the authentic polynomial evaluations \(f_{nw}\) (or \(f_{w}\)) in Eq. (12), since this would require successfully removing the randomness r (or \(r''\)).

Table 1. A summary of the complexity and security comparison (n and m respectively denote the sizes of text T and pattern P)

Theorem 1

Let \(\lambda \in \mathbb {Z}_{N}\) be the security parameter. For integers n, m, we set \(s=\lceil \frac{n}{m}\rceil -1\) and assume that \(H_{0},H_{1}:\{0,1\}^{*}\rightarrow \{0,1\}^{2\lambda }\) are random oracles, that \(N=pq\) where \(|p|=|q|=\lambda \), and that \(f:\{0,1\}^{2\lambda }\rightarrow \{0,1\}^{2\lambda }\) is a one-way trapdoor permutation. Then, the proposed protocol PTL (PPOPM) securely implements the functionality \(F_{PPOPM}\) in the presence of semi-honest adversaries, namely it satisfies the security Definition 1.

Proof

Due to space limitations, we only give a proof sketch of our proposed PPOPM in the random oracle model here; please refer to the full version of our paper for the constructions secure in the standard model for the malicious setting, together with the full proof exploiting the techniques of [21, 24, 25]. We prove Theorem 1, namely that our proposed PPOPM achieves the security Definition 1, for each corruption setting of the sender, the server and the receiver respectively. We take the case of a corrupted server as an example to explain our proof idea: Theorem 1 follows by combining Claims 1 and 2. We first define a hybrid distribution \(\mathbf {HYB}_{PTL,Adv(z)}(\lambda ,(T,(P_{1},\cdots ,P_{poly(\lambda )})))\), which is identical to the real experiment \(\mathbf {REAL}_{PTL,Adv(z)}(\lambda ,(T,(P_{1},\cdots ,P_{poly(\lambda )})))\) except that the encrypted text \(T^{Enc}\) is replaced by a random function \(f_{\lambda }\), and derive the following claim by reduction to the security of our proposed primitive OPMUL.

Fig. 2. Computational cost comparison of secure outsourced discrete Fourier transform

Fig. 3. Communication cost comparison of secure outsourced discrete Fourier transform

Claim 1

If OPMUL exploited in PPOPM is secure under the hardness of both N-factoring and inverting the one-way trapdoor permutation without the secret key \(sk_f\), then there exists a negligible function \(negl(\cdot )\) such that for sufficiently large \(\lambda \in \mathbb {Z}_{N}\), any tuple of inputs \((T,(P_{1},\cdots ,P_{poly(\lambda )}))\) and auxiliary input z, it holds that

By showing that the event Bad, which we define to occur when \(Sim_R\) aborts in the ideal environment, happens with only negligible probability, we can derive the following claim.

Claim 2

For any input text T, patterns \(P_{1},\cdots ,P_{poly(\lambda )}\), and the auxiliary input z, it holds that

where \(\equiv _{s}\) denotes statistical indistinguishability.

For the colluding case, it is required to show that the cloud and the receiver (or the sender) cannot obtain any additional information about the text (or the pattern) other than what is leaked from the queries (or the matching result response).

6 Performance Evaluation

In this section, we evaluate the performance of our proposed privacy-preserving outsourced discrete Fourier transform OFFT and outsourced pattern matching protocol PPOPM in the following aspects: the computational and communication cost comparison between the proposed OFFT/PPOPM and the existing work [5, 15, 20] exploiting the techniques of public key homomorphic encryption. We also compare the pattern matching probability in the plaintext domain and in the encrypted domain.

Fig. 4. Computational cost comparison of secure outsourced pattern matching

Fig. 5. Communication cost comparison of secure outsourced pattern matching

6.1 Complexity Analysis

Before delving into the experimental results, we first provide an overview of the complexity of our proposed PPOPM on all three participating entities. Table 1 summarizes the complexities and the security levels for Faust et al.’s protocol [5], Vergnaud’s protocol [15] and our proposed PPOPM. The time complexities of Faust et al.’s protocol [5] and our PPOPM in the outsourced model are both O(nlogm), owing to the divide-and-conquer technique of dividing text T into \(\frac{n}{m}\) overlapping substrings (blocks) of length 2m. However, in Vergnaud’s protocol [15] in the two-party secure computation model, both the sender and the receiver are required not only to perform pattern matching, but also to perform ElGamal encryption [34] on each text/pattern element, so their time complexities are respectively \(O(n+nlogm)\) and \(O(m+nlogm)\).
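The block partition behind the O(nlogm) bound can be sketched as follows. This is an illustrative plaintext sketch, assuming \(n>m\), \(s=\lceil \frac{n}{m}\rceil -1\) blocks as in Theorem 1, and block \(B_{d}\) covering positions \([(d-1)m,(d+1)m-1]\); the padding symbol and the function names are hypothetical. Each block of length 2m would then be handled by one FFT-based convolution costing O(mlogm), which over roughly n/m blocks gives O(nlogm).

```python
import math


def split_into_blocks(text, m, pad=None):
    """Split text into s = ceil(n/m) - 1 overlapping blocks of length 2m.
    Block B_d (1-based) starts at position (d-1)*m; the last block is padded."""
    n = len(text)
    s = math.ceil(n / m) - 1
    blocks = []
    for d in range(1, s + 1):
        start = (d - 1) * m
        block = list(text[start:start + 2 * m])
        block += [pad] * (2 * m - len(block))
        blocks.append((start, block))
    return blocks


def block_for_position(u, m, s):
    """1-based index of a block that fully contains the window [u, u+m-1]."""
    return min(u // m + 1, s)


if __name__ == "__main__":
    text = list("abcdefghij")   # n = 10
    m = 3
    blocks = split_into_blocks(text, m)
    for start, block in blocks:
        print(start, block)
    for u in range(len(text) - m + 1):
        d = block_for_position(u, m, len(blocks))
        print("window at", u, "-> block", d)
```

Because consecutive blocks overlap by m characters, every length-m window of the text falls entirely inside at least one block, so no match position is lost by processing blocks independently.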

Fig. 6. Matching probability comparison

Additionally, in Faust et al.’s protocol [5] it is required to process each possible pattern for each block in the setup phase, and an oblivious PRF is needed to achieve pattern privacy in the query phase (which is also the reason why the communication complexity of the sender in [5] is \(O(n+m)\)); hence the time complexities of the sender and the receiver are respectively \(O(n+m)\) and O(m). In our PPOPM, owing to the newly devised primitive OPMUL based on recursive OFFT (i.e. Algorithm 1) and partition techniques, the time complexity at the cloud end is O(nlogm). The one-way trapdoor permutation needs to be performed only once (i.e. O(1)) at both the sender’s and the receiver’s ends for uploading the encrypted text T and pattern P and for deciphering the matching result. Therefore, the proposed PPOPM outperforms Faust et al.’s protocol [5] and Vergnaud’s protocol [15] in efficiency while achieving both text privacy and pattern privacy.

6.2 Experimental Evaluation

We conduct the experiments by exploiting the PBC and MIRACL libraries [22, 23] running on a Linux platform with a 2.93 GHz processor to study the operation costs. We mainly focus on the most computationally intensive component, namely the one-way trapdoor permutation, which accounts for most of the modular exponentiations and most of the ciphertext expansion and therefore contributes considerably to the computational and communication cost. It is implemented by Paillier’s additive homomorphic encryption [33] on \(\mathbb {Z}_{N^{2}}\) in [20] and by RSA on \(\mathbb {Z}_{N}\), with \(|N|=1024\) bits, in our proposed OFFT. Figures 2 and 3 demonstrate the computational and communication cost comparison between our proposed OFFT and the existing work [20] adopting Paillier’s cryptosystem [33]. Figure 2 shows that, as the degree i in the FFT increases while the subscript is fixed at \(l=2\) (recall the definition of the discrete FFT as \(\hat{a}_{l}=\sum _{i=0}^{n-1}a_iw_{n}^{il}(l=0,1,\cdots ,n-1)\)), the computational costs of both our proposed OFFT and [20] increase accordingly, since more \(w_{n}\)s are required to be encrypted using Paillier’s cryptosystem in [20] or blinded by the multiplicative factor \(U_{k}^{mul}(k=0,1,\cdots ,il)\). However, the computational cost of OFFT is negligible compared to that of [20], whether at the owner’s, the receiver’s or the cloud’s end. The reason is that, to securely outsource the computation of each item \(ITM_{i}=a_{i}w_{n}^{il}\) in the FFT of \(\hat{a}_{l}\), [20] is required to execute Paillier’s additive homomorphic encryption [33] il times, while in our proposed OFFT the RSA operation is required only once. Additionally, owing to the fact that Paillier’s cryptosystem works in \(\mathbb {Z}_{N^{2}}\), both the computational cost of a single encryption and the ciphertext expansion are significantly heavier than those of RSA adopted in our OFFT. Therefore, the overall complexity of our OFFT is dramatically lower than that of [20]. Based on the same observation, Fig. 3 shows that the communication cost of our OFFT is also significantly reduced in contrast to [20].
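The transform whose items \(ITM_{i}=a_{i}w_{n}^{il}\) are being outsourced can be made concrete with a plaintext sketch. It assumes \(w_{n}=e^{2\pi \mathrm{i}/n}\) (sign conventions vary between texts) and an input length that is a power of two; the function names are hypothetical, and neither the blinding factors \(U_{k}^{mul}\) nor any encryption of OFFT or [20] is modelled.

```python
import cmath


def dft_coefficient(a, l):
    """Coefficient a_hat_l = sum_i a[i] * w_n^(i*l), straight from the definition."""
    n = len(a)
    w = cmath.exp(2j * cmath.pi / n)   # assumed choice of primitive n-th root of unity
    return sum(a[i] * w ** (i * l) for i in range(n))


def fft(a):
    """Recursive radix-2 FFT with the same sign convention (len(a) a power of two)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    fast = fft(a)
    for l in range(len(a)):
        print(l, dft_coefficient(a, l), fast[l])
```

The direct definition touches n items per coefficient, whereas the recursive version shares work across coefficients; the cost comparison in the text concerns how many cryptographic operations each such item requires, which this sketch leaves out.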

For convenience of comparison, we first extend [15] to an outsourced setting and omit the comparison to [5] owing to the different security levels of [5] and our PPOPM. As shown in Figs. 4 and 5 respectively, as the pattern size increases while the text length is fixed at \(n=500\), both the computational and the communication cost increase accordingly, and our OFFT-based construction is more efficient than [15]. The reason is that the outsourced pattern matching based on OFFT and that based on [20] are both achieved by securely computing three convolutions with a secure discrete Fourier transform as the cornerstone, so the advantages of our proposed OFFT are inherited. Figure 6 demonstrates that the matching probability of our proposed PPOPM in the ciphertext domain is comparable to that in the plaintext domain.

7 Conclusion

In this paper, without exploiting public key (fully) homomorphic encryption, we first present an efficient privacy-preserving outsourced discrete Fourier transform OFFT that exploits any one-way trapdoor permutation only once, where both the input privacy and the coefficient privacy are well protected. Based on OFFT, we propose a privacy-preserving outsourced polynomial multiplication OPMUL as a new building block. Based on OPMUL, we then propose a more efficient privacy-preserving outsourced pattern matching protocol PPOPM, where both text privacy and pattern privacy are well protected against the collusion between the cloud and a malicious receiver or malicious senders. Finally, the formal security proof and extensive evaluations demonstrate the practicability and efficiency of our proposed PPOPM.