Abstract
This paper concerns the fundamental problem of processing conjunctive keyword queries over an outsourced data table on untrusted public clouds in a privacy-preserving manner. The data table can be implemented with tree-based searchable symmetric encryption schemes, such as the Keyword Red–Black tree and the Indistinguishable Bloom-filter Tree in ICDE’17. However, these trees have limitations that prevent them from supporting sub-linear time updates. One of the reasons is that their tree branches are directly exposed to the cloud. To achieve efficient conjunctive queries while supporting dynamic updates, we introduce a novel tree data structure called virtual binary tree (VBTree). Our key design is to organize indexing elements into the VBTree in a top-down fashion, without storing any tree branches or tree nodes. The tree exists only as a logical view, and all of the elements are actually stored in a hash table. To achieve forward privacy, which is discussed by Bost in CCS’16, we also propose a storage mechanism called version control repository (VCR) to record and control versions of keywords and queries. VCR requires less client-side storage than other forward-private schemes. With our proposed approach, data elements can be quickly searched while the index can be privately updated. The security of the VBTree is formally proved under the IND-CKA2 model. We test our scheme on a real e-mail dataset and a user location dataset. The results demonstrate its high efficiency and scalability in both searching and updating.
References
Amazon: “Amazon Web services” (2017). http://aws.amazon.com
Microsoft: “Microsoft Azure” (2017). http://www.microsoft.com/azure
Google: “Google App Engine” (2017). http://code.google.com/appengine
Zhang, Y., Katz, J., Papamanthou, C.: All your queries are belong to us: the power of file-injection attacks on searchable encryption. In: 25th USENIX Security Symposium (USENIX), pp. 707–720. USENIX Association (2016)
Curtmola, R., Garay, J., Kamara, S., et al.: Searchable symmetric encryption: improved definitions and efficient constructions. In: Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS), vol. 95, No. 5, pp. 79–88. ACM (2006)
Kamara, S., Papamanthou, C.: Parallel and dynamic searchable symmetric encryption. In: Sadeghi, A.R. (ed.) Financial Cryptography and Data Security FC 2013. Lecture Notes in Computer Science, vol. 7859, pp. 258–274. Springer, Berlin, Heidelberg (2013)
Bost, R.: \(\Sigma o\varphi o\varsigma \): forward secure searchable encryption. In: ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 1143–1154. ACM (2016)
Liu, Z., Lv, S., et al.: FFSSE: flexible forward secure searchable encryption with efficient performance. IACR Cryptology ePrint Archive (2017)
Li, R., Liu, A.X.: Adaptively secure conjunctive query processing over encrypted data for cloud computing. In: International Conference on Data Engineering (ICDE), pp. 697–708. IEEE (2017)
Goh, E.J.: Secure indexes. IACR Cryptology ePrint Archive (2003)
Li, R., Liu, A.X., Wang, A.L., et al.: Fast range query processing with strong privacy protection for cloud computing. In: International Conference on Very Large Data Bases (VLDB), pp. 1953–1964 (2014)
Naveed, M., Prabhakaran, M., Gunter, C.A.: Dynamic searchable encryption via blind storage. In: Security and Privacy (S&P), pp. 639–654 (2014)
Li, R., Liu, A.X., Wang, A.L., et al.: Fast and scalable range query processing with strong privacy protection for cloud computing. In: Transactions on Networking (TON), pp. 2305–2318 (2016)
Xia, Z., Wang, X., Sun, X., Wang, Q.: A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data. Trans. Parallel Distrib. Syst. (TPDS) 27(2), 340–352 (2016)
Song, D.X., Wagner, D., Perrig, A.: Practical techniques for searches on encrypted data. In: IEEE Symposium on Security and Privacy (S&P), pp. 44–55 (2000)
Chang, Y.C., Mitzenmacher, M.: Privacy preserving keyword searches on remote encrypted data. In: Applied Cryptography and Network Security (ACNS), pp. 442–455. Springer, Berlin (2005)
Bezawada, B., Liu, A.X., Jayaraman, B., et al.: Privacy preserving string matching for cloud computing. In: IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 609–618 (2015)
Chase, M., Kamara, S.: Structured encryption and controlled disclosure. In: Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT), pp. 577–594. Springer, New York (2010)
Kurosawa, K., Ohtaki, Y.: UC-secure searchable symmetric encryption. In: Financial Cryptography and Data Security (FC), pp. 285–298. Springer, New York (2012)
Liesdonk, P.V., Sedghi, S., Doumen, J., Hartel, P., Jonker, W.: Computationally efficient searchable symmetric encryption. In: VLDB Conference on Secure Data Management (SDM), pp. 87–100. Springer, New York (2010)
Cash, D., Jarecki, S., Jutla, C., et al.: Highly-scalable searchable symmetric encryption with support for Boolean queries. In: International Cryptology Conference (CRYPTO), pp. 353–373. Springer, New York (2013)
Pappas, V., Krell, F., Vo, B., et al.: Blind seer: a scalable private DBMS. In: Security and Privacy (S&P), pp. 359–374 (2014)
Ishai, Y., Kushilevitz, E., Lu, S., et al.: Private large-scale databases with distributed searchable symmetric encryption. In: Cryptographers’ Track at the RSA Conference, pp. 90–107. Springer, New York (2016)
Kamara, S., Moataz, T.: SQL on structurally-encrypted databases. IACR Cryptology ePrint Archive (2016)
Kamara, S., Papamanthou, C., Roeder, T.: Dynamic searchable symmetric encryption. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS), pp. 965–976. ACM (2012)
Cash, D., Jaeger, J., Jarecki, S., et al.: Dynamic searchable encryption in very-large databases: data structures and implementation. In: Network and Distributed System Security (NDSS), pp. 23–26. ISOC (2014)
Kamara, S., Moataz, T.: Boolean searchable symmetric encryption with worst-case sub-linear complexity. In: European Cryptology Conference (EUROCRYPT). Springer, New York (2017)
Wang, B., Yu, S., Lou, W., et al.: Privacy-preserving multi-keyword fuzzy search over encrypted data in the cloud. In: INFOCOM, pp. 2112–2120 (2014)
Fu, Z., Wu, X., Guan, C., et al.: Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. In: Transactions on Information Forensics and Security (TIFS), pp. 2706–2716
Stefanov, E., Papamanthou, C., Shi, E.: Practical dynamic searchable encryption with small leakage. In: Network and Distributed System Security (NDSS), pp. 23–26. ISOC (2014)
Garg, S., Mohassel, P., Papamanthou, C.: TWORAM: round-optimal oblivious RAM with applications to searchable encryption. IACR Cryptology ePrint Archive (2015)
Bost, R., Fouque, P.A., Pointcheval, D.: Verifiable dynamic symmetric searchable encryption: optimality and forward security. IACR Cryptology ePrint Archive (2016)
Chang, Z., Xie, D., Li, F.: Oblivious RAM: a dissection and experimental evaluation. In: International Conference on Very Large Data Bases (VLDB), pp. 1113–1124 (2016)
Islam, M.S., Kuzu, M., Kantarcioglu, M.: Access pattern disclosure on searchable encryption: ramification, attack and mitigation. In: Network and Distributed System Security (NDSS). ISOC (2012)
Popa, R.A., Redfield, C., Zeldovich, N., et al.: CryptDB: protecting confidentiality with encrypted query processing. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP), pp. 85–100. ACM (2011)
Mavroforakis, C., Chenette, N., O’Neill, A., et al.: Modular order-preserving encryption, revisited. In: ACM International Conference on Management of Data (SIGMOD), pp. 763–777. ACM (2015)
Naveed, M., Kamara, S., Wright, C.V.: Inference attacks on property-preserving encrypted databases. In: ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 644–655. ACM (2015)
Yao, A.C.: Protocols for secure computations. In: Foundations of Computer Science (SFCS), pp. 160–164 (1982)
Ben-David, A., Nisan, N., Pinkas, B.: FairplayMP: a system for secure multi-party computation. In: Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS), pp. 257–266. ACM (2008)
Dijk, M.V., Gentry, C., Halevi, S., et al.: Fully homomorphic encryption over the integers. In: Advances in Cryptology – EUROCRYPT, pp. 24–43. Springer, Berlin, Heidelberg (2010)
Enron email dataset (2015). http://www.cs.cmu.edu/~enron/
Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th International Conference on Knowledge Discovery and Data mining (SIGKDD), pp. 1082–1090. ACM (2011)
Appendix: Proof of Theorem 5.1
Proof
We first study DDFPP. Given a candidate partition of an instance, the solution can be verified in polynomial time, so DDFPP is in NP. Next, we reduce a known NP-complete problem, the Subset Sum Problem, to DDFPP. The Subset Sum Problem is as follows: “Given a multiset of positive numbers \(A=\{a_1,a_2,\ldots ,a_{n}\}\) and a positive number t, is there a set \(B\subseteq A\) such that \(\sum _{a_i\in B}{a_i}=t\)?”
Consider an instance of the Subset Sum Problem with a multiset of positive numbers \(A=\{a_1,a_2,\ldots ,a_n\}\) and a positive number t. We convert it to an instance of DDFPP using the following construction. If n is odd, we first insert the number zero into A. Let \(b=\sum _{a_i\in A}{a_i}\), and create two new numbers \(a_{n+1}=2b-t\) and \(a_{n+2}=b+t\). Then a new multiset C is created as \(C=A\cup \{a_{n+1},a_{n+2}\}\). For each number \(a_i\) in C, we generate a file group \(G_{a_i}=\{d_1, d_2, \cdots , d_{a_i}\}\). The generated files are \(G_{a_1}\bigcup G_{a_2}\bigcup \cdots \bigcup G_{a_{n+2}}\), which contain \(\sum _{a_i\in C}{a_i}=b+(2b-t)+(b+t)=4b\) files in total. For each file group \(G_{a_i}\), we first create k unique keywords and insert these k keywords into every file in the group. That is to say, the \(a_i\) files of group \(G_{a_i}\) share the same k keywords. Next, we insert other randomly generated keywords into every file. This process is repeated until all of the file groups are initialized.
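The construction above can be sketched in Python. This is only an illustrative sketch, not the authors' code: the function name `build_ddfpp_instance`, the keyword strings `kw0, kw1, ...`, and the default `k=2` are hypothetical, and each file receives one fresh filler keyword in place of the "randomly generated keywords" so that the output is deterministic. It also assumes \(0<t\le b\), so that \(a_{n+1}=2b-t\) is positive.

```python
from itertools import count

def build_ddfpp_instance(A, t, k=2):
    """Build the file groups of the reduction from a Subset Sum instance (A, t)."""
    A = list(A)
    if len(A) % 2 == 1:
        A.append(0)                      # padding step from the text (empty group)
    b = sum(A)
    if not 0 < t <= b:
        # Assumption of this sketch: targets outside (0, b] are not handled.
        raise ValueError("expected 0 < t <= sum(A)")
    C = A + [2 * b - t, b + t]           # C = A with a_{n+1} and a_{n+2} added
    fresh = count()                      # hypothetical generator of unused keyword ids
    groups = []
    for a in C:
        # k keywords shared by every file of this group.
        shared = {f"kw{next(fresh)}" for _ in range(k)}
        # Each file is a keyword set: the shared keywords plus one filler keyword.
        files = [shared | {f"kw{next(fresh)}"} for _ in range(a)]
        groups.append(files)
    return groups, b

groups, b = build_ddfpp_instance([3, 1, 4, 2], t=4)
print([len(g) for g in groups], sum(len(g) for g in groups) == 4 * b)
```

Representing each file as a set of keywords makes the two properties used later in the proof easy to check: files of one group intersect in at least k keywords, and the instance holds 4b files in total.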
Suppose the file groups constructed above have a data file partition solution, that is, they can be partitioned into \({\mathcal {F}}_1\) and \({\mathcal {F}}_2\) such that \(\vert {\mathcal {F}}_1\vert =\vert {\mathcal {F}}_2\vert \) and \(\vert W({\mathcal {F}}_1)\bigcap W({\mathcal {F}}_2)\vert <k\). We now prove that A has a subset sum solution. Note that the files of each file group share k common keywords. This implies that for any file group \(G_{a_i}\) constructed from the number \(a_i\), the files of \(G_{a_i}\) are either all in \({\mathcal {F}}_1\) or all in \({\mathcal {F}}_2\). Otherwise, there exist two files \(d_i\) and \(d_j\) of the same group with \(d_i\in {\mathcal {F}}_1\) and \(d_j\in {\mathcal {F}}_2\); then we have \(\vert d_i\bigcap d_j\vert \geqslant k\) and hence \(\vert W({\mathcal {F}}_1)\bigcap W({\mathcal {F}}_2)\vert \geqslant k\), which contradicts our assumption. Since \(\vert {\mathcal {F}}_1\vert =\vert {\mathcal {F}}_2\vert \), we have \(\vert G_{x_1}\vert +\vert G_{x_2}\vert +\cdots +\vert G_{x_r}\vert =\vert G_{y_1}\vert +\vert G_{y_2}\vert +\cdots +\vert G_{y_r}\vert \), where \(G_{x_i}\subseteq {\mathcal {F}}_1\), \(G_{y_i}\subseteq {\mathcal {F}}_2\), and \(2r=n+2\). As \(\vert G_{x_i}\vert =x_i\) and \(\vert G_{y_i}\vert =y_i\), the following equation holds: \(x_1+x_2+\cdots +x_r=y_1+y_2+\cdots +y_r\). Because there are 4b files in total, both sides equal 2b. Let \(B=\{x_1,x_2,\ldots ,x_r\}\) and \(B'=\{y_1,y_2,\ldots ,y_r\}\). Note that \(a_{n+1}\) and \(a_{n+2}\) cannot coexist in the set B or in the set \(B'\); otherwise, that set would sum to at least \((2b-t)+(b+t)=3b>2b\), and thus \(\sum {x_i}\ne \sum {y_i}\). If \(a_{n+1}\) is in B, then the subset sum is \(\sum _{a_i\in {B-\{a_{n+1}\}}}a_i=2b-(2b-t)=t\). If \(a_{n+1}\) is in \(B'\), then \(\sum _{a_i\in {B'-\{a_{n+1}\}}}a_i=2b-(2b-t)=t\). In either case, \(B-\{a_{n+1}\}\) or \(B'-\{a_{n+1}\}\) is a subset sum solution. Therefore, the Subset Sum Problem \(\le _p\) DDFPP, which, together with DDFPP being in NP, means that DDFPP is NP-complete. Since DDFPP is a special case of DFPP, DFPP is NP-hard. \(\square \)
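The last step of the argument, reading a subset sum solution off a valid partition, can be illustrated with a short Python sketch. The function name `extract_subset` and its interface (each side given as the list of its group sizes) are hypothetical and chosen only for illustration. The sketch assumes that no element of A has the same value as \(a_{n+1}=2b-t\).

```python
def extract_subset(side1, side2, b, t):
    """Return the numbers of A that share a side with a_{n+1}; they sum to t.

    side1 and side2 list the group sizes x_i and y_i placed in F_1 and F_2.
    """
    a_n1 = 2 * b - t
    # The 4b files are split evenly, so each side must hold 2b files.
    if sum(side1) != 2 * b or sum(side2) != 2 * b:
        raise ValueError("not an even partition of the 4b files")
    # Take the side containing a_{n+1} (assumed not to collide with a value of A).
    side = list(side1) if a_n1 in side1 else list(side2)
    side.remove(a_n1)
    # What remains sums to 2b - (2b - t) = t.
    return side

# A = {3, 1, 4, 2}, t = 4, so b = 10, a_{n+1} = 16 and a_{n+2} = 14.
print(extract_subset([16, 4], [14, 3, 1, 2], b=10, t=4))
```

In this example the side holding \(a_{n+1}=16\) also holds the group of size 4, which is exactly a subset of A summing to t.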
Cite this article
Wu, Z., Li, K. VBTree: forward secure conjunctive queries over encrypted data for cloud computing. The VLDB Journal 28, 25–46 (2019). https://doi.org/10.1007/s00778-018-0517-6
DOI: https://doi.org/10.1007/s00778-018-0517-6