Abstract
A fast-growing number of users and businesses are motivated to outsource their private data to public cloud servers. For security reasons, private data should be encrypted before being outsourced to remote servers, but encryption makes traditional plaintext keyword search difficult. There is therefore an urgent need for efficient and secure searchable encryption. In this paper, an affinity propagation (AP) K-means clustering method (CAK-means, a combination of AP and K-means clustering) is proposed to realize fast searchable encryption in Big Data environments. CAK-means uses affinity propagation to initialize K-means clustering, which improves the quality of the initial K-means cluster centers and makes the clustering process faster and more stable. Because the AP algorithm identifies cluster centers with much lower error than other methods, it significantly improves search accuracy. In addition, the related files in one cluster are stored at contiguous locations on disk, which substantially improves file locality and speeds up disk read and write I/O. The coordinate matching measure is used to support accurate ranking of search results. Experimental results show that the proposed CAK-means-based multi-keyword ranked searchable encryption scheme (MRSE-CAK) achieves higher search efficiency and accuracy while ensuring equivalent security.
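As a rough illustration of the ranking step mentioned in the abstract, the sketch below scores files by coordinate matching, that is, by the number of query keywords each file contains, and returns the top-k files. It works on plaintext binary vectors only and omits the encryption and clustering layers of MRSE-CAK. The dictionary, file names, and function names are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Set, Tuple


def binary_vector(keywords: Set[str], dictionary: List[str]) -> List[int]:
    """Map a keyword set to a 0/1 vector over a fixed dictionary."""
    return [1 if word in keywords else 0 for word in dictionary]


def coordinate_match(doc_vec: List[int], query_vec: List[int]) -> int:
    """Inner product of two binary vectors: the count of shared keywords."""
    return sum(d * q for d, q in zip(doc_vec, query_vec))


def top_k(index: Dict[str, List[int]], query_vec: List[int], k: int) -> List[Tuple[str, int]]:
    """Return the k highest-scoring files; ties are broken by file name."""
    scored = [(name, coordinate_match(vec, query_vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy data (not taken from the paper).
DICTIONARY = ["cloud", "cluster", "encryption", "index", "search"]
FILES = {
    "a.txt": {"cloud", "encryption", "search"},
    "b.txt": {"cluster", "index"},
    "c.txt": {"cloud", "search"},
}
INDEX = {name: binary_vector(kw, DICTIONARY) for name, kw in FILES.items()}
QUERY = binary_vector({"cloud", "encryption", "search"}, DICTIONARY)
RESULT = top_k(INDEX, QUERY, 2)
print(RESULT)
```

In a clustered scheme, the same scoring would typically be applied only to the files of the clusters closest to the query rather than to the whole collection, which is where contiguous on-disk placement of a cluster's files would pay off.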
Acknowledgements
This work was supported by the Natural Science Foundation of China (Nos. 61602118, 61572010 and 61472074), Fujian Normal University Innovative Research Team (No. IRTL1207), Natural Science Foundation of Fujian Province (Nos. 2015J01240, 2017J01738), Science and Technology Projects of Educational Office of Fujian Province (No. JK2014009), and Fuzhou Science and Technology Plan Project (No. 2014-G-80).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Chen, L., Zhang, N., Li, KC. et al. Improving file locality in multi-keyword top-k search based on clustering. Soft Comput 22, 3111–3121 (2018). https://doi.org/10.1007/s00500-018-3145-6
DOI: https://doi.org/10.1007/s00500-018-3145-6