Enabling Cooperative Privacy-preserving Personalized search in cloud environments
Introduction
The rapid growth in the amount of data has heightened society’s awareness of its value and, in turn, made issues such as data privacy and security increasingly prominent [11], [28], [31]. A recent example is Hillary Clinton, whose approval rating dropped sharply after it emerged that she had used a private mailbox to send and receive official e-mails. Apart from security, how to make good use of data is also a thorny issue. Cloud computing offers superior storage and computing capabilities and thus provides a good platform for big data storage and applications. Currently, encryption may be the best choice for protecting data security and users’ privacy [30]; however, encrypted data is much harder to use. Therefore, guaranteeing data availability while ensuring data security and users’ privacy is a major challenge.
To address this problem, searchable encryption came into being. Researchers [4], [29], [36] have put forward many constructive schemes for different applications. However, most existing schemes are keyword-based searchable encryption schemes, which can no longer fully meet new challenges and the growing needs of users. This is mainly reflected in the following two aspects.
First, when the same keywords are used for searching, the required information varies from person to person because of different personal experiences, interests, and the like. For example, a user who is interested in electronic products may type in “Apple” when he wants search results related to the Apple company, while a user who likes fruit probably wants information about the fruit when he searches for “Apple”. However, most current schemes overlook this. In these schemes, the cloud server returns all search results related to the user’s search keywords. This not only increases network bandwidth consumption, but also confuses users facing a sea of data. As a result, the user has to spend a lot of time before obtaining the required information, which seriously degrades the search experience. Therefore, it is necessary to design a search scheme that can understand the user’s search intentions [18].
The second aspect is that most existing schemes fail to take into account the computational and communication overhead of the cloud service process, even though data stored in the cloud is useful only if it can be searched easily. Liu [21] proposed a cooperative private searching protocol whose computational and communication overhead are far lower than those of the Ostrovsky protocol [27]. However, its computational overhead is still too high, and the long waiting time seriously degrades the user’s search experience. In addition, the protocol supports only keyword search and cannot customize the search results for each person, which reduces search accuracy to a certain extent. Therefore, it is necessary to develop an effective privacy-preserving search scheme with low bandwidth and computing consumption.
Against this background, and on the premise of guaranteeing users’ privacy, this paper aims to design a personalized search scheme for the ciphertext environment with low computational and communication overhead. The proposed Cooperative Privacy-preserving Personalized Search (CPPS) scheme introduces an aggregation and distribution layer (ADL), so that a large number of users’ personalized queries can be aggregated and uploaded to the cloud server, and each user receives search results related to both the query keywords and the user’s interest model.
The main contributions of this paper compared with [5], [14], [20], [21] are listed below:
1) The CPPS scheme is the first cooperative personalized search scheme in cloud environments. By introducing an ADL server, it merges users’ queries and returns personalized search results to each user, which improves efficiency.
2) With the assistance of secure kNN [33], the merged query is searched on the cloud server while the results are distributed by the ADL server, so as to protect the privacy of users’ information.
3) The analysis shows that the scheme protects users’ privacy well, and experiments demonstrate its efficiency and feasibility. When the number of merged users is 3, the communication overhead of the CPPS scheme is 66.7% of that without the ADL, and the computing overhead of the cloud server is 33.3% of that without the ADL, so the CPPS scheme can effectively alleviate the performance bottleneck of the cloud server.
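To make the aggregate-then-distribute idea concrete, the following is a toy plaintext sketch and not the paper's construction. It assumes that the ADL merges queries by taking the union of their keywords, that the cloud returns per-keyword weights for every matching file, and that the ADL re-scores those shared hits per user. The function names (`merge_queries`, `cloud_search`, `distribute`), the index layout, and the preference weights are all hypothetical, and no encryption is applied here.

```python
import heapq

def merge_queries(queries):
    """ADL side (assumed): union of all users' keywords, sent as one query."""
    return sorted(set().union(*(q["weights"] for q in queries.values())))

def cloud_search(index, merged_keywords):
    """Cloud side (assumed): a single pass over the index for the merged query.

    Returns, for each matching file, the weight of every merged keyword it holds.
    """
    wanted = set(merged_keywords)
    hits = {}
    for file_id, weights in index.items():
        matched = {kw: w for kw, w in weights.items() if kw in wanted}
        if matched:
            hits[file_id] = matched
    return hits

def distribute(hits, queries):
    """ADL side (assumed): re-score shared hits per user, keep each top-k_i."""
    results = {}
    for user, q in queries.items():
        scored = []
        for file_id, matched in hits.items():
            # Score = file keyword weight times the user's preference weight.
            score = sum(matched.get(kw, 0.0) * pref
                        for kw, pref in q["weights"].items())
            if score > 0:
                scored.append((score, file_id))
        results[user] = [f for _, f in heapq.nlargest(q["k"], scored)]
    return results

# Hypothetical index: file -> {keyword: weight}.
index = {
    "f1": {"apple": 0.5, "phone": 0.4},
    "f2": {"apple": 0.6, "fruit": 0.5},
    "f3": {"fruit": 0.3, "juice": 0.7},
}
# Hypothetical queries: per-user keyword preferences and result count k_i.
queries = {
    "alice": {"weights": {"apple": 1.0, "phone": 1.0}, "k": 1},
    "bob": {"weights": {"apple": 1.0, "fruit": 1.0}, "k": 2},
    "carol": {"weights": {"juice": 1.0}, "k": 1},
}

merged = merge_queries(queries)
hits = cloud_search(index, merged)   # one cloud search for three users
results = distribute(hits, queries)
```

In this toy model the cloud runs one search for three users instead of three, which is one way to read the reduction in cloud-side computation reported above; the communication saving depends on protocol details that this sketch does not model.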
The remainder of this paper is organized as follows. Section 2 discusses related work on searchable encryption and personalized search. Section 3 gives the relevant definitions. Section 4 presents the system model, the threat model, and the design goals, and summarizes the notations. Section 5 presents the CPPS scheme, and Section 6 analyzes its security and privacy. Section 7 reports the simulation experiments. Finally, Section 8 concludes the paper.
Section snippets
Searchable encryption
To ensure data security and user privacy, data is usually encrypted. However, encryption degrades the availability of the data, and finding the required files in a large amount of ciphertext becomes a real problem. To solve the search problem over encrypted data, Song et al. [29] proposed the first symmetric searchable encryption scheme, which used stream ciphers to encrypt keywords. In the process of retrieval, through keywords and one-to-one matches between
Preliminaries
In this section, the relevant definitions are introduced.
“TF-IDF”. “TF-IDF” is a frequently used weighting technique in information retrieval. “TF” stands for term frequency and “IDF” for inverse document frequency; the technique has been widely applied in the literature [1], [10], [15]. For this reason, “TF-IDF” is used in the CPPS scheme to obtain the weights of query keywords and of file keywords.
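As an illustration of the weighting, here is a minimal sketch of one common TF-IDF variant. The exact formula used in CPPS is not given in this excerpt, so the choice below (term count divided by document length, multiplied by the natural logarithm of the number of documents over the document frequency) is an assumption, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized documents.
docs = [
    ["apple", "phone", "apple"],
    ["apple", "fruit"],
    ["fruit", "juice"],
]
weights = tf_idf(docs)
```

Under this variant a term that occurs in every document receives weight zero, while a term concentrated in few documents is weighted up, which is the property a ranked search relies on.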
User interest model. User-specific search results are inseparable from user
System model and notations
CPPS involves four entities, namely, the data owner, the data user, the ADL and the cloud server. An overview of the CPPS scheme is shown in Fig. 1, and a summary of notations is given in Table 1.
For ease of understanding, this paper uses a single ADL server. However, if necessary, the CPPS scheme is also applicable to scenarios with multiple ADL servers. On the user side, a user interest model is stored to protect the user’s private information. At the same time, to better reflect the
CPPS scheme
The CPPS scheme merges user queries by introducing an ADL server and feeds the top-k_i ranked results back to each user. The scheme can effectively reduce the computational overhead of the cloud server and the communication overhead between the cloud server and the ADL server, which makes it well suited to resource-saving cloud environments. The privacy of users and of the index is well protected by a secure inner product [33]. At the same time, the user can quickly obtain
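The secure inner product mentioned above can be illustrated with a minimal sketch of the classic secure kNN idea: both vectors are split into two shares according to a secret bit vector and then masked with two secret invertible matrices, so that the masked vectors still yield the original inner product. This is a simplified version under stated assumptions and not the CPPS construction itself; it omits the dummy dimensions and random scaling that practical schemes usually add, uses small integer matrices with exact fractions for readability, and all function names are hypothetical.

```python
import random
from fractions import Fraction

def invert(m):
    """Gauss-Jordan inversion over exact fractions; returns None if singular."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_invertible(n, rng):
    """Draw random integer matrices until one is invertible."""
    while True:
        m = [[Fraction(rng.randint(-9, 9)) for _ in range(n)] for _ in range(n)]
        inv = invert(m)
        if inv is not None:
            return m, inv

def key_gen(n, rng):
    """Secret key: split indicator s and two invertible matrices."""
    s = [rng.randint(0, 1) for _ in range(n)]
    m1, m1_inv = random_invertible(n, rng)
    m2, m2_inv = random_invertible(n, rng)
    return {"s": s, "m1": m1, "m2": m2, "m1_inv": m1_inv, "m2_inv": m2_inv}

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split(vec, s, split_on, rng):
    """Randomly split entries where s[j] == split_on; copy the others."""
    a, b = [], []
    for x, bit in zip(vec, s):
        if bit == split_on:
            r = Fraction(rng.randint(-9, 9))
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b

def encrypt_index(p, key, rng):
    """Index vector: split where s[j] == 1, then mask with M1^T and M2^T."""
    p1, p2 = split(p, key["s"], 1, rng)
    return mat_vec(transpose(key["m1"]), p1), mat_vec(transpose(key["m2"]), p2)

def trapdoor(q, key, rng):
    """Query vector: split where s[j] == 0, then mask with the inverses."""
    q1, q2 = split(q, key["s"], 0, rng)
    return mat_vec(key["m1_inv"], q1), mat_vec(key["m2_inv"], q2)

def secure_score(enc_index, enc_query):
    """What the server computes: equals the plaintext inner product."""
    return dot(enc_index[0], enc_query[0]) + dot(enc_index[1], enc_query[1])

rng = random.Random(2018)
key = key_gen(4, rng)
p = [2, 0, 1, 3]   # hypothetical index (file) vector
q = [1, 1, 0, 2]   # hypothetical query vector
score = secure_score(encrypt_index(p, key, rng), trapdoor(q, key, rng))
plain = dot(p, q)
```

The server sees only the masked shares, yet `score` equals `plain` because the matrices cancel and the complementary splitting rule makes the two share products add up to the original inner product at every position.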
Security and privacy analysis
In this section, we analyze the security of the CPPS scheme against the cloud server and the ADL server, both of which are assumed to follow the honest-but-curious (HBC) model. We also briefly analyze the CPPS scheme against malicious attackers. The specific analysis is as follows:
Simulation experiment
The “business” and “review” data in the Yelp dataset are used to analyze the CPPS scheme. Some “business” data, together with all “review” data related to these “business” data, are selected randomly as the experiment dataset.
The entire experiment is run on a machine with a 2.6 GHz Intel(R) Core(TM) i7-6700HQ CPU and 16 GB of RAM, running Windows 10. The simulation code is implemented in Matlab R2016b, and OriginPro 2017 is used to plot the experimental data.
Conclusion
With the advent of the era of big data, information overload and privacy protection have received more and more attention. By introducing the ADL server, this paper proposes the CPPS scheme, which lets users perform cooperative personalized search. Merging users’ queries and distributing the search results returned by the cloud server can effectively alleviate the performance bottleneck of the cloud server and thus better fits a resource-saving cloud environment. Meanwhile, the
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China under Grants 61632009 and 61472451, in part by the Guangdong Provincial Natural Science Foundation under Grant 2017A030308006, in part by the High-Level Talents Program of Higher Education in Guangdong Province under Grant 2016ZJ01, and in part by the Fundamental Research Funds for the Central Universities of Central South University under Grant 2017zzts141.
References (42)
- et al. Fuzzy rule based profiling approach for enterprise information seeking and retrieval. Inf. Sci. (2017)
- et al. Akser: attribute-based keyword search with efficient revocation in cloud computing. Inf. Sci. (2018)
- et al. Multimodal retrieval using mutual information based textual query reformulation. Expert Syst. Appl. (2017)
- et al. A technique to circumvent SSL/TLS validations on iOS devices. Future Gener. Comput. Syst. (2017)
- et al. Enabling personalized search over encrypted outsourced data with efficiency improvement. IEEE Trans. Parallel Distrib. Syst. (2016)
- et al. Privacy-preserving multi-hop profile-matching protocol for proximity mobile social networks. Future Gener. Comput. Syst. (2017)
- et al. Hierarchical multi-authority and attribute-based encryption friend discovery scheme in mobile social networks. IEEE Commun. Lett. (2016)
- et al. Enabling verifiable multiple keywords search over encrypted cloud data. Inf. Sci. (2018)
- et al. Private searching on streaming data. Annual International Cryptology Conference (2005)
- et al. Collaborative trajectory privacy preserving scheme in location-based services. Inf. Sci. (2017)
- An overview of fog computing and its security issues. Concurrency Comput.
- Expressive query over outsourced encrypted data. Inf. Sci.
- Prms: a personalized mobile search over encrypted outsourced data. IEEE Access
- Attribute-based encryption with personalized search. 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC)
- Search engine optimization: what drives organic traffic to retail sites? J. Econ. Manage. Strategy
- Public key encryption that allows PIR queries. Annual International Cryptology Conference
- Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Trans. Parallel Distrib. Syst.
- Privacy preserving keyword searches on remote encrypted data. International Conference on Applied Cryptography and Network Security
- Private information retrieval. Proceedings 36th Annual Symposium on Foundations of Computer Science
- Searchable symmetric encryption: improved definitions and efficient constructions. J. Comput. Secur.