Secure and efficient outsourcing differential privacy data release scheme in Cyber–physical system
Introduction
In the digital age, an increasing amount of data is being produced by physical devices. A cyber–physical system (CPS) is designed to improve the linkage between physical devices and computing networks. A CPS is a mechanism controlled or monitored by computer-based algorithms and tightly integrated with the internet and its users; its physical and software components are deeply intertwined. Some physical devices may wish to share their sensitive data with data evaluators in the CPS to increase the value of their data. However, the physical devices do not want these data to be directly exposed to data evaluators. Therefore, physical devices desire a technique that ensures the privacy of their sensitive data while retaining the ability to perform different types of tasks, such as statistical analyses, classifications and predictions.
Encryption is a well-established technology for protecting sensitive data. However, once encrypted, data can no longer be easily queried, except for exact matches. Although queries can be performed over data encrypted using homomorphic encryption, the enormous ciphertext size makes this approach difficult to apply in practice. To make encryption more practical, different types of encryption schemes have been proposed, such as additive homomorphic encryption, functional encryption and order-preserving encryption (OPE). OPE allows any comparison operation to be applied directly to encrypted data, which can be used for sorting, range queries, ranking, and so forth. However, OPE has a shortcoming: when an adversary has sufficient background knowledge and the ability to ask any range query, personal privacy cannot be guaranteed.
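To make the order-preserving property concrete, the following is a toy sketch only. It is not a secure OPE construction and not the scheme used in this paper. It assumes a small integer domain, and the names `ToyOPE` and `count_in_range` are hypothetical. The point it illustrates is that a server holding only ciphertexts can answer a range query by comparison alone.

```python
import random


class ToyOPE:
    """Toy order-preserving encoding over the integer domain [0, domain_size).

    Illustration only: a keyed, strictly increasing lookup table. It is NOT
    secure and is not the OPE construction of any cited scheme.
    """

    def __init__(self, key, domain_size):
        rng = random.Random(key)  # the key seeds the secret mapping
        self.table = []
        current = 0
        for _ in range(domain_size):
            current += rng.randint(1, 1000)  # positive gap keeps the order strict
            self.table.append(current)

    def encrypt(self, plaintext):
        return self.table[plaintext]


def count_in_range(ciphertexts, enc_low, enc_high):
    """Server-side range count that uses ciphertext comparisons only."""
    return sum(1 for c in ciphertexts if enc_low <= c <= enc_high)


ope = ToyOPE(key=42, domain_size=16)
data = [3, 7, 1, 12, 5, 9]
encrypted = [ope.encrypt(x) for x in data]
# The querier encrypts the endpoints of [3, 9]; the server never sees plaintexts.
matches = count_in_range(encrypted, ope.encrypt(3), ope.encrypt(9))
```

Because the exact count is returned and the order of all values is visible to the server, this sketch also shows why OPE alone leaks information to an adversary who can issue arbitrary range queries.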
Differential privacy (DP) is a promising scheme known for unconditional privacy guarantees with the advantage that it makes no assumption regarding the attacker’s background knowledge. When releasing the results of statistical queries on sensitive data, including range queries, DP ensures that with or without any particular record in the dataset, the outcomes of computations are formally indistinguishable. In general, DP provides an interactive interface to data evaluators (for the non-interactive data release model, finding efficient algorithms for many domains remains an open problem, which we will not discuss in this paper). However, the interactive model requires a server that can always answer data evaluators’ queries in time. Data providers cannot be offline if they hold their sensitive data tightly. Taking into account the risk of data providers staying online for a long time and the simplicity of cloud computing, data providers consider shifting their sensitive data to a cloud service provider (CSP), which makes the CSP the host for answering each query from data evaluators. However, the CSP may not always be reliable and trustworthy. The privacy of sensitive data becomes uncertain as soon as the CSP has been compromised.
To overcome this problem, we propose the concept of outsourcing differentially private data release. Because of its higher storage and computing abilities, a cloud server is also the host for answering data evaluators’ queries in our scheme. However, we combine cloud computing and OPE rather than using only one of them to protect the privacy of data providers. To prevent the cloud server and a malicious data evaluator from colluding, we use another cloud server to transform the queries collected from the data evaluator. To satisfy DP, noise is added to each answer by the cloud server that stores the encrypted data and executes the search operation. In this way, we can retain the privacy of data providers’ data. Moreover, the data providers are not required to be online when a range query is presented, which is the most significant property of our scheme for a practical system.
Our work is based on several studies. For a better understanding, we will summarize the relevant research areas for the readers. We describe the related works in the following:
OPE was first proposed by Agrawal et al. to solve range queries over encrypted numeric data [1]. OPE is primarily used in databases for processing SQL queries over encrypted data [[2], [3], [4]]. Additionally, OPE can be used for range queries, and Wang et al. recently used OPE to securely compute K-NN [5]. The ideal security goal for an order-preserving scheme, IND-OCPA [6], is to reveal no additional information about the plaintext values other than their order (which is the minimum information needed for the order-preserving property). Popa proposed the first ideal-security protocol for OPE encoding [7]. Subsequently, Kerschbaum presented the truly OPE scheme [8]. The recent work [9], which outsources images in a mobile cloud computing environment by using OPE, was also instructive for our work.
DP has been accepted as the main privacy paradigm in recent years because it rests on a purely mathematical definition and provides a means for quantitative assessment. Ten years have passed since the DP concept was first proposed in 2006 by Dwork [10] as a new technology for privacy protection. The Laplace mechanism, which is an effective way to achieve $\epsilon$-DP, was also proposed by Dwork in the same paper. Subsequently, many works on DP have been reported. McSherry identified the sequential/parallel composition properties of DP and the exponential mechanism for adding noise to achieve DP [11].
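As a minimal sketch of the Laplace mechanism for a single counting query, the example below assumes a sensitivity of 1 (adding or removing one record changes a count by at most 1) and draws noise with scale sensitivity/ε. The function names `laplace_noise` and `noisy_count` are hypothetical, and the sampler relies on the fact that the difference of two independent exponential variables with mean b follows a Laplace distribution with scale b.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the illustration repeatable
released = noisy_count(true_count=10, epsilon=1.0, rng=rng)
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of accuracy. A production implementation would also need a cryptographically sound noise source, which this sketch does not attempt.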
DP can be achieved by answering a data evaluator’s queries under a predetermined privacy budget, $\epsilon$; see Fig. 1. In this framework, different types of techniques, such as the Laplace [12], Privlet [13], linear query [14] and batch query techniques [15], can be applied to generate different responses to satisfy $\epsilon$-DP. The main research area in this framework is designing an effective noise-adding or post-processing algorithm that not only satisfies DP but also enhances the accuracy of the noisy results under the same privacy budget. We begin our research from the problem of differentially private histogram release [10]. A histogram is a disjoint partitioning of the database points together with the number of points that fall into each partition. From a certain perspective, answering range queries can be viewed as histogram release. When the dimension of the data is low, we can simply add noise to each histogram bin (that is, to each range query) to satisfy DP. However, as the dimension of the data increases, directly adding noise will lead to too much noise, thus making the result useless. To address this problem, Xiao proposed a histogram release scheme based on a k–d tree to reduce the dimension of the data [16], which requires the original dataset for constructing the k–d tree. In our scheme, the cloud server only has the encrypted data. Thus, we have to consider another way to enhance accuracy because of the poor utility of encrypted data. Xu proposed a scheme that reduces the noise in the noisy histogram based on SSE/SAE [17], but post-processing the noisy result requires the original response, which should not be held by the post-processing server in our scheme. Fortunately, in 2010, Hay proposed a scheme [18] that can increase the accuracy of differentially private histograms through consistency. This method uses a tree to represent a query sequence, and each node in the tree represents a range query. The constraints between queries are encoded in the tree, which can be stored in a cloud server and checked in ciphertext form. Our scheme is primarily inspired by Hay’s work.
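The tree of range queries described above can be sketched on plaintext counts as follows. This is an illustration under stated assumptions, not the encrypted protocol of this paper: the tree is binary, stored in heap order, and the number of unit ranges is a power of two. The names `build_query_tree` and `add_laplace_noise` are hypothetical. Following the hierarchical approach of [18], every node is perturbed with Laplace noise whose scale is the number of tree levels divided by ε, because one record contributes to exactly one node per level.

```python
import random


def build_query_tree(unit_counts):
    """Return true answers for a binary tree of range queries in heap order.

    Index 0 is the root (the whole domain); node i has children 2i+1 and 2i+2;
    the last len(unit_counts) entries are the unit-length ranges (leaves).
    """
    n = len(unit_counts)
    if n == 0 or n & (n - 1):
        raise ValueError("number of unit ranges must be a power of two")
    tree = [0] * (n - 1) + list(unit_counts)
    for i in range(n - 2, -1, -1):
        # Constraint encoded by the tree: a parent equals the sum of its children.
        tree[i] = tree[2 * i + 1] + tree[2 * i + 2]
    return tree


def add_laplace_noise(tree, epsilon, rng):
    """Perturb every node with Laplace noise of scale (number of levels) / epsilon."""
    num_leaves = (len(tree) + 1) // 2
    levels = num_leaves.bit_length()  # log2(num_leaves) + 1 for a power of two
    scale = levels / epsilon
    return [c + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            for c in tree]


true_tree = build_query_tree([2, 0, 10, 2])
noisy_tree = add_laplace_noise(true_tree, epsilon=1.0, rng=random.Random(0))
```

After noise is added, a parent generally no longer equals the sum of its children. That inconsistency is what the consistency-based post-processing exploits to improve accuracy.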
Outsourcing computation is a technique for securely outsourcing expensive computations to untrusted servers, which allows resource-constrained data providers to outsource their computational workloads to cloud servers that have unlimited computational resources. Chaum first proposed the concept of wallets, which are secure hardware installed on a client’s computer to perform expensive computations, in 1992 [19]. To protect data providers’ privacy, Chevallier presented the first algorithm for the secure delegation of elliptic-curve pairings [20]. In addition, solutions for performing meaningful computations over encrypted data using fully homomorphic encryption have emerged, although they are known to be of poor practicality [21]. To improve the utility of encrypted data, many different works have been proposed [[22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]]. As our basis for performing outsourced computing, we selected OPE, which offers the ciphertext comparison function and a perfect balance of efficiency and security [[37], [38], [39]].
In this paper, we propose a new scheme for outsourcing DP in cloud computing. Our contributions can be summarized as follows.
We design a secure and efficient outsourcing DP approach based on OPE rather than the general solution using homomorphic encryption. In this way, the data are outsourced to a CSP for secure storage, and the CSP is able to answer queries under a given privacy budget.
The data providers can be offline after uploading the data.
The data security can be guaranteed against the CSP.
The remainder of this paper is organized as follows. We present some preliminaries in Section 2. In Section 3, our scheme architecture and threat model are introduced. Then, we describe the problem statement and present the details of the scheme in Section 4. In Section 5, we analyze the security of the proposed scheme. Then, the implementation details and the evaluation of the experimental results are presented in Section 6. Finally, we conclude our work in Section 7.
Differential privacy
DP was introduced by Dwork et al. as a technique for protecting individual privacy in a privacy-preserving data release. DP provides a strong privacy guarantee, ensuring that the presence or absence of an individual will not significantly affect the final output of any function.
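Definition 1 (Differential Privacy). A randomized function $\mathcal{A}$ with a well-defined probability density $P$ satisfies $\epsilon$-DP if, for any two neighboring datasets $D_1$ and $D_2$ that differ by only one record and for any $O \in \mathrm{range}(\mathcal{A})$,
$$P(\mathcal{A}(D_1) = O) \le e^{\epsilon} \cdot P(\mathcal{A}(D_2) = O).$$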
DP can
System
Four entities are involved in our scheme, namely, data provider (physical sensor), CSP A, CSP B and data evaluator (software in CPS). Data providers have data and would like to share and contribute the data for utilization, including classification or data analysis. Cloud A provides the cloud storage service for data providers. Cloud B provides the conversion operation (for example, transform plaintext into ciphertext) and post-processing. The data evaluator will send some range queries to
Our outsourcing differential privacy scheme
In this section, we present our outsourcing DP scheme. In general, we consider the following scenario.
A data provider has private data that they do not want to be exposed to the public. The data provider wants to extract knowledge from this vast amount of data. However, because of limited computing power or a lack of suitable algorithms, the data provider cannot obtain the appropriate result.
Analysis
In this section, we will analyze the security of our scheme before proving the correctness of our technique. We first introduce some theorems:
Theorem 3. Any post-processing of the answers cannot diminish this rigorous privacy guarantee [18].
Theorem 4 Given the noisy sequence res, the unique minimum solution that also meets all constraints can be obtained using the following recurrence relation. Let t be v’s parent: [18]
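As a hedged illustration of Theorem 4, the sketch below implements the constrained-inference recurrence as it is stated in Hay et al. [18], on plaintext values. The heap-ordered list layout, the complete k-ary tree, and the names `consistent_estimate`, `noisy` and `k` are assumptions of this example rather than notation from this paper. A bottom-up pass computes an intermediate estimate z for each node, and a top-down pass then distributes the difference between a parent's estimate and the sum of its children's z values equally among the children.

```python
def consistent_estimate(noisy, k=2):
    """Return the consistent least-squares estimate for a noisy k-ary query tree.

    noisy is in heap order: index 0 is the root and node i has children
    k*i+1 .. k*i+k. Leaves have height 1, as in Hay et al. [18].
    """
    n = len(noisy)
    levels, total, width = 0, 0, 1
    while total < n:  # recover the number of levels from the node count
        total += width
        width *= k
        levels += 1
    if n == 0 or total != n:
        raise ValueError("noisy must describe a complete k-ary tree")

    height = [0] * n
    start = 0
    for depth in range(levels):
        for i in range(start, start + k ** depth):
            height[i] = levels - depth
        start += k ** depth

    # Bottom-up pass: z[v] blends v's noisy answer with its children's estimates.
    z = [float(x) for x in noisy]
    for i in range(n - 1, -1, -1):
        l = height[i]
        if l > 1:
            kl, kl1 = k ** l, k ** (l - 1)
            child_sum = sum(z[c] for c in range(k * i + 1, k * i + k + 1))
            z[i] = (kl - kl1) / (kl - 1) * noisy[i] + (kl1 - 1) / (kl - 1) * child_sum

    # Top-down pass: h[v] = z[v] + (h[parent] - sum of siblings' z) / k.
    h = [0.0] * n
    h[0] = z[0]
    for i in range(n):
        if height[i] > 1:
            kids = range(k * i + 1, k * i + k + 1)
            correction = (h[i] - sum(z[c] for c in kids)) / k
            for c in kids:
                h[c] = z[c] + correction
    return h


estimate = consistent_estimate([10.0, 4.0, 4.0])
```

In the small example, the root answer 10 disagrees with the leaf sum 8, and the result is the consistent sequence closest to the noisy one in the least-squares sense. By Theorem 3, this post-processing does not weaken the privacy guarantee.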
With these theorems, we can
Evaluation
In this section, we evaluate the performance of our scheme. All experiments are conducted on a PC equipped with an AMD A4-3300M APU with a Radeon(tm) HD Graphics 1.90 GHz and 6 GB of RAM.
We will present an example of our scheme before we present our evaluation.
The original dataset is {10,9,5,3,2,8,5,7,4,1,0,2,12,15,7,2}, which means that we have 10 items between [0,1], have 9 items between [1,2] and so on. The queries are {q[2,5],q[2,8],q[3,5],q[3,8]}, (here, note that the queries have the
Conclusions
In this paper, we address the problem that it is inconvenient, and leaves data providers vulnerable to attack, for them to hold their data and answer thousands of queries under DP themselves. To solve this problem, we propose a secure and efficient outsourcing DP release scheme, in which the data providers’ communication costs are reduced and the data providers are not required to be online. In future work, we will consider other types of post-processing algorithms, which can offer better accuracy under the same
Acknowledgment
The work reported in this paper was supported in part by the National Natural Science Foundation of China under Grants 61672092, 61472091 and 61722203.
References (41)
- et al., Secure attribute-based data sharing for resource-limited users in cloud computing, Comput. Secur. (2018)
- et al., Multi-key privacy-preserving deep learning in cloud computing, Future Gener. Comput. Syst. (2017)
- et al., Insight of the protection for data security under selective opening attacks, Inform. Sci. (2017)
- et al., Identity-based chameleon hashing and signatures without key exposure, Inform. Sci. (2014)
- R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, Order preserving encryption for numeric data (2004)...
- T. Ge, S. Zdonik, Fast, secure encryption for indexing in a column-oriented DBMS (2007)...
- et al., Chaotic order preserving encryption for efficient and secure queries on databases, IEICE Trans. Inform. Syst. (2009)
- H.K. Toshiyuki Amagasa, Hasan Kadhem, A secure and efficient order preserving encryption scheme for relational...
- B. Wang, Y. Hou, M. Li, Practical and secure nearest neighbor search on encrypted large-scale data (2016)...
- A. Boldyreva, N. Chenette, Y. Lee, A. O’Neill, Order-preserving symmetric encryption (2009)...
- Homomorphic encryption as a service for outsourced images in mobile cloud computing environment, Int. J. Cloud Appl. Comput.
- Privacy integrated queries: an extensible platform for privacy-preserving data analysis, Commun. ACM
- Differential privacy via wavelet transforms, IEEE Trans. Knowl. Data Eng.
- Low-rank mechanism: optimizing batch queries under differential privacy, Proc. VLDB Endowment
Heng Ye has been a Ph.D. candidate at Beijing Jiaotong University since 2016. His research interests include differential privacy, attribute-based encryption and IoT.
Jiqiang Liu received the Ph.D. degree from Beijing Normal University in 1999. He now works at Beijing Jiaotong University as a professor and as Associate Dean of the Graduate School. He is an IEEE fellow and has published more than 100 research papers. His research interests include trusted computing, privacy preservation and security protocols.
Wei Wang now works at Beijing Jiaotong University as an associate professor. He is a senior member of CCF. His research interests include Android platform security, Web security, cloud computing security, network traffic analysis, detection and modeling, intrusion detection, and data mining.
Ping Li received the M.S. and Ph.D. degrees in applied mathematics from Sun Yat-sen University in 2011 and 2016, respectively. She is currently a postdoc at the School of Computer Science and Educational Software, Guangzhou University. Her research interests include cryptography, privacy preservation and cloud computing.
Tong Li received his B.S. and M.S. from Taiyuan University of Technology and Beijing University of Technology, in 2011 and 2014, respectively, both in Computer Science & Technology. Currently, he is a Ph.D. candidate at Nankai University. His research interests include applied cryptography and data privacy protection in cloud computing.
Jin Li received the B.S. degree in mathematics from Southwest University in 2002 and the Ph.D. degree in information security from Sun Yat-sen University in 2007. Currently, he works at Guangzhou University as a professor. He has been selected as a Science and Technology New Star of Guangdong Province. His research interests include applied cryptography and security in cloud computing. He has published more than 70 research papers in refereed international conferences and journals and has served as the program chair or a program committee member for many international conferences.