Privacy-preserving cloud computing on sensitive data: A survey of methods, products and challenges

doi:10.1016/j.comcom.2019.04.011

Computer Communications

Volumes 140–141, May 2019, Pages 38-60

https://doi.org/10.1016/j.comcom.2019.04.011

Abstract

The increasing volume of personal and sensitive data being harvested by data controllers makes it increasingly necessary to use the cloud not just to store the data, but also to process them on cloud premises. However, security concerns over frequent data breaches, together with recently upgraded legal data protection requirements (like the European Union’s General Data Protection Regulation), advise against outsourcing unprotected sensitive data to public clouds. To tackle this issue, this survey covers technologies that allow privacy-aware outsourcing of storage and processing of sensitive data to public clouds. Specifically and as a novelty, we review masking methods for outsourced data based on data splitting and anonymization, in addition to cryptographic methods covered in other surveys. We then compare these methods in terms of operations supported on the masked outsourced data, overhead, accuracy preservation, and impact on data management. Furthermore, we list several research projects and available products that have materialized some of the surveyed solutions. Finally, we identify outstanding research challenges.

Introduction

Many companies are outsourcing at least some of their information technology to the cloud, from mere data storage to e-mail and other productivity applications. Reduced costs, no need for maintenance, virtually unlimited computational resources and increased availability are the main forces driving this change. Yet security and privacy misgivings remain major barriers to a more wholehearted migration to the cloud.

Security is defined as achieving confidentiality, integrity and availability of the data outsourced to the cloud. Users want to be assured that no intruder can hack the cloud and/or impersonate them to steal or alter their sensitive data, and that no denial of service will occur. In the E.U., 57% of large enterprises using the cloud reported the risk of a security breach as the main limiting factor in the use of cloud computing services [1]; in a survey by the Cloud Security Alliance of over 165 information technology and security professionals in the U.S., most respondents considered cloud storage to be high risk [2]; and the European Network and Information Security Agency identified “loss of governance” over the data outsourced to the cloud as a critically deterring factor [3]. Security breaches are, in fact, very real threats. Well-known examples include the Sony PlayStation Network outage¹ caused by an external intrusion, in which personal details from approximately 77 million accounts were stolen; the multi-day outage in Dropbox² that, as a result of a misconfiguration, temporarily allowed visitors to log into any of its 25 million customer accounts; and the leakage of private pictures of a number of celebrities from the Apple iCloud storage service due to weakly protected login credentials.³

Regarding privacy, its most widely accepted definition in the information society is in terms of informational self-determination, that is, “the claim of individuals, groups or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” [4]. Hence, for a cloud user to store and/or process sensitive data in the cloud, she needs the guarantee that no one other than herself – not even the cloud service provider (CSP) – will be able to see or infer her data. Thus, cloud computing needs to increase the user’s control over her data, which will decrease the need for users to blindly trust the CSP. Otherwise, a user might be reluctant to outsource sensitive data to the cloud.

Furthermore, when data are personal, the individuals to whom the data refer – a.k.a. subjects – have privacy rights that have recently been enshrined in the new European Union’s General Data Protection Regulation (GDPR⁴). To stay GDPR-compliant, a controller – an entity that has obtained consent from subjects to collect, store and process their data – can only outsource subject data to the cloud if she can retain full control over the outsourced data and guarantee their confidentiality. This matters because GDPR is also becoming a de facto legal standard outside the European Union, specifically in the USA, Australia, Canada and Japan, and any company wishing to sell information technology solutions to those markets must take it into account.

Notice that privacy is even more challenging than security, because it must hold also with respect to (public and, therefore, untrusted) CSPs. In this respect, the cloud has given CSPs the opportunity to analyze and exploit large amounts of personal data. In fact, a report by the U.S. Federal Trade Commission [5] states that public CSPs regularly collect and analyze the data of their users without the latter’s knowledge, and that those analyses could yield sensitive inferences; for example, a CSP could detect individuals that suffer from diabetes because of their interest in sugar-free products and share this information with an insurance company that could use that clue to classify a person as higher-risk (and possibly higher-premium).

One might argue that sensitive data handling in the cloud would be much simpler if the CSP could be assumed to be trusted. However, there are several legal issues here. On the one hand, in many scenarios the data subjects entrust the data controller with their personal data (for example, healthcare data), but this does not mean they allow the controller to further transfer their data to whomever the controller chooses to trust. On the other hand, the CSP may be under a jurisdiction different from the controller’s. If, say, the CSP is under U.S. law whereas controller and subjects are under E.U. law, the latter law may be violated. Finally, many public CSPs offer their services free of charge in return for the possibility of monetizing users’ data. For example, a recent privacy policy in Google⁵ specifies that whatever information a user decides to outsource to any Google service can be used, reproduced, modified or distributed by Google with the aim of improving or promoting its services, but also to conduct targeted advertising (e.g., the Gmail filtering system scans the content of our emails to serve personalized ads).

To assuage the above issues and restore the user’s control over and trust in the protection of the data outsourced to the cloud, several solutions have been proposed in recent years. All of them involve masking sensitive data so that only protected values are stored in the cloud and only the user/controller owning the data is able to unmask the protected values retrieved from the cloud. However, if the user wants to use not only the cloud’s storage but also the cloud’s computational power, the challenge is even harder, because data protection must be made compatible with computations that the cloud performs on the masked data on its own premises.
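The masking and unmasking role played on the user's side can be illustrated with data splitting, one of the non-cryptographic approaches this survey covers. The Python sketch below is only an illustration under stated assumptions, not a method taken from the surveyed works: the class `SplittingProxy`, its method names, the attribute grouping and the in-memory dictionaries standing in for cloud locations are all hypothetical. It splits each record into fragments stored under unlinkable random handles, keeps the handle index locally, and lets a simple calculation run on one fragment without reassembling the records.

```python
import secrets


class SplittingProxy:
    """Hypothetical trusted local proxy that masks records by splitting them."""

    def __init__(self, fragments):
        # fragments: one tuple of attribute names per (simulated) cloud location
        self.fragments = fragments
        # each "cloud" is just a dict here: random handle -> partial record
        self.clouds = [dict() for _ in fragments]
        # kept only at the proxy: record id -> list of per-cloud handles
        self.index = {}

    def store(self, record_id, record):
        handles = []
        for cloud, attrs in zip(self.clouds, self.fragments):
            handle = secrets.token_hex(8)  # random, so fragments cannot be linked
            cloud[handle] = {a: record[a] for a in attrs}
            handles.append(handle)
        self.index[record_id] = handles

    def retrieve(self, record_id):
        # "Unmasking": only the proxy knows which fragments belong together
        record = {}
        for cloud, handle in zip(self.clouds, self.index[record_id]):
            record.update(cloud[handle])
        return record

    def outsourced_mean(self, attr):
        # The location holding `attr` can compute on it without seeing the rest
        for cloud, attrs in zip(self.clouds, self.fragments):
            if attr in attrs:
                values = [part[attr] for part in cloud.values()]
                return sum(values) / len(values)
        raise KeyError(attr)


# Assumed split: identifying attributes in one place, confidential ones in another
proxy = SplittingProxy([("name", "zip"), ("diagnosis", "age")])
proxy.store(1, {"name": "Alice", "zip": "43007", "diagnosis": "diabetes", "age": 54})
proxy.store(2, {"name": "Bob", "zip": "08001", "diagnosis": "asthma", "age": 40})
restored = proxy.retrieve(1)
mean_age = proxy.outsourced_mean("age")
```

In this sketch the fragments are stored in the clear, which is what allows a location to compute on them directly; the protection rests on the assumptions that the storage locations do not collude and that no single fragment is both identifying and sensitive.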

In this paper we survey the state of the art on security and privacy-enabling solutions towards the cloud, with a focus on those that preserve cloud service functionalities, such as the ability to outsource queries and calculations on protected data to the cloud. In comparison with recent surveys on this area, ours offers the following contributions:

• Most surveys focus on data security vs external attackers [6], [7], [8], [9] rather than on privacy versus the cloud. Therefore, they center their analysis on security attacks and on mechanisms to prevent, detect and mitigate them. In contrast, our survey considers mechanisms that protect outsourced data not only against third-party attackers, but also against insiders and honest-but-curious clouds⁶ storing and managing such data.
• Many surveys concentrate on outsourced data storage [8], [9], [10]. Our paper goes a step beyond and puts the spotlight on the preservation of cloud service functionalities (e.g., queries, calculations, etc.) on the protected data outsourced to the cloud. Taking advantage of the cloud’s computing power on protected data is significantly more challenging than merely using the cloud to store protected data.
• All surveys covering privacy-enabling service preservation mechanisms are limited to cryptographic solutions [11], [12]. Even though these methods are powerful to secure data, they also result in significant calculation overheads (which partly neutralize the cost-saving benefits of cloud computing), they require key management, and they severely limit cloud functionalities on the outsourced data because they need tailored solutions for each type of outsourced calculation [12]. To be self-contained, we also discuss cryptographic solutions but, for the first time, we comprehensively survey non-cryptographic methods (based on data splitting and anonymization) that can be used to efficiently protect data outsourced to the cloud while preserving a variety of cloud services.
• In addition to the security and privacy-enabling solutions proposed in the literature, we survey research projects and products that implement some of these methods in the cloud scenario and discuss the outsourced functionalities they support.

In Section 2 we characterize the scenario we consider by identifying the actors and the entities involved. A common feature of privacy-enabling technologies towards the cloud is the use of a local proxy by the controller to mask the data before outsourcing them to the cloud. In this section we also present this general architecture and the assumptions and security models on which available solutions rely. We conclude the section with the identification and the description of the requirements that privacy-enabling technologies should ideally fulfill. In Section 3 we review data protection techniques, non-cryptographic or cryptographic, that have been or can be implemented in the aforementioned proxies to enable privacy and security towards the cloud while preserving (at least some of) its functionalities. In Section 4 we present a critical comparison of the techniques covered in Section 3 with respect to the requirements identified in Section 2. In Section 5 we survey research projects and products that offer security- and privacy-enabling solutions towards the cloud, and classify them according to the type of data protection technique they employ. Finally, in Section 6 we gather the conclusions and several research challenges derived from the gaps identified in the survey.

Section snippets

Data protection towards the cloud: assumptions and requirements

Data protection towards untrusted clouds is usually accomplished by deploying a trusted local proxy. This proxy is a logical entity that can be located on the client side (for instance, the user’s personal computer or smartphone) or, at least, in a location trusted by the user (such as a server or a router within the user’s corporate intranet) [13]. Proxies can be implemented as additional modules or services for existing applications (with dedicated APIs), as browser plugins (that intercept

Data protection techniques

As discussed in the previous section, local proxies may use different techniques to protect the user’s data before outsourcing them to the cloud. Due to the variety of data protection scenarios, the research community has proposed a plethora of masking techniques. Although this diversity may seem bewildering, it also has the advantage of making it possible to cope with heterogeneous privacy and functionality needs.
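As one concrete instance of a masking technique that a local proxy might apply before outsourcing, the Python sketch below shows univariate microaggregation, an anonymization method from the statistical disclosure control literature. It is a simplified fixed-size grouping heuristic chosen for brevity, not the specific algorithm of any surveyed work; the function name `microaggregate`, the parameter `k` and the sample ages are assumptions made for illustration.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the values in ascending order, so close values share a group
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [0.0] * len(values)
    n_groups = len(values) // k
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so every group has >= k members
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked


ages = [23, 25, 31, 34, 35, 52, 58]  # illustrative data only
masked_ages = microaggregate(ages, k=3)
```

Every masked value is shared by at least `k` records, while the attribute total (and hence its mean) is unchanged, which gives a small-scale view of the trade-off between protection and accuracy preservation that masking methods must balance.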

This section surveys the state of the art on functionality-preserving data

Comparison of data protection techniques

In Table 4 we compare the types of protection methods surveyed above in terms of the requirements identified in Section 2: supported operations, overhead at the local proxy, preservation of the accuracy of the original data, transparency and interoperability. Operations on protected data not listed in the table may still be supported but at a prohibitive cost (e.g. to update searchable encrypted data, the whole data set should be downloaded, decrypted, updated, encrypted and re-uploaded). In

Research projects and products on privacy-aware computation outsourcing

In this section we review research projects and products that materialize some of the data protection techniques described above in the cloud scenario.

Conclusions and research challenges

In this survey, we have presented in a systematic way technologies that allow privacy-aware outsourcing of storage and processing of sensitive data to the cloud. This topic is especially hot given the coincidence of two phenomena: first, the sheer volume of personal or otherwise sensitive data being collected makes it increasingly necessary not only to store them in the cloud, but also to process them there; second, the upgrade of personal data protection that the new European General Data

Disclaimer and acknowledgments

This work was partly supported by the European Commission, Brussels (Belgium) (project H2020-700540 “CANVAS”), the Government of Catalonia, Spain (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2017 SGR 705) and the Spanish Government (projects TIN2014-57364-C2-1-R “SmartGlacis”, TIN2015-70054-REDC and TIN2016-80250-R “Sec-MCloud”). While the authors are with the UNESCO Chair in Data Privacy, the opinions expressed in this paper are the authors’ own and do not necessarily reflect the

References (183)

Khan M.A. A survey of security issues for cloud computing. J. Netw. Comput. Appl. (2016)
Singh S. et al. A survey on cloud computing security: Issues, threats, and solutions. J. Netw. Comput. Appl. (2016)
Singh A. et al. Cloud security issues and challenges: A survey. J. Netw. Comput. Appl. (2017)
Kaaniche N. et al. Data security and privacy preservation in cloud storage environments based on cryptographic mechanisms. Comput. Commun. (2017)
Sánchez D. et al. Privacy-preserving data outsourcing in the cloud via semantic data splitting. Comput. Commun. (2017)
Domingo-Ferrer J. et al. Outsourcing scalar products and matrix products on privacy-protected unencrypted data stored in untrusted clouds. Inform. Sci. (2018)
Yang J. et al. A hybrid solution for privacy preserving medical data sharing in the cloud environment. Future Gener. Comput. Syst. (2015)
Martínez S. et al. Privacy protection of textual attributes through a semantic-based masking method. Inf. Fusion (2012)
Rodríguez-Garcia M. et al. A semantic framework for noise addition with nominal data. Knowl.-Based Syst. (2017)
Rodriguez-Garcia M. et al. Utility-preserving privacy protection of nominal data sets via semantic rank. Inf. Fusion (2019)
Martínez S. et al. Semantic adaptive microaggregation of categorical microdata. Comput. Secur. (2012)
Zhang J. et al. PrivBayes: private data release via Bayesian networks. ACM Trans. Database Syst. (2017)
Sánchez D. et al. Utility-preserving differentially private data releases via individual ranking microaggregation. Inf. Fusion (2016)
Rhee H.S. et al. Trapdoor security in a searchable public-key encryption scheme with a designated tester. J. Syst. Softw. (2010)
Eurostat, Cloud computing - statistics on the use by enterprises, (Dec. 2016 (Accessed 14.02.19)). URL...
C.S. Alliance, Cloud usage: Risks and opportunities report (Sep. 2014 (Accessed 14.02.19)). URL...
Haeberlen T. et al. Cloud Computing. Benefits, Risks and Recommendations for Information Security (Rev. B) (2012)
Westin A. Privacy and Freedom (1967)
Ramirez E. et al. Data Brokers: A Call for Transparency and Accountability (2014)
Praveen-Kumar P. et al. Attribute based encryption in cloud computing: A survey, gap analysis, and future directions. J. Netw. Comput. Appl. (2018)
Tang J. et al. Ensuring security and privacy preservation for cloud data services. ACM Comput. Surv. (2016)
Shan Z. et al. Practical secure computation outsourcing: A survey. ACM Comput. Surv. (2018)
Cao N. et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Trans. Parallel Distrib. Syst. (2014)
M. Rouse, What is a multi-cloud strategy?, WhatIs.com (Jul. 2014). URL...
Aggarwal G. et al. Two can keep a secret: A distributed architecture for secure database services
Wei Z. et al. Data privacy protection using multiple cloud storages
Dev H. et al. An approach to protect the privacy of cloud data from data mining based attacks
Ali M. et al. DROPS: Division and replication of data in cloud for optimal performance and security. IEEE Trans. Cloud Comput. (2018)
Gai K. et al. Security-aware efficient mass distributed storage approach for cloud systems in big data
Alqahtani H. et al. A multi-cloud approach for secure data storage on smart device
Ciriani V. et al. Fragmentation and encryption to enforce privacy in data storage
Ganapathy V. et al. Distributing data for secure database services. Trans. Data Priv. (2012)
Ciriani V. et al. Selective data outsourcing for enforcing privacy. J. Comp. Sec. (2011)
Sánchez D. et al. C-sanitized: A privacy model for document redaction and sanitization. J. Assoc. Inf. Sci. Technol. (2016)
Goethals B. et al. On private scalar product computation for privacy-preserving data mining
Karr A.F. et al. Privacy-preserving analysis of vertically partitioned data using secure matrix products. J. Off. Stat. (2009)
Hundepool A. et al. Statistical Disclosure Control (2012)
Elliot M. et al. The Future of Statistical Disclosure Control, Tech. Rep. (2018)
Fung B.C.M. et al. Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. (2010)
Singh N. et al. Data privacy protection mechanisms in cloud. Data Sci. Eng. (2018)
Domingo-Ferrer J. et al. Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Min. Knowl. Discov. (2005)
R.A. Moore, Controlled data swapping techniques for masking public use microdata sets, U.S. Bureau of the Census...
Domingo-Ferrer J. et al.
Skinner C. et al. Disclosure avoidance for census microdata in Great Britain
Samarati P. et al. Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement through Generalization and Suppression, Tech. Rep. (1998)
Brand R. Microdata protection through noise addition
Torra V. Rank swapping for partial orders and continuous variables
Muralidhar K. et al. Reverse mapping to preserve the marginal distributions of attributes in masked microdata
Domingo-Ferrer J. et al. Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. (2002)
Defays D. et al. Panels of enterprises and confidentiality: the small aggregates method


