Privacy-preserving cloud computing on sensitive data: A survey of methods, products and challenges

https://doi.org/10.1016/j.comcom.2019.04.011

Abstract

The growing volume of personal and sensitive data being harvested by data controllers makes it increasingly necessary to use the cloud not just to store the data, but also to process them on cloud premises. However, security concerns over frequent data breaches, together with recently upgraded legal data protection requirements (like the European Union’s General Data Protection Regulation), advise against outsourcing unprotected sensitive data to public clouds. To tackle this issue, this survey covers technologies that allow privacy-aware outsourcing of storage and processing of sensitive data to public clouds. Specifically and as a novelty, we review masking methods for outsourced data based on data splitting and anonymization, in addition to cryptographic methods covered in other surveys. We then compare these methods in terms of operations supported on the masked outsourced data, overhead, accuracy preservation, and impact on data management. Furthermore, we list several research projects and available products that have materialized some of the surveyed solutions. Finally, we identify outstanding research challenges.

Introduction

Many companies are outsourcing at least some of their information technology to the cloud, from mere data storage to e-mail and other productivity applications. Reduced costs, no need for maintenance, virtually unlimited computational resources and increased availability are the main forces driving this change. Yet, security and privacy misgivings remain major barriers to a fuller migration to the cloud.

Security is defined as achieving confidentiality, integrity and availability of the data outsourced to the cloud. Users want to be assured that no intruder can hack the cloud and/or impersonate them to steal or alter their sensitive data, and that no denial of service will occur. In the E.U., 57% of large enterprises using the cloud reported the risk of a security breach as the main limiting factor in the use of cloud computing services [1]; in a survey by the Cloud Security Alliance of over 165 information technology and security professionals in the U.S., most of the respondents considered cloud storage as high risk [2]; and the European Network and Information Security Agency identified “loss of governance” over the data outsourced to the cloud as a critically deterring factor [3]. Security breaches are, in fact, very real threats. Some well-known examples include the Sony PlayStation Network outage as a result of an external intrusion, in which personal details from approximately 77 million accounts were stolen, the multi-day outage in Dropbox that temporarily allowed visitors to log into any of its 25 million customer accounts as a result of a misconfiguration, or the leakage of private pictures of a number of celebrities from the Apple iCloud storage service due to weakly protected login credentials.

Regarding privacy, its most widely accepted definition in the information society is in terms of informational self-determination, that is, “the claim of individuals, groups or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” [4]. Hence, for a cloud user to store and/or process sensitive data in the cloud, she needs the guarantee that no one other than herself – not even the cloud service provider (CSP) – will be able to see or infer her data. Thus, cloud computing needs to increase the user’s control over her data, which will decrease the need for users to blindly trust the CSP. Otherwise, a user might be reluctant to outsource sensitive data to the cloud.

Furthermore, when data are personal, the individuals to whom the data refer – a.k.a. subjects – have privacy rights that have recently been enshrined in the new European Union’s General Data Protection Regulation (GDPR). To stay GDPR-compliant, a controller – an entity that has obtained consent from subjects to collect, store and process their data – can only outsource subject data to the cloud if she can obtain full control and confidentiality for the outsourced data. The above is a relevant issue because GDPR is also becoming a de facto legal standard outside the European Union, specifically in the USA, Australia, Canada and Japan, and any company wishing to sell information technology solutions to those markets must take it into account.

Notice that privacy is even more challenging than security, because it must hold also with respect to (public and, therefore, untrusted) CSPs. In this respect, the cloud has given CSPs the opportunity to analyze and exploit large amounts of personal data. In fact, a report by the U.S. Federal Trade Commission [5] states that public CSPs regularly collect and analyze the data of their users without the latter’s knowledge, and that those analyses could yield sensitive inferences; for example, a CSP could detect individuals that suffer from diabetes because of their interest in sugar-free products and share this information with an insurance company that could use that clue to classify a person as higher-risk (and possibly higher-premium).

One might argue that sensitive data handling in the cloud would be much simpler if the CSP could be assumed to be trusted. However, there are several legal issues here. On the one hand, in many scenarios the data subjects entrust the data controller with their personal data (for example, healthcare data), but this does not mean they allow the controller to further transfer their data to whomever the controller chooses to trust. On the other hand, the CSP may be under a jurisdiction different from the controller’s. If, say, the CSP is under U.S. law whereas controller and subjects are under E.U. law, the latter law may be violated. Finally, many public CSPs offer their services free of charge in return for the possibility of monetizing users’ data. For example, a recent privacy policy in Google specifies that whatever information a user decides to outsource to any Google service can be used, reproduced, modified or distributed by Google with the aim of improving or promoting its services, but also to conduct targeted advertising (e.g., the Gmail filtering system scans the content of our emails to serve personalized ads).

To assuage the above issues and restore the user’s control over, and trust in, the protection of the data outsourced to the cloud, several solutions have been proposed in recent years. All of them involve masking sensitive data so that only protected values are stored in the cloud and only the user/controller owning the data is able to unmask the protected values retrieved from the cloud. However, if the user wants to use not only the cloud’s storage but also the cloud’s computational power, the challenge is even harder, because data protection should be made compatible with outsourced computations on cloud premises on masked data.
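To make the idea of computing on masked data concrete, the following is a minimal sketch, not one of the surveyed schemes as such. It assumes a toy additive mask derived from a locally held key: the cloud stores and adds only masked integers, and the trusted side removes the accumulated mask from the result. The names `LocalProxy`, `cloud_sum` and `MODULUS` are hypothetical, and the sketch supports only sums that stay below the modulus.

```python
import hashlib
import hmac
import secrets

MODULUS = 2**64  # assumption: true sums never reach this value


class LocalProxy:
    """Trusted component: the only place where the secret key lives."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def _pad(self, record_id: str) -> int:
        # Derive a per-record pseudorandom pad from the secret key.
        digest = hmac.new(self._key, record_id.encode(), hashlib.sha256).digest()
        return int.from_bytes(digest[:8], "big")

    def mask(self, record_id: str, value: int) -> int:
        return (value + self._pad(record_id)) % MODULUS

    def unmask(self, record_id: str, masked: int) -> int:
        return (masked - self._pad(record_id)) % MODULUS

    def unmask_sum(self, record_ids, masked_sum: int) -> int:
        # Remove the sum of all pads that went into the masked total.
        total_pad = sum(self._pad(r) for r in record_ids)
        return (masked_sum - total_pad) % MODULUS


def cloud_sum(store: dict) -> int:
    """Untrusted side: adds masked values without seeing any plaintext."""
    return sum(store.values()) % MODULUS


proxy = LocalProxy()
salaries = {"alice": 52000, "bob": 48000, "carol": 61000}
cloud_store = {rid: proxy.mask(rid, v) for rid, v in salaries.items()}
total = proxy.unmask_sum(cloud_store.keys(), cloud_sum(cloud_store))
```

Deriving the pads from a key means the trusted side keeps only the key rather than one mask per record, but any operation other than addition would need a different masking method.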

In this paper we survey the state of the art on security and privacy-enabling solutions towards the cloud, with a focus on those that preserve cloud service functionalities, such as the ability to outsource queries and calculations on protected data to the cloud. In comparison with recent surveys on this area, ours offers the following contributions:

  • Most surveys focus on data security vs external attackers [6], [7], [8], [9] rather than on privacy versus the cloud. Therefore, they center their analysis on security attacks and on mechanisms to prevent, detect and mitigate them. In contrast, our survey considers mechanisms that protect outsourced data not only against third-party attackers, but also against insiders and honest-but-curious clouds storing and managing such data.

  • Many surveys concentrate on outsourced data storage [8], [9], [10]. Our paper goes a step beyond and puts the spotlight on the preservation of cloud service functionalities (e.g., queries, calculations, etc.) on the protected data outsourced to the cloud. Taking advantage of the cloud’s computing power on protected data is significantly more challenging than merely using the cloud to store protected data.

  • All surveys covering privacy-enabling service preservation mechanisms are limited to cryptographic solutions [11], [12]. Even though these methods are powerful to secure data, they also result in significant calculation overheads (which partly neutralize the cost-saving benefits of cloud computing), they require key management, and they severely limit cloud functionalities on the outsourced data because they need tailored solutions for each type of outsourced calculation [12]. To be self-contained, we also discuss cryptographic solutions but, for the first time, we comprehensively survey non-cryptographic methods (based on data splitting and anonymization) that can be used to efficiently protect data outsourced to the cloud while preserving a variety of cloud services.

  • In addition to the security and privacy-enabling solutions proposed in the literature, we survey research projects and products that implement some of these methods in the cloud scenario and discuss the outsourced functionalities they support.

In Section 2 we characterize the scenario we consider by identifying the actors and the entities involved. A common feature of privacy-enabling technologies towards the cloud is the use of a local proxy by the controller to mask the data before outsourcing them to the cloud. In this section we also present this general architecture and the assumptions and security models on which available solutions rely. We conclude the section with the identification and the description of the requirements that privacy-enabling technologies should ideally fulfill. In Section 3 we review data protection techniques, non-cryptographic or cryptographic, that have been or can be implemented in the aforementioned proxies to enable privacy and security towards the cloud while preserving (at least some of) its functionalities. In Section 4 we present a critical comparison of the techniques covered in Section 3 with respect to the requirements identified in Section 2. In Section 5 we survey research projects and products that offer security- and privacy-enabling solutions towards the cloud, and classify them according to the type of data protection technique they employ. Finally, in Section 6 we gather the conclusions and several research challenges derived from the gaps identified in the survey.

Data protection towards the cloud: assumptions and requirements

Data protection towards untrusted clouds is usually accomplished by deploying a trusted local proxy. This proxy is a logical entity that can be located on the client side (for instance, the user’s personal computer or smartphone) or, at least, in a location trusted by the user (such as a server or a router within the user’s corporate intranet) [13]. Proxies can be implemented as additional modules or services for existing applications (with dedicated APIs), as browser plugins (that intercept

Data protection techniques

As discussed in the previous section, local proxies may use different techniques to protect the user’s data before outsourcing them to the cloud. Due to the variety of data protection scenarios, the research community has proposed a plethora of masking techniques. Although this diversity may seem bewildering, it also has the advantage of making it possible to cope with heterogeneous privacy and functionality needs.
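As one illustration of a non-cryptographic masking technique, the sketch below shows a simple form of data splitting under stated assumptions: two non-colluding clouds, a confidentiality constraint that names and diagnoses must not be stored together, and a link table kept only on the trusted side. The function names and the sample records are hypothetical and are not taken from any surveyed method.

```python
import secrets
from collections import Counter


def split_records(records, attrs_a, attrs_b):
    """Split each record into two fragments linked only by a local table."""
    frag_a, frag_b, link = {}, {}, []
    for rec in records:
        # Independent random row ids, so the ids themselves reveal no link.
        id_a, id_b = secrets.token_hex(8), secrets.token_hex(8)
        frag_a[id_a] = {k: rec[k] for k in attrs_a}
        frag_b[id_b] = {k: rec[k] for k in attrs_b}
        link.append((id_a, id_b))
    # Sort by random id so that storage order does not align the fragments.
    return dict(sorted(frag_a.items())), dict(sorted(frag_b.items())), link


def rejoin(frag_a, frag_b, link):
    """Trusted side only: rebuild the original records from both fragments."""
    return [{**frag_a[a], **frag_b[b]} for a, b in link]


records = [
    {"name": "Ann", "zip": "08001", "diagnosis": "diabetes"},
    {"name": "Ben", "zip": "43007", "diagnosis": "asthma"},
    {"name": "Eva", "zip": "08001", "diagnosis": "diabetes"},
]
cloud_a, cloud_b, link = split_records(records, ["name", "zip"], ["diagnosis"])

# Cloud B can answer queries on its fragment in the clear, without names.
diagnosis_counts = Counter(row["diagnosis"] for row in cloud_b.values())
restored = rejoin(cloud_a, cloud_b, link)
```

Because each fragment is stored unmodified, single-fragment queries need no unmasking step; queries spanning both fragments must be combined on the trusted side through the link table.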

This section surveys the state of the art on functionality-preserving data

Comparison of data protection techniques

In Table 4 we compare the types of protection methods surveyed above in terms of the requirements identified in Section 2: supported operations, overhead at the local proxy, preservation of the accuracy of the original data, transparency and interoperability. Operations on protected data not listed in the table may still be supported but at a prohibitive cost (e.g. to update searchable encrypted data, the whole data set should be downloaded, decrypted, updated, encrypted and re-uploaded). In

Research projects and products on privacy-aware computation outsourcing

In this section we review research projects and products that materialize some of the data protection techniques described above in the cloud scenario.

Conclusions and research challenges

In this survey, we have presented in a systematic way technologies that allow privacy-aware outsourcing of storage and processing of sensitive data to the cloud. This topic is especially hot given the coincidence of two phenomena: first, the sheer volume of personal or otherwise sensitive data being collected makes it increasingly necessary not only to store them in the cloud, but also to process them there; second, the upgrade of personal data protection that the new European General Data

Disclaimer and acknowledgments

This work was partly supported by the European Commission, Brussels (Belgium) (project H2020-700540 “CANVAS”), the Government of Catalonia, Spain (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2017 SGR 705) and the Spanish Government (projects TIN2014-57364-C2-1-R “SmartGlacis”, TIN2015-70054-REDC and TIN2016-80250-R “Sec-MCloud”). While the authors are with the UNESCO Chair in Data Privacy, the opinions expressed in this paper are the authors’ own and do not necessarily reflect the

References (183)

  • Martínez S. et al., Semantic adaptive microaggregation of categorical microdata, Comput. Secur. (2012)
  • Zhang J. et al., PrivBayes: private data release via Bayesian networks, ACM Trans. Database Syst. (2017)
  • Sánchez D. et al., Utility-preserving differentially private data releases via individual ranking microaggregation, Inf. Fusion (2016)
  • Rhee H.S. et al., Trapdoor security in a searchable public-key encryption scheme with a designated tester, J. Syst. Softw. (2010)
  • Eurostat, Cloud computing - statistics on the use by enterprises, (Dec. 2016 (Accessed 14.02.19)). URL...
  • C.S. Alliance, Cloud usage: Risks and opportunities report (Sep. 2014 (Accessed 14.02.19)). URL...
  • Haeberlen T. et al., Cloud Computing. Benefits, Risks and Recommendations for Information Security (Rev. B) (2012)
  • Westin A., Privacy and Freedom (1967)
  • Ramirez E. et al., Data Brokers: A Call for Transparency and Accountability (2014)
  • Praveen-Kumar P. et al., Attribute based encryption in cloud computing: A survey, gap analysis, and future directions, J. Netw. Comput. Appl. (2018)
  • Tang J. et al., Ensuring security and privacy preservation for cloud data services, ACM Comput. Surv. (2016)
  • Shan Z. et al., Practical secure computation outsourcing: A survey, ACM Comput. Surv. (2018)
  • Cao N. et al., Privacy-preserving multi-keyword ranked search over encrypted cloud data, IEEE Trans. Parallel Distrib. Syst. (2014)
  • M. Rouse, What is a multi-cloud strategy?, WhatIs.com (Jul. 2014). URL...
  • Aggarwal G. et al., Two can keep a secret: A distributed architecture for secure database services
  • Wei Z. et al., Data privacy protection using multiple cloud storages
  • Dev H. et al., An approach to protect the privacy of cloud data from data mining based attacks
  • Ali M. et al., DROPS: Division and replication of data in cloud for optimal performance and security, IEEE Trans. Cloud Comput. (2018)
  • Gai K. et al., Security-aware efficient mass distributed storage approach for cloud systems in big data
  • Alqahtani H. et al., A multi-cloud approach for secure data storage on smart device
  • Ciriani V. et al., Fragmentation and encryption to enforce privacy in data storage
  • Ganapathy V. et al., Distributing data for secure database services, Trans. Data Priv. (2012)
  • Ciriani V. et al., Selective data outsourcing for enforcing privacy, J. Comp. Sec. (2011)
  • Sánchez D. et al., C-sanitized: A privacy model for document redaction and sanitization, J. Assoc. Inf. Sci. Technol. (2016)
  • Goethals B. et al., On private scalar product computation for privacy-preserving data mining
  • Karr A.F. et al., Privacy-preserving analysis of vertically partitioned data using secure matrix products, J. Off. Stat. (2009)
  • Hundepool A. et al., Statistical Disclosure Control (2012)
  • Elliot M. et al., The Future of Statistical Disclosure Control, Tech. Rep. (2018)
  • Fung B.C.M. et al., Privacy-preserving data publishing: A survey of recent developments, ACM Comput. Surv. (2010)
  • Singh N. et al., Data privacy protection mechanisms in cloud, Data Sci. Eng. (2018)
  • Domingo-Ferrer J. et al., Ordinal, continuous and heterogeneous k-anonymity through microaggregation, Data Min. Knowl. Discov. (2005)
  • R.A. Moore, Controlled data swapping techniques for masking public use microdata sets, U.S. Bureau of the Census...
  • Domingo-Ferrer J. et al.
  • Skinner C. et al., Disclosure avoidance for census microdata in Great Britain
  • Samarati P. et al., Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement through Generalization and Suppression, Tech. Rep. (1998)
  • Brand R., Microdata protection through noise addition
  • Torra V., Rank swapping for partial orders and continuous variables
  • Muralidhar K. et al., Reverse mapping to preserve the marginal distributions of attributes in masked microdata
  • Domingo-Ferrer J. et al., Practical data-oriented microaggregation for statistical disclosure control, IEEE Trans. Knowl. Data Eng. (2002)
  • Defays D. et al., Panels of enterprises and confidentiality: the small aggregates method
