1 Introduction

Data outsourcing is one of the most conspicuous characteristics of the ongoing paradigm shift in the current technological context. Certainly, we are witnessing a transition from configurations where data owners determined and deployed the means to store their information assets, to scenarios where those owners entrust third parties with such tasks. From a general point of view, this shift entails a series of considerations regarding the manner in which trust is articulated.

This chapter is focused on pinpointing the underlying security assumptions in the adoption of free-access public cloud storage, but also on highlighting the means to enhance the protection of outsourced data. To achieve such goals, in Sect. 15.2 we discuss ten security challenges in cloud storage. The main concern of each of these challenges is summarised, and we provide a set of recommendations to tackle them. These recommendations are intended to identify cryptographic procedures and software solutions that can help both SMEs and end users to implement usable and low-cost security solutions on top of free-access cloud storage. Certainly, the proper combination of standard cryptographic measures and the functionality provided by free cloud storage services can lead to appealing serverless solutions. This being the case, our effort focuses on providing a guideline to develop client-side software for the secure and privacy-respectful adoption of cloud storage services. This study helps to discuss and define the main requirements of client-side software products to access cloud services. Furthermore, it provides a checklist to take into account in the risk analysis of the related cloud services, which would also be useful for Service Providers (SPs) in the design of their platforms. Finally, we conclude this chapter in Sect. 15.3.

2 Main Threats and Challenges in Cloud Storage

Next, we describe the main challenges in cloud storage derived from the paradigm shift outlined in the Introduction, also sketching possible solutions.

2.1 Challenge 1: Authentication

The protection of information assets demands the proper implementation of access control methodologies, i.e. of authentication and authorisation procedures. In short, vulnerabilities in the authentication-authorisation pair can be a consequence of insider attacks, poor access control policies or the use of vulnerable software.

In the case of cloud services, authentication is usually performed by checking users’ credentials in a safe way. However, Cloud Providers (CPs) often use basic authentication mechanisms, in which the user has to provide her credentials to her provider. Therefore, this user has to trust that providers do not misuse her credentials and that they manage these credentials in a secure way. Nevertheless, there are some relevant examples of negligent management of users’ credentials by CPs.Footnote 1

In addition, it is not difficult to find cases where authentication or authorisation can be circumvented due to failures in the implementation of some security protocol. On this point, it is relevant to consider the problems regarding the implementation of the OAuth 2.0 protocol [36], which has been exploited to mount a Man-In-The-Cloud (MITC) attack [30].

2.1.1 Solutions

In order to face the problem of credential leakage, we should use CPs that employ advanced password-based authentication techniques, like the Password-based Authenticated Key Exchange (PAKE) [1] or Secure Remote Password (SRP) [62] protocols, which effectively convey a zero-knowledge password proof from the user to the server and thus hamper eavesdropping or man-in-the-middle attacks.

Furthermore, this verification of the user’s identity can be strengthened by adding Two-Step Verification (2SV) mechanisms, which are present in several free cloud storage services.Footnote 2 In Google Drive, the user can use Google Authenticator as a two-factor verification mechanism.Footnote 3 Dropbox also offers optional 2SV (via text message or Time-Based One-Time Password apps). Box also supports 2SV. In this case, the process requires a user to present two authenticating pieces of evidence when they log in: something they know (their Box password) and something they have (a code that is sent to their mobile device).Footnote 4 Moreover, all OneDrive users can protect the login via a one-time code app or a text message.Footnote 5
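To illustrate how the one-time codes behind these 2SV mechanisms are typically generated, the following Python sketch implements the Time-Based One-Time Password algorithm (RFC 6238, built on the HMAC construction of RFC 4226) on which apps such as Google Authenticator are based; the Base32 secret shown is a hypothetical example, not a real credential.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, period: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // period                 # moving factor: current 30 s window
    msg = struct.pack(">Q", counter)                     # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                           # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# Example with a hypothetical shared secret (normally provisioned via a QR code):
print(totp("JBSWY3DPEHPK3PXP"))
```

The server stores the same shared secret and accepts the code produced for the current time window, usually tolerating a small amount of clock drift.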

On the other hand, in some cloud storage services, sign-up is based on users’ Personally Identifiable Information (PII). Consequently, the provider can monitor the storage record of its users and thus gain access to a complete behavioural profile of them. Therefore, anonymous authentication mechanisms are mandatory if the user really needs adequate privacy in the cloud [40]. A way to achieve this type of authentication is through non-conventional digital signatures, like group signatures, which allow members of a group of signers to issue signatures on behalf of the group that can be verified without revealing which specific member issued them [6].

Finally, we point out some possible solutions to the MITC attack. One possible strategy could be, firstly, to identify the compromise of a file synchronisation account, using a Cloud Access Security Broker solution that monitors access and usage of cloud services by the users, and secondly, to identify the abuse of the internal data resource, deploying controls such as Dynamic Authorisation ManagementFootnote 6 around the data resources to identify abnormal and abusive access to the data [30]. However, we consider that the best way to address this security issue is to be very careful with the OAuth implementation and to combine it with a second authentication factor in order to achieve a high security level. To achieve this goal, CPs should be aware of the current potential weaknesses of this protocol and follow the well-known recommendations made by the cryptographic community about how to implement OAuth 2.0 in a secure way [23].

2.1.2 Limitations

In this and other scenarios, the deployment of security protocols implies a risk that has to be taken into account. No matter how much effort is devoted to the authentication process, we can always find a poor implementation of the cryptographic functionality used (e.g. the Imperva attack on OAuth 2.0 [30], Heartbleed in the case of TLS,Footnote 7 Ghost for Linux servers,Footnote 8 etc.).

Regarding 2SV mechanisms, it has been shown that they are not completely secure [18]. Besides, if we use a service from our mobile phone and its 2SV consists of sending a text message to that same phone, the authentication process is almost equivalent to simply providing our username and password to the service [18]. Furthermore, it is necessary to consider the recent security concerns about Signalling System No 7 (SS7) and the NIST advice against adopting the Short Message Service (SMS) as an out-of-band verifier [27].

Another key element in authentication is usability. The adequate combination of usability goals and security concerns is not easy to achieve, and the design of any procedure to improve usability should be thoroughly analysed before deploying the corresponding authentication process. Such an analysis should, for example, confirm that security is not degraded by the usability goals of the so-called Single Sign-On (SSO) mechanism. SSO is a process that allows a user to enter a single username and password to access multiple applications. It enables authentication through a token and thus increases the usability of the whole system. Nevertheless, if these tokens are not adequately managed, SSO could be vulnerable to replay attacks [37].

2.2 Challenge 2: Information Encryption

The trust model and the risk model are very different when the CP is in charge of data encryption. If so, users implicitly trust the CP, and this assumption can entail a threat to security and/or privacy [57]. With the current free cloud storage solutions, users have to completely trust their CPs [53]. Indeed, an important question concerns who controls the keys used for the encryption of data at rest. If they are under the control of the CPs, the user is implicitly trusting that CPs manage them securely and use them only for legitimate purposes. Furthermore, providers could also be compromised because of hardware failures [48], software flaws [39], or misbehaviour of system administrators [33]. Once one of these happens, a tremendous amount of data might get corrupted or lost [64].

2.2.1 Solutions

Solutions for protecting privacy require encrypting data before releasing them to the CPs, and this measure has gained great relevance since Snowden’s leaks [51]. Client-side encryption could be properly articulated to obtain both privacy and data integrity protection.
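As a minimal sketch of client-side encryption, the following Python snippet uses the Fernet recipe of the cryptography package (authenticated symmetric encryption) to protect a file’s contents before upload; the plaintext and the upload step are hypothetical placeholders.

```python
# pip install cryptography
from cryptography.fernet import Fernet

# The key is generated and kept on the client; the CP only ever sees ciphertext.
key = Fernet.generate_key()          # store securely, e.g. in a local key store
f = Fernet(key)

plaintext = b"quarterly-report.xlsx contents"   # placeholder data
ciphertext = f.encrypt(plaintext)    # authenticated encryption (AES-128-CBC + HMAC)

# ... upload `ciphertext` to the cloud storage service ...

assert f.decrypt(ciphertext) == plaintext       # integrity is checked on decryption
```

Because Fernet is authenticated encryption, any modification of the stored ciphertext is detected at decryption time, which also contributes to the integrity protection mentioned above.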

Nonetheless, encryption can bring several difficulties in scenarios where querying data is necessary. In such cases, we could use fragmentation instead of encryption and keep confidential the associations among data values. This technique protects sensitive associations by splitting the concerned pieces of information and storing them in separate un-linkable fragments [53]; also, in this set-up CPs can detect redundant file blocks and perform deduplication.
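As a deliberately simplified sketch of this idea, the following Python snippet splits a record across two providers so that neither fragment alone reveals the sensitive association; the attribute names, the random link identifier and the assumption that the two providers do not collude are illustrative choices of this sketch, not the actual scheme of [53].

```python
import secrets

# A record whose sensitive association (name <-> diagnosis) must be protected.
record = {"name": "Alice Example", "zip": "28001", "diagnosis": "condition-X"}

# Split the attributes into two fragments, linked only through a random identifier.
link_id = secrets.token_hex(8)
fragment_a = {"id": link_id, "name": record["name"]}                  # stored at CP A
fragment_b = {"id": link_id, "zip": record["zip"],
              "diagnosis": record["diagnosis"]}                       # stored at CP B

# Only a party holding both fragments (e.g. the data owner) can rejoin them.
rejoined = {**fragment_a, **fragment_b} if fragment_a["id"] == fragment_b["id"] else None
print(rejoined)
```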

2.2.2 Limitations

Client-side encryption implies that the user is in charge of key generation and management,Footnote 9 which can place a heavy workload on her, along with a security risk (e.g. the user could lose a cryptographic key and thus be unable to access her data). Although it is possible to reduce such a burden by adopting password-based tools for the automatic generation and management of cryptographic keys, these solutions represent a security risk as they place all the protection around a single component. This Single Point of Failure (SPOF) can be avoided through distributed authentication via two or more servers. Hence, the user only has to know her password, and the compromise of one server exposes neither any private data nor any password [24].
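A common building block of such password-based tools is a key derivation function. The following Python sketch derives a symmetric key from a password with PBKDF2-HMAC-SHA256; the iteration count and the example passphrase are assumptions made here for illustration.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes | None = None) -> tuple[bytes, bytes]:
    """Derive a 256-bit key from a password using PBKDF2-HMAC-SHA256."""
    salt = salt or os.urandom(16)     # random salt, stored alongside the ciphertext
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return key, salt

key, salt = derive_key("correct horse battery staple")      # example passphrase only
key_again, _ = derive_key("correct horse battery staple", salt)
assert key == key_again   # the same password and salt always yield the same key
```

Note that this construction concentrates all protection in the password, which is precisely the single point of failure discussed above.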

Apart from the key management problem, data encryption thwarts traditional deduplication mechanisms and querying by keywords in the cloud [53]. However, the new paradigm called Secure Computation (SC) solves this problem, since it is essentially based on processing encrypted data [3]. Indeed, Multi-Party Computation and homomorphic encryption were discarded in the past because they are computationally expensive. Nonetheless, efficient implementations currently exist (e.g. DyadicFootnote 10 and SharemindFootnote 11).

Finally, sharing encrypted documents over the cloud requires a mechanism to share the encryption key of each encrypted file among the members of the group. A first approach to solve this problem could be to build a data packet based on the concept of the digital envelope container.Footnote 12 However, this incurs high management costs for the owner of the data and the group, requiring granting/revoking access as users join/leave the group, as well as delays for acquiring the keys. A second approach is to generate and store the secret keys in a centralised cloud service, but this implies that the user completely trusts the CP, which becomes a SPOF [59].
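The digital envelope idea can be sketched as follows: the file is encrypted once under a symmetric key, and that key is wrapped with each member’s public key. The Python snippet below shows the wrapping step with RSA-OAEP from the cryptography package; the member’s key pair is generated locally here only for illustration, since in practice the owner would hold just the members’ public keys.

```python
# pip install cryptography
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Symmetric key protecting the shared file (generated by the data owner).
file_key = Fernet.generate_key()

# Hypothetical group member key pair (normally only the public key is available).
member_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
member_public = member_private.public_key()

# The "envelope": the file key wrapped with the member's public key.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = member_public.encrypt(file_key, oaep)

# The member unwraps the file key with her private key and can then decrypt the file.
assert member_private.decrypt(wrapped_key, oaep) == file_key
```

The management cost mentioned above arises because one such wrapped key must be produced, distributed and, on revocation, replaced for every member of the group.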

A third approach includes key management through a trusted client-side authority (the manager) different from the CP [59]: users are segmented into populations called groups, each of which has read and write access to a different data partition. Access to each data partition is managed (through key generation and user authentication) by the manager.

A fourth approach is based on the combination of proxy signatures, a Tree-Based Group Diffie–Hellman (TGDH) key exchange protocol and proxy re-encryption [63]. This solution is privacy-respectful and better supports the updating of encryption keys, because it transfers most of the computational complexity and communication overhead to cloud servers.

2.3 Challenge 3: Inappropriate Modifications of Assets

In the cloud, both the location and the ownership of data are blurred and not clearly identified, which makes the old perimeters obsolete as a means to secure IT assets. In fact, when the user uploads her assets to the cloud, she cannot be sure that they will not be modified by unauthorised third parties (not even when client-side encryption is carried out).

Additionally, for security considerations, previous public auditing schemes for shared cloud data hid the identities of the group members. However, unconstrained identity anonymity enables a group member to maliciously modify shared data without being identified. This is a threat to coherent data availability, and, consequently, identity traceability should also be retained in data sharing [64].

2.3.1 Solutions

The first solution is to compute a hash of each file before uploading it to the CP. These hashes can be verified after files are downloaded, but the problem is that the user needs to store all of them. Alternatively, we can consider retrievability proofs, which encrypt the information and insert data blocks (the sentinels) at random points [32]. The provider only sees a file with random bits, and she is not able to distinguish between data and sentinels. Integrity verification is performed by asking for a random selection of the sentinels; as the provider does not know where the sentinels are, any change in the data can be detected. However, this solution entails an overhead for the user.
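The hash-based approach can be as simple as the following Python sketch, which computes a SHA-256 digest before upload and recomputes it after download to detect modifications; the file names are hypothetical.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Before upload: record the digest locally (or in a signed manifest).
digest_before = sha256_file("report.pdf")            # hypothetical file name

# After download: recompute and compare to detect any modification.
digest_after = sha256_file("report_downloaded.pdf")  # hypothetical file name
print("intact" if digest_before == digest_after else "file was modified")
```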

At this point, we should distinguish between content integrity (achieved with hashes, CRC, sentinels, etc.) and combined content and source integrity, which can be achieved with digital signatures. However, in modern IT networks conventional digital signatures do not offer the whole set of features that may be required. Depending on the specific needs, different schemes may be required, such as group signatures, multi- and aggregate signatures or blind signatures [6]. This being the case, group signatures could be used as anonymous authentication methods in the cloud. On the other hand, multi- and aggregate signatures could be used when a file is digitally signed and shared between several users in a cloud storage service. Finally, identity-based signatures could eliminate the need to distribute public keys in the cloud, allowing the verification of digital signatures just from the identity that the signer claims to own.
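As a baseline, conventional signing and verification of a file can be sketched in a few lines of Python with an Ed25519 key pair from the cryptography package; the document content is a placeholder.

```python
# pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()

document = b"contents of the shared file"     # placeholder data
signature = signing_key.sign(document)        # covers content and source integrity

try:
    verify_key.verify(signature, document)    # raises InvalidSignature on any change
    print("signature valid")
except InvalidSignature:
    print("document or signature was tampered with")
```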

Moreover, the audit of shared cloud data demands the traceability of users’ operations without eroding privacy. Therefore, an efficient privacy-respectful public auditing solution is required that guarantees identity traceability for group members. In [64], identity traceability is achieved by two lists that record members’ activity on each data block. Besides, the scheme also provides data privacy during authenticator generation by utilising a blind signature technique.

2.3.2 Limitations

Digital signatures are often performed using the asymmetric cryptographic algorithm RSA. The main disadvantage of this algorithm is the time required for its key generation, although this usually has to be done just once. Nevertheless, we could improve this time using elliptic curves [28], which provide the same security as RSA but with smaller key sizes. In fact, RSA key generation is significantly slower than ECC key generation for RSA keys of 1024 bits and greater [50]. In addition, the cost of key generation can be a crucial factor in the choice of public key systems for digital signatures, especially for smaller devices with less computational resources [31]. Nonetheless, in some cases it is not necessary to generate RSA keys for each use. In such situations, the problem mentioned before is not so dramatic, since RSA is comparable to ECC for digital signature creation in terms of time, and it is faster than ECC for digital signature verification.Footnote 13
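The key generation gap is easy to observe empirically; the following Python sketch times RSA-2048 against NIST P-256 key generation with the cryptography package (the key sizes and the number of runs are choices made for this illustration).

```python
# pip install cryptography
import time
from cryptography.hazmat.primitives.asymmetric import ec, rsa

def bench(label, fn, runs=5):
    """Report the average wall-clock time of `fn` over a few runs."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    print(f"{label}: {(time.perf_counter() - start) / runs:.4f} s per key")

bench("RSA-2048 key generation", lambda: rsa.generate_private_key(
    public_exponent=65537, key_size=2048))
bench("ECC P-256 key generation", lambda: ec.generate_private_key(ec.SECP256R1()))
```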

Furthermore, we could incorporate Merkle trees [8], so that the number of digital signatures to handle is reduced. Regarding non-conventional digital signatures, the problem stems from the scarcity of standard and thoroughly verified software libraries [6].
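The idea behind using Merkle trees is that signing the single root hash authenticates every block beneath it. A minimal Python sketch of the root computation (with an odd-node duplication rule chosen here for simplicity) is the following:

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the Merkle root of a list of data blocks (SHA-256, pairwise hashing)."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2"]      # hypothetical file blocks
root = merkle_root(blocks)
# Signing only `root` authenticates every block: one signature instead of three.
print(root.hex())
```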

2.4 Challenge 4: Availability

Users typically place large amounts of data in cloud servers, and CPs need to ensure availability even when data are not accessed during long periods of time. Despite redundancy techniques (software and hardware) and error correction codes, from a general point of view, users cannot be sure whether their files are vulnerable to hard drive crashes [12].

2.4.1 Solutions

Recent studies propose the utilisation of RAID-like techniques over multiple cloud storage services [65]. Some cloud storage managers have been proposed to handle cloud storage across multiple CPs.Footnote 14 Moreover, [65] proposes a solution for mobile devices that unifies storage from multiple CPs into a centralised storage pool.

2.4.2 Limitations

The above solutions are proprietary, so the user would have to pay fees to use these services and trust third-party companies to manage the data securely and privately.

2.5 Challenge 5: Data Location

Even when the CP guarantees that the data are stored within a specific geographic area, users might not have any assurance of this fact. On this point, the decision of the European Court of Justice in October 2015 was highly relevant: the annulment of the EU-US data sharing agreement named Safe Harbour.Footnote 15 This revocation prevented the automatic transfer of data of European citizens to the United States of America (USA). However, since February 2016 there is a new framework that protects the fundamental rights of anyone in the EU whose personal data is transferred to the United States [21]. These agreements prove the huge legal complexity of the ubiquity of cloud storage systems.

2.5.1 Solutions

Cloud service latency is a hint from which to infer the geographic location of the data stored by the CP. Nevertheless, these measurements have to be carried out with a high degree of precision, since information travels very quickly over electronic communications. An example of location proof is the use of distance bounding protocols [13]. These protocols always imply a timing phase in which the verifier sends a “challenge” and the provider responds. The provider is first authenticated and then required to respond within a time limit that depends on the distance between provider and user.
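A crude upper bound on the server’s distance can already be obtained from round-trip times, as in the following Python sketch; the endpoint is a placeholder, and the propagation speed of roughly 200,000 km/s in optical fibre is an assumption of this back-of-the-envelope estimate. Real distance bounding protocols are far more precise and cryptographically authenticate the responder.

```python
import socket
import time

def rtt_distance_bound(host: str, port: int = 443, samples: int = 5) -> float:
    """Upper-bound the distance to a server (in km) from TCP round-trip times.

    Signals cannot travel faster than ~200,000 km/s in fibre, so
    distance <= (RTT / 2) * speed; processing delays only loosen the bound.
    """
    best_rtt = float("inf")
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=3):
            pass
        best_rtt = min(best_rtt, time.perf_counter() - start)
    return (best_rtt / 2) * 200_000

# Hypothetical endpoint of a storage front-end:
print(f"server is at most {rtt_distance_bound('example.com'):.0f} km away")
```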

2.5.2 Limitations

Network, processing and disk access delays undermine the precision of the above methods. In fact, those methods are heavily dependent on the CP. For instance, Dropbox has to decrypt the information before sending it (as encryption/decryption is performed on the server), whereas this is not necessary in Mega (which performs this operation on the client side).

2.6 Challenge 6: Data Deduplication

The simple idea behind deduplication is to store each piece of data only once. Therefore, if a user wants to upload already existing data, the CP will simply add the user to the list of owners of that data. Deduplication saves space and costs, so many CPs are adopting it. However, the adoption of deduplication is not a straightforward decision for a CP. The CP must decide between file-level or block-level deduplication, server-side vs. client-side deduplication, and single-user vs. cross-user deduplication [56]. Some of these possibilities introduce side channels that raise privacy issues [43]. Again, we can opt for protecting privacy through encryption, but data encryption prevents straightforward deduplication techniques. Moreover, privacy against curious CPs cannot be ensured for unencrypted data [41].

2.6.1 Solutions

Several solutions have been proposed to mitigate privacy concerns with deduplication [43]. One of these solutions is Convergent Encryption (CE), which consists of using the hash of the data as the encryption key. Consequently, two equal files will generate the same encrypted file, which allows both deduplication and encryption [35].
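A minimal Python sketch of convergent encryption follows: the key is the SHA-256 hash of the plaintext, and encryption is made deterministic (here by also deriving the nonce from the key, a simplifying assumption of this sketch rather than a prescribed construction) so that identical files yield identical ciphertexts.

```python
# pip install cryptography
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt data under a key derived from its own hash (convergent encryption).

    Identical plaintexts yield identical ciphertexts, so the CP can deduplicate
    them without learning the content. Deterministic encryption is weaker than
    semantic security; that is the price paid for deduplication.
    """
    key = hashlib.sha256(plaintext).digest()        # content-derived key
    nonce = hashlib.sha256(key).digest()[:12]       # deterministic nonce (sketch only)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext

key1, ct1 = convergent_encrypt(b"same file contents")
key2, ct2 = convergent_encrypt(b"same file contents")
assert ct1 == ct2      # equal files produce equal ciphertexts -> deduplicable
```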

An alternative solution is ClouDedup [41], a tool that proposes the inclusion of an intermediate server between users and CPs. Users send to the intermediate server their blocks encrypted with CE, along with their keys. Afterwards, the intermediate server deduplicates blocks from all users and executes a second non-convergent encryption. As a result, the intermediate server sends to the CP only blocks that are not duplicated.

Other privacy-respectful deduplication techniques are more focused on providing benefits to the clients. This is the case of ClearBox [4], which endows cloud users with a means to verify the real storage space that their (encrypted) data is occupying in the cloud; this allows them to check whether they qualify for benefits such as price reductions.

2.6.2 Limitations

CE enables dictionary attacks. For instance, client-side CE allows malicious users to learn whether some particular piece of information already exists in the cloud. As a possible solution, one can classify data segments into popular and unpopular chunks of data [42]. Popular pieces of information (those which demand deduplication) would be encrypted using CE; unpopular ones are encrypted using conventional symmetric encryption. This achieves a high level of confidentiality and privacy protection, since the unpopular data (more likely associated with PII) are protected with semantically secure encryption. Nevertheless, the user has to decide on the popularity of data fragments. The usability of the previous scheme can be further improved by handling block popularity according to a previously established set of metadata and the use of an additional server [41].

Another option is proposed in [25], where the authors claim to impede content-guessing attacks by implementing CE at the block level under a security parameter.

2.7 Challenge 7: Version Control of Encrypted Data

Version Control Systems (VCSs) are a must as part of the recuperative controls of any architecture for securing information assets, but their deployment is not straightforward for encrypted data. Indeed, the recommended operating modes for symmetric block ciphers prevent VCSs from knowing which data segment was modified, as modifications in earlier blocks affect the subsequent ones. Consequently, one copy is needed for each version of a file, introducing too much overhead.

2.7.1 Solutions

VCSs can be built by splitting files into data objects [46]. If some modifications are made to a big file, the user only uploads the objects that have been modified. This enables a more efficient backup mechanism, as an object appearing in multiple backups is stored in the cloud only once and is indexed by a proper hashing-based pointer.
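A minimal Python sketch of this object-splitting scheme is shown below; the fixed 4 MiB object size and the in-memory dictionary standing in for the remote store are assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024      # 4 MiB objects (an assumption of this sketch)
store: dict[str, bytes] = {}      # stands in for the remote object store

def backup(data: bytes) -> list[str]:
    """Split a file into objects and store each one under its hash-based pointer."""
    pointers = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        pointer = hashlib.sha256(chunk).hexdigest()
        store.setdefault(pointer, chunk)   # an unchanged object is stored only once
        pointers.append(pointer)
    return pointers                        # this list acts as the version manifest

v1 = backup(b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE)
v2 = backup(b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE)   # only the second object changed
print(len(store))        # 3 objects stored, not 4: the shared chunk is reused
```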

Two tools for version control with encrypted data are SparkleShareFootnote 16 and Git-crypt.Footnote 17 Both solutions enable using a non-fully trusted Git server [54].

2.7.2 Limitations

The main shortcomings of the previous version control systems stem from usability concerns. In the first scheme, the user must decide how to split files and classify the resulting pieces. In addition, the cryptographic keys of each data object must be properly stored and managed [46]. This burden also undermines the usability of SparkleShare and Git-crypt [29]. Furthermore, SparkleShare does not provide any key change mechanism; the encryption password is saved on each SparkleShare client as plain text and can be read by anyone; and the filenames are not encrypted at all, so an attacker can monitor which files are stored. On the other hand, in Git-crypt the filenames are not encrypted either, and the password can be found in plain text on the client’s computer. As a matter of fact, the management of metadata in this VCS can imply some security risks [58]. In addition, key exchange is not an easy process, and it is still an issue to be solved in coherence with usability criteria [54].

2.8 Challenge 8: Assured Deletion of Data

It is not desirable to keep data backups permanently, since sensitive information can be exposed in the future due to a security breach or poor management by CPs [46]. Moreover, CPs can keep multiple copies of data over the cloud infrastructure to achieve a high tolerance against failures. As providers do not publish their replication policies, users do not know how many copies of their data are in the cloud, or where those copies are stored. Hence, it is not clear how providers can delete all the copies when users ask for their data to be removed.

2.8.1 Solutions

Client-side cryptographic protection is the easiest procedure to achieve assured deletion of data: if a user destroys the secret keys needed to decrypt the data, then these data are not accessible anymore [46].
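This “crypto-shredding” idea can be sketched in Python with the Fernet recipe of the cryptography package; the local key file and plaintext are placeholders, and the sketch assumes the deleted key file was the only copy of the key.

```python
# pip install cryptography
import os
from cryptography.fernet import Fernet, InvalidToken

key_path = "file.key"                      # hypothetical local key file
with open(key_path, "wb") as fh:
    fh.write(Fernet.generate_key())

# Only the ciphertext is uploaded; the key never leaves the client.
with open(key_path, "rb") as fh:
    ciphertext = Fernet(fh.read()).encrypt(b"sensitive backup contents")

os.remove(key_path)                        # destroying the (only copy of the) key ...

try:
    Fernet(Fernet.generate_key()).decrypt(ciphertext)   # ... makes every remote copy useless
except InvalidToken:
    print("data is unrecoverable without the destroyed key")
```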

2.8.2 Limitations

As stated above, client-side encryption resorts to a key management system that is totally independent of the cloud system. Therefore, the same limitations explained in Sect. 15.2.2.2 also apply to this section. On the other hand, note that client-side encryption only protects users’ data against CPs, but not against changes in the user groups among which a key distribution protocol is performed. In this specific case, if a user is revoked from a group, she could still access all previous versions of the shared files belonging to this group if she has previously stored them locally.

2.9 Challenge 9: API’s Validation

The current technological scenario is highly determined by the adoption of third-party software products, and it has been coined the era of containers (container age). Most of these third-party solutions are delivered through APIs that have not always been properly validated in terms of security,Footnote 18 , Footnote 19 which entails additional security risks.

2.9.1 Solutions

The new trust model should be handled by means of a Secure Development Lifecycle (SDL) [47]. SDLs are a key step in the evolution of software security, and they should be guided by automatic tools that analyse security properties through formal methods [7]. These tools are intended to help not only in the design stage but also in facing security and privacy threats arising in the production phase. The dynamic and adaptive nature of the SDL has been underlined in different methodologies to deploy systems according to the security-by-design principle [17]. In order to sustain such a methodology, there exist tools such as Maude-NPAFootnote 20 and the framework STRIDE [55], and companies such as Cryptosense,Footnote 21 a recent start-up looking to commercialise techniques for analysing the security of APIs [38].

2.9.2 Limitations

The human factor also takes part in the protocol analysis procedure, so a poor formalisation can leave some errors undetected. Hence, as in any engineering process, the presented solution does not guarantee 100% success, but it helps to avoid a great number of failures.

Finally, assuming that the CPs are validated by the cryptographic community, the solutions proposed for this challenge will be used to validate new developments made by cloud developers. Nevertheless, cloud users should verify that their cloud providers follow these good practices.

2.10 Challenge 10: Usable Security Solutions

Any change required to improve security should not erode users’ acceptance of the so-modified cloud services [49, 60]. In other words, secure cloud service solutions should also be easy to use. Otherwise, these services will not be adopted.

2.10.1 Solutions

The so-called security-by-design and privacy-by-design paradigms [15] are hot topics in cryptographic engineering, and their fulfilment calls for software development methodologies that integrate standards,Footnote 22 security reference architectures [22], and well-known technologies.

Furthermore, the usability of these technologies has to be tested. In [34], Human–Computer Interaction (HCI) methods and practices are provided for each phase of the Software Development Life Cycle (SDLC), in order to create usable, secure and privacy-respectful systems. Moreover, there are recent efforts on formalising and automatically assessing the role of human factors in security protocols [44, 45].

2.10.2 Limitations

Firstly, a key limitation to consider is that security and privacy are rarely the users’ main goal, and users would like privacy and security systems and controls to be as transparent as possible [34]. On the other hand, users want to be in control of the situation and understand what is happening. As a consequence, these two factors make it harder to develop usable security applications.

Finally, in order to be successful in the development of usable security solutions, a multidisciplinary team is needed, including experts from the fields of IT, information science, usability engineering, cognitive sciences and human factors. This can be a major limitation, since not all companies have enough resources to form this kind of heterogeneous work group.

3 Conclusions

Nowadays, cloud storage is a relevant topic due to the increase in the number of users who place their assets in the cloud. However, these users often cannot be sure about where their data are going to be stored and who is going to have access to these data. It is for this reason that many users feel obliged to apply security measures in order to retain full control over their data. More concretely, the user may face authentication, integrity, availability, confidentiality and privacy problems. In the specific case of enterprises, these are key considerations that should be included in any cloud service agreement.

To address these concerns, this chapter has compiled the most relevant security problems that users of free cloud storage services can encounter. For each identified security challenge, we have outlined some solutions and limitations (see Table 15.1). In addition, we have discussed standards of information security that can be combined with the functionality of free cloud storage services according to a serverless architecture.

Table 15.1 Summary of security challenges and solutions in public cloud storage

Finally, we have to take into account that the world of information security evolves very quickly. Therefore, we have to be aware of the new potential security problems that could affect the security and privacy of users in cloud storage environments. In this evolving scenario, our work is intended to help cloud users to evaluate cloud service agreements according to recommendations such as the new ISO/IEC 19086-1 standard.