
Pervasive and Mobile Computing

Volume 24, December 2015, Pages 129-137

Secure multi-server-aided data deduplication in cloud computing

https://doi.org/10.1016/j.pmcj.2015.03.002

Abstract

Cloud computing enables on-demand and ubiquitous access to a centralized pool of configurable resources such as networks, applications, and services. As a result, a huge number of enterprises and individual users outsource their data to the cloud server, and the data volume in the cloud server is growing extremely fast. How to efficiently manage this ever-increasing data is a new security challenge in cloud computing. Recently, secure deduplication techniques have attracted considerable interest in both the academic and industrial communities. Deduplication not only enables optimal usage of the storage and network bandwidth resources of cloud storage providers, but also reduces the storage cost of users. Although convergent encryption has been extensively adopted for secure deduplication, it inevitably suffers from off-line brute-force dictionary attacks, since messages are usually predictable in practice. To address this weakness, the notion of DupLESS was proposed, in which the user generates the convergent key with the help of a key server. We argue that DupLESS fails when the key server is corrupted by the cloud server. In this paper, we propose a new multi-server-aided deduplication scheme based on threshold blind signatures, which can effectively resist collusion attacks between the cloud server and multiple key servers. Furthermore, we prove that our construction achieves the desired security properties.

Introduction

Cloud computing, the long-dreamed vision of computing as a utility, has plenty of benefits for real-world applications, such as on-demand self-service, ubiquitous network access, location-independent resource pooling, rapid resource elasticity, usage-based pricing, and outsourced computation. With the rapid advances of cloud computing, plenty of enterprises and individual users outsource their sensitive data to cloud storage providers (e.g. Dropbox [1], Google Drive [2]), where they can enjoy high-quality data storage and computing services in a ubiquitous manner while reducing the burden of data storage and maintenance. As a result, the data volume of the cloud storage provider (CSP) has grown rapidly in recent years, especially as we enter the era of big data. According to the analysis of IDC [3], the volume of data in the world is expected to reach 40 trillion gigabytes by 2020. Therefore, one critical challenge for a CSP is how to efficiently manage this ever-increasing data.

As a promising primitive, deduplication [4] has attracted more and more attention from both the academic and industrial communities. Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. This technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent [5]. In the process of deduplication, the CSP keeps only one copy of each identical data file. More specifically, the CSP performs the store operation for a data file upon receiving the first upload request. For each subsequent upload request, a link from the uploading user to the original data copy is assigned to that user. This ensures that each data file is stored only once on the server. In scenarios with high data redundancy, deduplication can effectively reduce both storage space and communication overhead.

Despite these tremendous benefits, deduplication of sensitive data also faces new security challenges. Specifically, to protect the confidentiality of outsourced data, the data must be encrypted before outsourcing. Nevertheless, conventional encryption requires that identical data be encrypted by different users with their own keys, which yields different ciphertexts for the same plaintext and thus makes cross-user deduplication impossible. To tackle this incompatibility, Douceur et al. [6] first introduced the notion of convergent encryption, which uses a convergent key derived from the cryptographic hash of the data content to perform encryption/decryption on the data copy. That is, given a data file F, the user first computes the convergent key K = H(F), where H(·) is a one-way collision-resistant hash function. Then, he encrypts F to obtain the ciphertext C = E(K, F). Since E is a deterministic symmetric encryption scheme, all users holding the identical data generate the identical convergent key and the identical ciphertext. This allows the CSP to perform deduplication on ciphertexts in the cross-user setting.
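The convergent encryption flow above can be sketched in a few lines (a minimal illustration assuming SHA-256 as H and a hash-derived XOR keystream standing in for a real deterministic symmetric cipher E, such as AES in a deterministic mode; this is not the paper's actual construction):

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    # Deterministic keystream derived from K (counter mode over SHA-256)
    out = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest()
                   for i in range((length + 31) // 32))
    return out[:length]

def convergent_encrypt(data: bytes) -> tuple[bytes, bytes]:
    """Convergent encryption: K = H(F), C = E(K, F)."""
    key = hashlib.sha256(data).digest()  # convergent key K = H(F)
    ciphertext = bytes(b ^ k for b, k in zip(data, _keystream(key, len(data))))
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # XOR with the same keystream inverts the encryption
    return bytes(b ^ k for b, k in zip(ciphertext, _keystream(key, len(ciphertext))))
```

Because both K and the keystream are derived deterministically from F, two users holding the same file produce byte-identical ciphertexts, which is exactly what lets the CSP deduplicate across users.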

However, convergent encryption is vulnerable to off-line brute-force attacks, because the plaintext space of a given ciphertext C is often small (messages are often predictable [7]). Hence, an attacker can learn the corresponding plaintext by performing the encryption over all possible plaintexts in an off-line phase (note that the encryption scheme E is deterministic and the convergent key K depends only on the data file F). To address this issue, Bellare et al. [7] designed a more secure deduplication system called DupLESS. In DupLESS, the user generates his convergent key with the aid of a key server (KS): the convergent key incorporates the private key of the KS through a blind signature protocol run between the user and the KS. We argue that DupLESS fails when the CSP colludes with the KS, because the attacker then obtains the ciphertext and the convergent key simultaneously. Recently, Duan [8] presented a novel distributed encrypted deduplication scheme in which, before uploading a file, the user generates the convergent key via a threshold signature technique with the assistance of other users. Moreover, a trusted dealer must be included in their scheme to distribute key shares to users. We argue that the trusted dealer plays a role similar to the key server in DupLESS and is thus vulnerable to a single point of failure. To the best of our knowledge, there is as yet no effective deduplication scheme that can fully resist the brute-force attack.
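The off-line dictionary attack described above can be demonstrated directly: since E is deterministic and K = H(F), an attacker who can enumerate candidate plaintexts simply re-encrypts each guess and compares against the target ciphertext, with no interaction required (an illustrative sketch using SHA-256 as H and an XOR-keystream stand-in for E; the helper names are hypothetical):

```python
import hashlib

def encrypt(data: bytes) -> bytes:
    # Deterministic convergent encryption stand-in: K = H(F), then XOR keystream
    key = hashlib.sha256(data).digest()
    stream = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest()
                      for i in range((len(data) + 31) // 32))
    return bytes(b ^ s for b, s in zip(data, stream))

def brute_force(target_ciphertext: bytes, candidates: list) -> bytes:
    # Off-line phase: no queries to the CSP or any server are needed
    for guess in candidates:
        if encrypt(guess) == target_ciphertext:
            return guess
    return None
```

When the message space is small and predictable (a salary figure, a form letter), the candidate list is short enough to exhaust, which is precisely the weakness DupLESS and the scheme in this paper target.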

In this paper, we present a deduplication scheme that can resist the brute-force attack. Motivated by reducing the trust assumption on the KS, we propose a multi-key-server deduplication scheme based on the idea of threshold blind signatures. Our contributions are twofold:

  • We propose a new multi-server-aided deduplication scheme based on a decentralized threshold blind signature, in which each user generates the convergent key by interacting with multiple key servers. Furthermore, no partial set of key servers can acquire knowledge of the secret key distributed among all key servers.

  • Security analysis shows that the proposed scheme is secure under the proposed security model, and that it resists the brute-force attack even if a limited number of key servers are corrupted.

Secure data deduplication. Data deduplication has been an active research area in data storage for several years; it saves network bandwidth and storage space by eliminating duplicate copies of data. However, traditional deduplication techniques [9], [10], [11] focused on basic methods and compression ratios without considering data privacy. To this end, Douceur et al. [6] first introduced the notion of convergent encryption, which ensures data confidentiality while still permitting deduplication.

Driven by the ever-increasing data in cloud computing, plenty of research on deduplication over encrypted data has been done recently [12], [4], [7], [6], [13], [14], [15], [16], [17]. Bellare et al. [4] formalized this primitive as message-locked encryption (MLE) and explored its application to space-efficient secure outsourced storage. Stanek et al. [16] proposed a novel deduplication encryption scheme that provides different levels of security for data files according to their popularity (roughly, a file is more popular if more users share it). In this way, they achieve a more fine-grained trade-off between storage efficiency and data security for the outsourced data. To better protect the confidentiality of outsourced data, Li et al. [14] proposed an authorized data deduplication scheme in a hybrid cloud architecture, in which each user can only perform deduplication on files matching his privileges. Yuan et al. [18] presented a deduplication scheme that simultaneously supports efficient and secure data integrity auditing. We argue that none of the above solutions resists the brute-force attack. As a first attempt, Bellare et al. [7] introduced the DupLESS system, which partially addresses the problem by adding a key server (KS): the user generates the convergent key by running a blind signature protocol involving the secret key of the KS. This implies that DupLESS achieves at worst the security of MLE [4]. Nevertheless, this solution still fails when the cloud server colludes with the key server.

Proof of ownership. Halevi et al. [19] first introduced the notion of proofs of ownership (PoW) to ensure data privacy and confidentiality in client-side deduplication: the user can efficiently prove to the cloud storage server that he indeed owns a file without uploading the file itself. Three concrete PoW constructions were presented, based on a Merkle hash tree (MHT) built over the content of the data file. Specifically, a challenge/response protocol is run between the server and the client: each time, the server requires the client to return a valid verification object for a requested subset of the MHT leaf nodes. Using PoW, cheating attacks by malicious users can be prevented in client-side deduplication. Pietro et al. [20] proposed an efficient PoW scheme that uses the projection of a file onto randomly selected bit positions as the file proof, requiring only constant computational cost. Recently, Blasco et al. [21] presented a novel PoW scheme based on Bloom filters, which is more efficient on both the server and client sides.
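The Merkle-hash-tree challenge/response underlying [19] can be illustrated with a standard Merkle proof: the server stores only the root, challenges a leaf index, and the client answers with the leaf content plus its authentication path (a generic sketch of the primitive, not Halevi et al.'s exact construction):

```python
import hashlib

def _h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_levels(blocks: list) -> list:
    # Build the tree bottom-up; duplicate the last node on odd-sized levels
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(blocks: list, index: int) -> list:
    # Authentication path: the sibling hash at every level below the root
    path, i = [], index
    for level in merkle_levels(blocks)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[i ^ 1])
        i //= 2
    return path

def verify(root: bytes, block: bytes, index: int, path: list) -> bool:
    # Server-side check: recompute the root from the challenged leaf
    node, i = _h(block), index
    for sibling in path:
        node = _h(node + sibling) if i % 2 == 0 else _h(sibling + node)
        i //= 2
    return node == root
```

The server's storage is a single hash regardless of file size, while a cheating client who lacks the challenged blocks cannot produce a path that recomputes to the stored root.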

The rest of the paper is organized as follows. We present some preliminaries in Section 2. In Section 3, we present the system and threat model of the proposed deduplication scheme. The detailed construction and its security analysis are presented in Section 4. The performance evaluation of the construction is given in Section 5. Finally, the conclusion is given in Section 6.


Bilinear pairings

Let G1, G2 be cyclic groups of prime order p, let g be a generator of G1, and let e: G1 × G1 → G2 be a map with the following properties.

  1. Bilinearity: e(g^a, g^b) = e(g, g)^{ab} for all a, b ∈ Zp.

  2. Non-degeneracy: there exist x, y ∈ G1 such that e(x, y) ≠ 1.

  3. Computability: for all x, y ∈ G1, e(x, y) can be computed efficiently.
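The bilinearity property is what makes signatures of the BLS type, on which threshold blind signature constructions are commonly built, publicly verifiable: with secret key x and public key g^x, a signature σ = H(F)^x on (the hash of) a file F verifies against the public key alone, since (a standard identity, stated here for illustration rather than taken from this paper's construction)

```latex
e\bigl(\sigma,\, g\bigr) \;=\; e\bigl(H(F)^{x},\, g\bigr)
\;=\; e\bigl(H(F),\, g\bigr)^{x}
\;=\; e\bigl(H(F),\, g^{x}\bigr).
```

The verifier needs only σ, F, g, and g^x; the secret exponent x never leaves the signer.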

Gap Diffie–Hellman (GDH) groups

Let G be a cyclic multiplicative group generated by g with prime order q. We focus on the following mathematical problems in G:

  1. Discrete Logarithm Problem (DLP): Given h, g ∈ G, find an integer

System model

In this work, we consider a cloud data-outsourcing system supporting deduplication, which consists of three entities: the user, the storage cloud service provider (S-CSP), and the key cloud service providers (K-CSPs), as elaborated below.

  • User. Before storing data, the user first checks whether the data to be uploaded is a duplicate. If so, the upload is canceled and a link to the existing data is assigned to the user.

  • S-CSP. The S-CSP is responsible for storing users' outsourced data. To reduce

Multi-server-aided deduplication scheme

In this section, we first present a basic solution that resists brute-force attacks in deduplication and point out its limitations. Then, an improved approach is given, which preserves the secrecy of users' data while preventing brute-force attacks.

Performance evaluation

In this section, we provide an experimental evaluation of the proposed data deduplication scheme. For convenience of discussion, some notation is introduced. We denote by P the pairing operation, by Exp the modular exponentiation, by n the total number of K-CSPs, and by t the number of K-CSPs participating in convergent key creation. Note that we omit the ordinary file transfer and file encryption/decryption modules for simplicity. In addition, consider the process of data download is

Conclusion

In this paper, we make a further study of the problem of resisting off-line brute-force attacks in deduplication. In our construction, the convergent key is generated with an additional secret key held by the K-CSPs: the user interacts with any t of the K-CSPs to perform the threshold blind signature. Note that the secret key can be leaked if and only if all K-CSPs are corrupted. Thus, our scheme provides stronger security and effectively resists the off-line brute-force attack. We also prove

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61272455), China 111 Project (No. B08038), Doctoral Fund of Ministry of Education of China (No. 20130203110004), Program for New Century Excellent Talents in University (No. NCET-13-0946), and the Fundamental Research Funds for the Central Universities (Nos. BDY151402 and JB142001-14).

References (29)

  • Dropbox, a file-storage and sharing service....
  • Google Drive....
  • J. Gantz, D. Reinsel, The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far...
  • M. Bellare, S. Keelveedhi, T. Ristenpart, Message-locked encryption and secure deduplication, in: Proceedings of 32nd...
  • Data Deduplication....
  • J.R. Douceur, A. Adya, W.J. Bolosky, D. Simon, M. Theimer, Reclaiming space from duplicate files in a serverless...
  • M. Bellare, S. Keelveedhi, T. Ristenpart, DupLESS: server-aided encryption for deduplicated storage, in: Proceedings of...
  • Y. Duan, Distributed key generation for encrypted deduplication: achieving the strongest privacy, in: Proceedings of...
  • A. Adya, W.J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J.R. Douceur, J. Howell, J.R. Lorch, M. Theimer, R.P....
  • N. Jain, M. Dahlin, R. Tewari, TAPER: tiered approach for eliminating redundancy in replica synchronization, in:...
  • S. Quinlan, S. Dorward, Venti: a new approach to archival storage, in: Proceedings of the 1st USENIX Conference on File...
  • P. Anderson, L. Zhang, Fast and secure laptop backups with encrypted de-duplication, in: Proceedings of USENIX LISA,...
  • J. Li et al., Secure deduplication with efficient and reliable convergent key management, IEEE Trans. Parallel Distrib. Syst. (2014)
  • J. Li, Y. Li, X. Chen, P. Lee, W. Lou, A hybrid cloud approach for secure authorized deduplication, IEEE Trans....