Transparent Data Deduplication in the Cloud

Research article. DOI: 10.1145/2810103.2813630

Published: 12 October 2015

Abstract

Cloud storage providers such as Dropbox and Google Drive rely heavily on data deduplication to reduce storage costs: they store only one copy of each distinct file, no matter how many users upload it. Although recent studies report that whole-file deduplication can achieve up to 50% storage reduction, users do not directly benefit from these savings, as there is no transparent relation between effective storage costs and the prices offered to users. In this paper, we propose a novel storage solution, ClearBox, which allows a storage service provider to transparently attest to its customers the deduplication patterns of the (encrypted) data that it stores. By doing so, ClearBox enables cloud users to verify the effective storage space that their data occupies in the cloud, and consequently to check whether they qualify for benefits such as price reductions. ClearBox is secure against malicious users and a rational storage provider, and ensures that files can only be accessed by their legitimate owners. We evaluate a prototype implementation of ClearBox using both Amazon S3 and Dropbox as back-end cloud storage. Our findings show that our solution works with the APIs provided by existing service providers without any modifications and achieves performance comparable to existing solutions.
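The bookkeeping behind "effective storage space" can be illustrated with a toy model in Python. This is a minimal sketch under stated assumptions, not ClearBox itself: it assumes plaintext files, a fully trusted server, SHA-256 content digests as whole-file deduplication keys, and a hypothetical equal-split rule for dividing a shared file's size among its owners. The class `DedupStore` and its methods are illustrative names. ClearBox's contribution is letting users verify such ownership counts over encrypted data without trusting the provider, which this sketch does not attempt.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Hypothetical whole-file deduplicating store (illustration only, not ClearBox).

    Each distinct file content is kept once, keyed by its SHA-256 digest,
    and the store records which users have uploaded that content.
    """

    def __init__(self):
        self.blobs = {}                 # digest -> bytes (single physical copy)
        self.owners = defaultdict(set)  # digest -> set of user ids

    def upload(self, user, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:    # first upload of this content: store it
            self.blobs[digest] = data
        self.owners[digest].add(user)   # duplicate uploads only add an owner
        return digest

    def logical_bytes(self):
        # Total bytes users would be billed for if every upload were stored separately.
        return sum(len(self.blobs[d]) * len(o) for d, o in self.owners.items())

    def physical_bytes(self):
        # Bytes the provider actually stores after deduplication.
        return sum(len(b) for b in self.blobs.values())

    def effective_bytes(self, user):
        # Assumed fair-share policy: split each file's size equally among its owners.
        return sum(len(self.blobs[d]) / len(o)
                   for d, o in self.owners.items() if user in o)


if __name__ == "__main__":
    store = DedupStore()
    store.upload("alice", b"x" * 100)
    store.upload("bob", b"x" * 100)   # same content: no new bytes stored
    store.upload("bob", b"y" * 50)
    print(store.logical_bytes(), store.physical_bytes())             # 250 150
    print(store.effective_bytes("alice"), store.effective_bytes("bob"))  # 50.0 100.0
```

Here two users upload the same 100-byte file and one of them also uploads a distinct 50-byte file: without deduplication they would be billed for 250 bytes, while the provider stores only 150, a 40% reduction. Under the equal-split policy the per-user shares (50 and 100 bytes) sum exactly to the physical total; other sharing policies are possible, and the point of a transparent scheme is that users can check a figure like this rather than pay for logical bytes.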

      Published In

      CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security
      October 2015
      1750 pages
      ISBN: 9781450338325
      DOI: 10.1145/2810103
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cloud security
      2. secure data deduplication
      3. transparent attestation of deduplication

      Funding Sources

      • TREDISEC project funded by the European Union

      Conference

      CCS '15

      Acceptance Rates

      CCS '15 Paper Acceptance Rate 128 of 660 submissions, 19%;
      Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

      Cited By

      • (2024) Cloud Security Using Fine-Grained Efficient Information Flow Tracking. Future Internet 16:4 (110). DOI: 10.3390/fi16040110. Online publication date: 25-Mar-2024.
      • (2024) Privacy-Preserving Popularity-Based Deduplication against Malicious Behaviors of the Cloud. Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, pp. 245-256. DOI: 10.1145/3634737.3656286. Online publication date: 1-Jul-2024.
      • (2024) Enabling Transparent Deduplication and Auditing for Encrypted Data in Cloud. IEEE Transactions on Dependable and Secure Computing 21:4, pp. 3545-3561. DOI: 10.1109/TDSC.2023.3334475. Online publication date: Jul-2024.
      • (2024) A randomized encryption deduplication method against frequency attack. Journal of Information Security and Applications 83 (103774). DOI: 10.1016/j.jisa.2024.103774. Online publication date: Jun-2024.
      • (2024) Finite State Automata Based Cryptosystem for Secure Data Sharing and De-duplication in Cloud Computing. SN Computer Science 5:6. DOI: 10.1007/s42979-024-03101-y. Online publication date: 8-Aug-2024.
      • (2024) Blockchain-based immunization against kleptographic attacks. Science China Information Sciences 67:7. DOI: 10.1007/s11432-023-3883-4. Online publication date: 6-Jun-2024.
      • (2023) Threat Model and Defense Scheme for Side-Channel Attacks in Client-Side Deduplication. Tsinghua Science and Technology 28:1, pp. 1-12. DOI: 10.26599/TST.2021.9010071. Online publication date: Feb-2023.
      • (2023) FeatureSpy: Detecting Learning-Content Attacks via Feature Inspection in Secure Deduplicated Storage. IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, pp. 1-10. DOI: 10.1109/INFOCOM53939.2023.10228971. Online publication date: 17-May-2023.
      • (2022) Secure and Efficient Hybrid Data Deduplication in Edge Computing. ACM Transactions on Internet Technology 22:3, pp. 1-25. DOI: 10.1145/3537675. Online publication date: 25-Jul-2022.
      • (2022) Tunable Encrypted Deduplication with Attack-resilient Key Management. ACM Transactions on Storage 18:4, pp. 1-38. DOI: 10.1145/3510614. Online publication date: 27-Sep-2022.