DOI: 10.1145/2996429.2996432
research-article

On Information Leakage in Deduplicated Storage Systems

Published: 28 October 2016

Abstract

Most existing cloud storage providers rely on data deduplication to significantly reduce storage costs by storing duplicate data only once. While the literature has thoroughly analyzed client-side information leakage associated with the use of data deduplication techniques in the cloud, no previous work has analyzed the information leakage associated with access-trace information (e.g., object size and timing) that is available whenever a client uploads a file to a curious cloud provider.
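As a minimal illustration of this setting (a hypothetical sketch, not code from the paper), the Python class below stores each distinct object only once, keyed by its SHA-256 digest, and logs the access-trace information the server sees on every upload: the object's size and whether it was a duplicate. The class `DedupServer` and its methods are illustrative names; timing, which the abstract also mentions, is omitted for brevity.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupServer:
    """Toy store that keeps each distinct object only once (hypothetical).

    Objects are indexed by the SHA-256 digest of their bytes (which may be
    ciphertext). Every upload also appends (size, was_duplicate) to a trace,
    i.e. information the server observes without interpreting the content.
    """
    store: dict[str, bytes] = field(default_factory=dict)
    trace: list[tuple[int, bool]] = field(default_factory=list)

    def upload(self, obj: bytes) -> bool:
        key = hashlib.sha256(obj).hexdigest()
        duplicate = key in self.store
        if not duplicate:
            self.store[key] = obj  # the first copy is the only one kept
        self.trace.append((len(obj), duplicate))
        return duplicate

    def stored_bytes(self) -> int:
        return sum(len(o) for o in self.store.values())


server = DedupServer()
for obj in [b"a" * 100, b"b" * 50, b"a" * 100]:
    server.upload(obj)

print(server.stored_bytes())  # 150 (the repeated object is stored once)
print(server.trace)           # [(100, False), (50, False), (100, True)]
```

Even in this toy version, the trace entry `(100, True)` tells the server that the third upload matched previously stored content, without the server ever reading the bytes.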
In this paper, we address this problem and analyze information leakage associated with data deduplication on a curious storage server. We show that even if the data is encrypted using a key not known by the storage server, the latter can still acquire considerable information about the stored files and even determine which files are stored. We validate our results both analytically and experimentally using a number of real storage datasets.
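The claim that encrypted uploads can still reveal which file is stored can be illustrated with a hypothetical Python sketch. It is not the analysis performed in the paper, whose method this abstract does not describe. The sketch assumes that the client splits files with content-defined chunking (a simple Gear-style rolling hash stands in for Rabin fingerprinting here), encrypts every chunk under a key the server never sees, and that encryption adds a fixed 16-byte overhead per chunk. A server that knows a set of candidate files can then compare the sequence of ciphertext sizes it observes with the chunk-size sequence each candidate would produce. All names (`chunk`, `encrypt_chunk`, `size_fingerprint`, `identify`), parameters, and file names are assumptions, and `encrypt_chunk` is a toy construction that must not be used for real encryption.

```python
import hashlib
import hmac
import os
import random

# Gear table: 256 pseudo-random 32-bit values (toy derivation from SHA-256).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]
OVERHEAD = 16  # assumed fixed ciphertext expansion per chunk, in bytes


def chunk(data: bytes, mask: int = 0x3F, min_size: int = 16, max_size: int = 256) -> list[bytes]:
    """Content-defined chunking with a Gear-style rolling hash (toy parameters).

    With mask 0x3F the boundary test depends only on the last 6 bytes, so
    chunk boundaries are a function of the content, not of any key.
    """
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def encrypt_chunk(key: bytes, plaintext: bytes) -> bytes:
    """Toy deterministic encryption: 16-byte HMAC tag || plaintext XOR keystream.

    NOT secure; it only models a cipher whose output is len(plaintext) + OVERHEAD.
    """
    tag = hmac.new(key, plaintext, hashlib.sha256).digest()[:OVERHEAD]
    stream, counter = b"", 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + tag + counter.to_bytes(8, "big")).digest()
        counter += 1
    return tag + bytes(p ^ s for p, s in zip(plaintext, stream))


def size_fingerprint(data: bytes) -> tuple[int, ...]:
    """Ciphertext chunk sizes the server expects if a client uploads `data`."""
    return tuple(len(c) + OVERHEAD for c in chunk(data))


def identify(observed_sizes: list[int], candidates: dict[str, bytes]) -> list[str]:
    """Names of candidate files whose size fingerprint matches the observed trace."""
    observed = tuple(observed_sizes)
    return [name for name, data in candidates.items() if size_fingerprint(data) == observed]


# Hypothetical files already known to the server (random bytes for the demo).
rng = random.Random(2016)
candidates = {
    "report.pdf": rng.randbytes(1500),
    "manual.pdf": rng.randbytes(2000),
    "slides.pdf": rng.randbytes(2500),
}

key = os.urandom(32)  # client's secret key; the server never learns it
uploaded = [encrypt_chunk(key, c) for c in chunk(candidates["manual.pdf"])]
observed = [len(ct) for ct in uploaded]  # sizes are all the server uses
print(identify(observed, candidates))  # ['manual.pdf']
```

Because the chunk boundaries depend only on the plaintext, the observed size sequence is the same for every client who uploads that file, whatever key they use.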

Information

Published In

CCSW '16: Proceedings of the 2016 ACM on Cloud Computing Security Workshop
October 2016
116 pages
ISBN:9781450345729
DOI:10.1145/2996429
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud storage
  2. deduplication
  3. information leakage
  4. privacy
  5. storage inference

Funding Sources

  • Horizon 2020
  • Zurich Information Security & Privacy Center (ZISC)

Conference

CCS'16

Acceptance Rates

CCSW '16 Paper Acceptance Rate 8 of 23 submissions, 35%;
Overall Acceptance Rate 37 of 108 submissions, 34%

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 1
Reflects downloads up to 16 Feb 2025

Cited By

  • (2022) Tunable Encrypted Deduplication with Attack-resilient Key Management. ACM Transactions on Storage 18(4):1-38. DOI: 10.1145/3510614. Online publication date: 27-Sep-2022.
  • (2022) Revisiting Frequency Analysis against Encrypted Deduplication via Statistical Distribution. IEEE INFOCOM 2022 - IEEE Conference on Computer Communications, pp. 290-299. DOI: 10.1109/INFOCOM48880.2022.9796897. Online publication date: 2-May-2022.
  • (2020) Information Leakage in Encrypted Deduplication via Frequency Analysis. ACM Transactions on Storage 16(1):1-30. DOI: 10.1145/3365840. Online publication date: 29-Mar-2020.
  • (2020) Balancing storage efficiency and data confidentiality with tunable encrypted deduplication. Proceedings of the Fifteenth European Conference on Computer Systems, pp. 1-15. DOI: 10.1145/3342195.3387531. Online publication date: 15-Apr-2020.
  • (2020) Privacy Aware Data Deduplication for Side Channel in Cloud Storage. IEEE Transactions on Cloud Computing 8(2):597-609. DOI: 10.1109/TCC.2018.2794542. Online publication date: 1-Apr-2020.
  • (2020) SeGShare: Secure Group File Sharing in the Cloud using Enclaves. 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 476-488. DOI: 10.1109/DSN48063.2020.00061. Online publication date: Jun-2020.
  • (2018) Differentially Private Access Patterns for Searchable Symmetric Encryption. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, pp. 810-818. DOI: 10.1109/INFOCOM.2018.8486381. Online publication date: Apr-2018.
  • (2018) Security Notions for Cloud Storage and Deduplication. Provable Security, pp. 347-365. DOI: 10.1007/978-3-030-01446-9_20. Online publication date: 7-Oct-2018.
  • (2017) Side Channels in Deduplication. Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 266-274. DOI: 10.1145/3052973.3053019. Online publication date: 2-Apr-2017.
  • (2017) Information Leakage in Encrypted Deduplication via Frequency Analysis. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 1-12. DOI: 10.1109/DSN.2017.28. Online publication date: Jun-2017.