DOI: 10.1145/2500727.2500731
research-article

RevDedup: a reverse deduplication storage system optimized for reads to latest backups

Published: 29 July 2013

Abstract

Deduplication is known to effectively eliminate duplicates, yet it introduces fragmentation that degrades read performance. We propose RevDedup, a deduplication system that optimizes reads to the latest backups of virtual machine (VM) images using reverse deduplication. In contrast with conventional deduplication that removes duplicates from new data, RevDedup removes duplicates from old data, thereby shifting fragmentation to old data while keeping the layout of new data as sequential as possible. We evaluate our RevDedup prototype using a 12-week span of real-world VM image snapshots of 160 users. We show that RevDedup achieves high deduplication efficiency, high backup throughput, and high read throughput.
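
The idea of removing duplicates from old data rather than new data can be illustrated with a small sketch. This is not the RevDedup prototype: the fixed chunk size, the SHA-1 fingerprints, the in-memory layout, the comparison against only the immediately preceding version, and every name here (`ReverseDedupStore`, `backup`, `read`, `stored_bytes`) are assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks, for illustration only


def fingerprint(chunk):
    """Identify a chunk by the SHA-1 digest of its content."""
    return hashlib.sha1(chunk).hexdigest()


def split(data):
    """Cut a backup image into fixed-size chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class ReverseDedupStore:
    """Hypothetical store that keeps the newest version whole."""

    def __init__(self):
        # versions[v][i] is ("data", bytes) or ("ref", newer_version, index)
        self.versions = []

    def backup(self, data):
        chunks = split(data)
        new_id = len(self.versions)
        # The newest version is always written in full and in order.
        self.versions.append([("data", c) for c in chunks])
        if new_id > 0:
            # Index the new version's chunks by fingerprint.
            index = {}
            for i, c in enumerate(chunks):
                index.setdefault(fingerprint(c), i)
            # Remove duplicates from the OLD version by turning them
            # into references that point forward to the new version.
            prev = self.versions[new_id - 1]
            for i, entry in enumerate(prev):
                if entry[0] == "data":
                    j = index.get(fingerprint(entry[1]))
                    if j is not None:
                        prev[i] = ("ref", new_id, j)
        return new_id

    def read(self, version):
        out = []
        for entry in self.versions[version]:
            # Old versions may need to follow a chain of references.
            while entry[0] == "ref":
                entry = self.versions[entry[1]][entry[2]]
            out.append(entry[1])
        return b"".join(out)

    def stored_bytes(self, version):
        """Bytes physically held by one version (references cost nothing here)."""
        return sum(len(e[1]) for e in self.versions[version] if e[0] == "data")


store = ReverseDedupStore()
v0 = store.backup(b"AAAABBBBCCCC")
v1 = store.backup(b"AAAAXXXXCCCC")
v2 = store.backup(b"AAAAXXXXDDDD")
```

In this sketch the latest version (`v2`) is read without following any reference, while reading `v0` has to chase references through `v1` into `v2`. That mirrors the trade-off described in the abstract: fragmentation lands on old backups and the newest layout stays sequential.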

Published In

APSys '13: Proceedings of the 4th Asia-Pacific Workshop on Systems
July 2013
131 pages
ISBN:9781450323161
DOI:10.1145/2500727
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • Nanyang Technological University
  • SUTD: Singapore University of Technology and Design
  • NUS: National University of Singapore

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

APSys '13
Sponsor:
  • SUTD
  • NUS
APSys '13: Asia-Pacific Workshop on Systems
July 29 - 30, 2013
Singapore, Singapore

Acceptance Rates

APSys '13 Paper Acceptance Rate 23 of 73 submissions, 32%;
Overall Acceptance Rate 169 of 430 submissions, 39%


Cited By

  • (2023) Efficient Integrity Auditing Mechanism With Secure Deduplication for Blockchain Storage. IEEE Transactions on Computers 72:8 (2365-2376). DOI: 10.1109/TC.2023.3248278. Online publication date: 1-Aug-2023
  • (2023) A Secure Electronic Medical Record Authorization System in Clouds. 2023 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS) (1-4). DOI: 10.1109/SCEECS57921.2023.10061820. Online publication date: 18-Feb-2023
  • (2023) A Cloud based Improved File Handling and Duplicate Removal using MD5. 2023 Third International Conference on Artificial Intelligence and Smart Energy (ICAIS) (1532-1536). DOI: 10.1109/ICAIS56108.2023.10073786. Online publication date: 2-Feb-2023
  • (2023) Public Auditing and Secure Deduplication of Dynamic Data Based on Blockchain. 2023 International Conference on Data Security and Privacy Protection (DSPP) (12-21). DOI: 10.1109/DSPP58763.2023.10404663. Online publication date: 16-Oct-2023
  • (2023) A blockchain-based compact audit-enabled deduplication in decentralized storage. Computer Standards & Interfaces 85 (103718). DOI: 10.1016/j.csi.2022.103718. Online publication date: Apr-2023
  • (2022) Dedup-for-speed. Proceedings of the 15th ACM International Conference on Systems and Storage (128-139). DOI: 10.1145/3534056.3534937. Online publication date: 6-Jun-2022
  • (2022) Blockchain-Based Secure Deduplication and Shared Auditing in Decentralized Storage. IEEE Transactions on Dependable and Secure Computing 19:6 (3941-3954). DOI: 10.1109/TDSC.2021.3114160. Online publication date: 1-Nov-2022
  • (2021) Ensuring high reliability and performance with low space overhead for deduplicated and delta‐compressed storage systems. Concurrency and Computation: Practice and Experience 34:5. DOI: 10.1002/cpe.6706. Online publication date: 10-Nov-2021
  • (2020) Improving the Restore Performance via Physical-Locality Middleware for Backup Systems. Proceedings of the 21st International Middleware Conference (341-355). DOI: 10.1145/3423211.3425691. Online publication date: 7-Dec-2020
  • (2019) Optimizing the restoration performance of deduplication systems through an energy-saving data layout. Annals of Telecommunications. DOI: 10.1007/s12243-019-00711-z. Online publication date: 9-Mar-2019
