DEDISbench: A Benchmark for Deduplicated Storage Systems

Paulo, J.; Reis, P.; Pereira, J.; Sousa, A.

doi:10.1007/978-3-642-33615-7_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7566)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

Deduplication is widely accepted as an effective technique for eliminating duplicated data in backup and archival systems. Nowadays, deduplication is also becoming appealing in cloud computing, where large-scale virtualized storage infrastructures hold huge data volumes with a significant share of duplicated content. There have thus been several proposals for embedding deduplication in storage appliances and file systems, providing different performance trade-offs while targeting both user and application data, as well as virtual machine images.

It is, however, hard to determine to what extent deduplication is useful in a particular setting and which technique will provide the best results. In fact, existing disk I/O micro-benchmarks are not designed for evaluating deduplication systems: they follow simplistic approaches for generating the written data, which lead to unrealistic amounts of duplicates.

We address this with DEDISbench, a novel micro-benchmark for evaluating the disk I/O performance of block-based deduplication systems. As the main contribution, we introduce the generation of a realistic duplicate distribution based on real datasets. Moreover, DEDISbench also allows simulating access hotspots and different load intensities for I/O operations. The usefulness of DEDISbench is shown by comparing it with the open-source disk I/O micro-benchmarks Bonnie++ and IOzone in assessing two open-source deduplication systems, Opendedup and Lessfs, using Ext4 as a baseline. As a secondary contribution, our results lead to novel insights into the performance of these file systems.
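The abstract does not describe how DEDISbench generates its data, so the following is only a minimal sketch of the general idea of writing blocks whose content follows a given duplicate distribution. The function names, the shape of the distribution (copies per content mapped to a number of distinct contents) and the example values are all hypothetical and are not taken from the paper or the tool.

```python
import random
from collections import Counter


def build_content_pool(dup_distribution):
    """Expand a duplicate distribution into a list of content ids.

    dup_distribution maps "number of copies" -> "number of distinct
    contents written that many times" (a hypothetical format).
    """
    pool = []
    next_id = 0
    for copies, n_contents in sorted(dup_distribution.items()):
        for _ in range(n_contents):
            pool.extend([next_id] * copies)
            next_id += 1
    return pool


def generate_writes(dup_distribution, block_size=4096, seed=0):
    """Yield block payloads; equal content ids give byte-identical blocks."""
    rng = random.Random(seed)
    pool = build_content_pool(dup_distribution)
    rng.shuffle(pool)  # spread duplicates across the write stream
    for content_id in pool:
        # Payload is derived only from the content id (block_size must be
        # a multiple of 8 for the length to match exactly).
        yield content_id.to_bytes(8, "big") * (block_size // 8)


# Hypothetical distribution: 6 unique blocks, 2 contents written twice,
# 1 content written five times.
example = {1: 6, 2: 2, 5: 1}
blocks = list(generate_writes(example, block_size=64))
observed = Counter(Counter(blocks).values())
```

Here `observed` recovers the input distribution from the generated blocks, which is the property a deduplication engine under test would see. A real benchmark would derive the distribution from measured datasets and would additionally control where and how fast the blocks are written.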

Author information

Authors and Affiliations

High-Assurance Software Lab (HASLab), INESC TEC & University of Minho, Portugal
J. Paulo, P. Reis, J. Pereira & A. Sousa

Editor information

Editors and Affiliations

Department of Computer Science, Semantic Technology and Application Research Laboratory (STARLab), Vrije Universiteit Brussel, Building G-10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
Research Centre for Automatic Control, School of Engineering in Information Technology, Campus scientifique, University of Lorraine, CNRS, BP 70239, 54506, Vandoeuvre-les-Nancy, France
Hervé Panetto
La Trobe University, Melbourne, VIC, Australia
Tharam Dillon
Faculty of Computer Science, University of Vienna, 1010, Vienna, Austria
Stefanie Rinderle-Ma
Institute of Databases and Information Systems, Ulm University, Germany
Peter Dadam
School of Information Technology and Electrical Engineering, University of Queensland, QLD 4072, Brisbane, Australia
Xiaofang Zhou
HP Labs, Bristol, UK
Siani Pearson
Johannes Kepler University, Linz, Austria
Alois Ferscha
Università di Modena e Reggio Emilia, Modena, Italy
Sonia Bergamaschi
ADVIS Lab, Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA
Isabel F. Cruz

About this paper

Cite this paper

Paulo, J., Reis, P., Pereira, J., Sousa, A. (2012). DEDISbench: A Benchmark for Deduplicated Storage Systems. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2012. OTM 2012. Lecture Notes in Computer Science, vol 7566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33615-7_9

DOI: https://doi.org/10.1007/978-3-642-33615-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33614-0
Online ISBN: 978-3-642-33615-7
eBook Packages: Computer Science, Computer Science (R0)
