Abstract
The TAR archive format does not support file deduplication, so it cannot use disk storage efficiently when archives contain duplicate files. In this paper, we propose an extended TAR file format called DTAR (Deduplication Tape Archive), which provides file-level deduplication. The key idea of our work is to store files as block-aligned compressed entries, which speeds up file insertion and deletion in an archive. Furthermore, DTAR improves performance by using a file similarity scheme to distribute files across several storage nodes. Experimental results show that the proposed system reduces data storage space efficiently and decreases network data traffic compared to a general file transfer system.
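The abstract does not specify DTAR's on-disk layout. The following minimal Python sketch illustrates the two ideas it names, file-level deduplication and block-aligned compressed entries. It assumes SHA-1 content digests as file fingerprints, zlib compression, and the 512-byte TAR record as the alignment unit. The class `DedupArchive` and its methods are hypothetical illustrations, not the authors' implementation.

```python
import hashlib
import zlib

BLOCK = 512  # assumption: standard TAR record size used as the alignment unit


class DedupArchive:
    """Hypothetical in-memory archive with file-level deduplication.

    Each unique file body is compressed and padded to a BLOCK boundary,
    so every stored entry starts and ends on a block offset.
    """

    def __init__(self):
        self.data = bytearray()  # concatenated, block-aligned payloads
        self.index = {}          # content digest -> (offset, stored_len, raw_len)
        self.names = {}          # file name -> content digest

    def add_file(self, name: str, content: bytes) -> bool:
        """Add a file; return True only if its content was newly stored."""
        digest = hashlib.sha1(content).hexdigest()  # assumed fingerprint
        self.names[name] = digest
        if digest in self.index:
            return False  # duplicate content: only the name mapping is recorded
        payload = zlib.compress(content)
        offset = len(self.data)
        self.data += payload
        self.data += b"\0" * ((-len(payload)) % BLOCK)  # pad to block boundary
        self.index[digest] = (offset, len(payload), len(content))
        return True

    def read_file(self, name: str) -> bytes:
        """Locate the entry by its block offset and decompress it."""
        offset, stored_len, _ = self.index[self.names[name]]
        return zlib.decompress(bytes(self.data[offset:offset + stored_len]))

    def remove_file(self, name: str) -> None:
        """Drop a name; drop the stored content once nothing references it."""
        digest = self.names.pop(name)
        if digest not in self.names.values():
            # The freed region is whole blocks and could be reused;
            # reuse is not shown in this sketch.
            del self.index[digest]


arc = DedupArchive()
arc.add_file("a.txt", b"hello" * 100)  # new content is stored
arc.add_file("b.txt", b"hello" * 100)  # same content, deduplicated
arc.add_file("c.txt", b"world")        # new content is stored
```

Because every stored entry starts on a 512-byte boundary, an insertion only appends whole blocks and a deletion frees whole blocks. This is one plausible reading of why block alignment speeds up archive updates.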
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kang, S.W., Jung, H.M., Lee, J.G., Cho, J.H., Ko, Y.W. (2011). DTAR: Deduplication TAR Scheme for Data Backup System. In: Kim, Th., et al. Communication and Networking. FGCN 2011. Communications in Computer and Information Science, vol 266. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27201-1_33
DOI: https://doi.org/10.1007/978-3-642-27201-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27200-4
Online ISBN: 978-3-642-27201-1
eBook Packages: Computer Science (R0)