DTAR: Deduplication TAR Scheme for Data Backup System

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 266)

Abstract

The tar archive format does not support file deduplication, so it uses disk storage inefficiently. In this paper, we propose an extended TAR file format called DTAR (Deduplication Tape Archive) that provides file-level deduplication. The key idea of our work is a block-aligned compressed file layout that speeds up file insertion and deletion during archiving. Furthermore, DTAR improves performance by using a file similarity scheme to distribute files across several storage nodes. Experimental results show that the proposed system reduces both data storage space and network data traffic compared to a general file transfer system.
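As a rough illustration of what file-level deduplication with block-aligned storage can look like, the following minimal Python sketch keeps each distinct file content once and rounds stored sizes up to a block boundary. It is not the DTAR format itself, whose details are not given in the abstract: the `DedupArchive` class, the `padded_size` helper, the use of SHA-256 as the fingerprint, and the 512-byte block size (borrowed from tar's record size) are all assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 512  # assumed alignment unit, taken from tar's 512-byte records


def padded_size(n: int, block: int = BLOCK_SIZE) -> int:
    """Round n up to the next multiple of block."""
    return (n + block - 1) // block * block


class DedupArchive:
    """Hypothetical in-memory archive that stores each distinct content once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # fingerprint -> file content
        self.entries: dict[str, str] = {}   # file name -> fingerprint

    def add(self, name: str, data: bytes) -> bool:
        """Register a file; return True only if its content was newly stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.entries[name] = digest
        return is_new

    def read(self, name: str) -> bytes:
        """Return the content recorded for a file name."""
        return self.blobs[self.entries[name]]

    def stored_bytes(self) -> int:
        """Total block-aligned space used by the distinct contents."""
        return sum(padded_size(len(b)) for b in self.blobs.values())


if __name__ == "__main__":
    archive = DedupArchive()
    archive.add("a.txt", b"hello")
    archive.add("copy_of_a.txt", b"hello")  # duplicate content, not stored again
    print(len(archive.entries), len(archive.blobs), archive.stored_bytes())
```

In this sketch a duplicate file costs only an extra name-to-fingerprint entry, which is the source of the storage saving; in a networked setting the same fingerprint check could let a client skip sending content the server already holds.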

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kang, S.W., Jung, H.M., Lee, J.G., Cho, J.H., Ko, Y.W. (2011). DTAR: Deduplication TAR Scheme for Data Backup System. In: Kim, Th., et al. Communication and Networking. FGCN 2011. Communications in Computer and Information Science, vol 266. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27201-1_33

  • DOI: https://doi.org/10.1007/978-3-642-27201-1_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27200-4

  • Online ISBN: 978-3-642-27201-1

  • eBook Packages: Computer Science, Computer Science (R0)
