Abstract
Massive data volumes place ever higher demands on the capacity of storage devices, yet in practice capacity growth lags far behind data growth. Among the various data reduction techniques, deduplication has come to the fore because of its high efficiency, low resource consumption, and broad applicability. Data deduplication refers to finding and eliminating redundant data in a storage system. In a local storage system, only one copy of each data object needs to be stored, which saves limited storage space; in a networked system, deduplication saves storage space and also reduces the bandwidth needed for transmission, which speeds up transfers. It is a trade-off that achieves efficient storage at the cost of computational overhead. This article introduces data deduplication techniques, describes their basic principles and processes, summarizes the main techniques in current research, and offers recommendations for future development.
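As a rough illustration of the idea in the abstract (store one copy of each data object and pay for it with extra computation), the following minimal sketch deduplicates data at the chunk level. The abstract does not name a specific scheme, so the choices here are assumptions: fixed-size chunking, SHA-256 fingerprints, and the hypothetical `DedupStore` class with its `put`, `get`, and `stored_bytes` methods.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy of each distinct chunk.

    Assumptions (not taken from the chapter): fixed-size chunks and
    SHA-256 fingerprints used as chunk identifiers.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes, stored once
        self.files = {}   # file name -> ordered list of fingerprints

    def put(self, name, data):
        """Split data into chunks and store only chunks not seen before."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            # Hashing is the computational overhead paid for saving space.
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def get(self, name):
        """Rebuild a file from its list of chunk fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("a", b"AAAABBBBAAAA")  # the chunk AAAA occurs twice in this file
store.put("b", b"BBBBCCCC")      # the chunk BBBB is already stored
```

In this example the two files hold 20 logical bytes, but only three distinct 4-byte chunks are kept. In a networked setting the same fingerprints could be exchanged first so that only unseen chunks are transmitted, which is where the bandwidth saving mentioned in the abstract would come from.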
Copyright information
© 2017 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, X., Deng, M. (2017). An Overview on Data Deduplication Techniques. In: Balas, V., Jain, L., Zhao, X. (eds) Information Technology and Intelligent Transportation Systems. Advances in Intelligent Systems and Computing, vol 455. Springer, Cham. https://doi.org/10.1007/978-3-319-38771-0_35
DOI: https://doi.org/10.1007/978-3-319-38771-0_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38769-7
Online ISBN: 978-3-319-38771-0
eBook Packages: Engineering, Engineering (R0)