Abstract
As data volumes grow within data centers, cloud storage models face several issues in storing data and in shifting it within an adequate time frame. This study aims to develop a distributed deduplication model that achieves scalable throughput and capacity by using many data servers to deduplicate data in parallel with minimal loss. This paper proposes a new cloud storage model based on distributed deduplication with fingerprint index management (DDFI). The DDFI model operates in three main stages. In the first stage, it uses an effective routing technique based on the similarity level of the data, which reduces network overhead by rapidly identifying storage locations. In the second stage, duplicate data are identified using the MD5 algorithm. In the final stage, a fingerprint index management process is executed, where the fingerprint index comprises the fingerprint and the corresponding position details of every written chunk. To optimize deduplication performance, the DDFI model maintains the fingerprint index in memory and writes it to disk only when the cloud database system is idle. The simulation results show that the presented DDFI model achieves a higher deduplication ratio (DR) with minimal network bandwidth overhead. The detailed comparative analysis indicates that the presented DDFI model offers the highest relative DR, the best deduplication performance, and the lowest read and write bandwidth.
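The second and third stages described above (MD5-based duplicate identification and an in-memory fingerprint index flushed to disk during idle periods) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names, the `(server_id, offset)` position format, and the explicit `flush_when_idle` call are assumptions introduced for clarity.

```python
import hashlib


class FingerprintIndex:
    """In-memory fingerprint index mapping chunk fingerprints to position details.

    New entries are buffered and persisted only when the store is idle,
    sketched here as an explicit flush call.
    """

    def __init__(self):
        self.index = {}   # fingerprint -> (server_id, offset)
        self.dirty = []   # fingerprints not yet written to disk

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        # MD5 digest of the chunk serves as its fingerprint, as in DDFI
        return hashlib.md5(chunk).hexdigest()

    def is_duplicate(self, chunk: bytes) -> bool:
        # A chunk is a duplicate if its fingerprint is already indexed
        return self.fingerprint(chunk) in self.index

    def add(self, chunk: bytes, server_id: int, offset: int) -> None:
        # Record position details for a newly written (non-duplicate) chunk
        fp = self.fingerprint(chunk)
        if fp not in self.index:
            self.index[fp] = (server_id, offset)
            self.dirty.append(fp)

    def flush_when_idle(self) -> list:
        # Placeholder for persisting buffered entries while the system is idle;
        # a real system would write self.dirty to a disk-resident index here.
        persisted = list(self.dirty)
        self.dirty.clear()
        return persisted


idx = FingerprintIndex()
idx.add(b"chunk-A", server_id=0, offset=0)
print(idx.is_duplicate(b"chunk-A"))  # True: already written, can be skipped
print(idx.is_duplicate(b"chunk-B"))  # False: must be stored
```

Keeping the index in memory avoids a disk lookup per incoming chunk; the trade-off is that buffered entries must eventually be persisted, which is why the paper defers writes to idle periods.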
Saraswathi, S.S., Malarvizhi, N. Distributed deduplication with fingerprint index management model for big data storage in the cloud. Evol. Intel. 14, 683–690 (2021). https://doi.org/10.1007/s12065-020-00395-8