FC-LID: File Classifier Based Linear Indexing for Deduplication in Cloud Backup Services

Neelaveni, P.; Vijayalakshmi, M.

doi:10.1007/978-3-319-28034-9_28

Conference paper
First Online: 25 December 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9581)

Abstract

Data deduplication is an effective technique for reducing both the bandwidth and the storage space required by cloud backup services in data centers. Maintaining an index in RAM is a fundamental operation of the deduplication process. As the volume of stored data grows, the index grows with it, and a very large index cannot be held entirely in RAM; keeping part of it on disk forces disk accesses during lookups, which decreases throughput. To overcome this problem, an index system called File Classifier based Linear Indexing Deduplication (FC-LID) is developed, which uses Linear Hashing with Representative Group (LHRG). The proposed linear index structure reduces the computational overhead of deduplication and increases deduplication efficiency.
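The abstract names Linear Hashing as the basis of the FC-LID index but does not describe its internals. As a hedged illustration only, the Python sketch below shows classic linear hashing (Litwin's scheme) used as a chunk-fingerprint index: the table grows one bucket at a time, so it expands incrementally instead of rehashing everything at once. The names `LinearHashIndex` and `deduplicate`, the SHA-1 fingerprints, the bucket capacity, and the split rule are all assumptions for illustration; the file classifier and the Representative Group mechanism of FC-LID are not modeled.

```python
import hashlib


class LinearHashIndex:
    """Chunk-fingerprint index built on classic linear hashing.

    Hypothetical sketch: buckets are in-memory lists; a real system would
    typically back each bucket with a disk page or block.
    """

    def __init__(self, initial_buckets=4, bucket_capacity=4):
        self.n0 = initial_buckets          # number of buckets at level 0
        self.level = 0                     # completed doubling rounds
        self.split = 0                     # next bucket to be split
        self.capacity = bucket_capacity    # target average entries per bucket
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    @staticmethod
    def _hash(fp):
        # Use the first 8 bytes of the fingerprint as an integer hash value.
        return int.from_bytes(fp[:8], "big")

    def _addr(self, fp):
        h = self._hash(fp)
        a = h % (self.n0 * 2 ** self.level)
        if a < self.split:                 # bucket already split this round
            a = h % (self.n0 * 2 ** (self.level + 1))
        return a

    def lookup(self, fp):
        return fp in self.buckets[self._addr(fp)]

    def insert(self, fp):
        """Return True if fp is new (unique chunk), False if it is a duplicate."""
        bucket = self.buckets[self._addr(fp)]
        if fp in bucket:
            return False
        bucket.append(fp)
        self.count += 1
        # Grow by one bucket when the average load exceeds the capacity.
        if self.count > self.capacity * len(self.buckets):
            self._split()
        return True

    def _split(self):
        # Split bucket `self.split` into itself and a new bucket at the end.
        modulus = self.n0 * 2 ** (self.level + 1)
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])
        for fp in old:
            self.buckets[self._hash(fp) % modulus].append(fp)
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:   # round complete
            self.level += 1
            self.split = 0


def deduplicate(chunks, index):
    """Return only chunks whose SHA-1 fingerprint was not already indexed."""
    unique = []
    for chunk in chunks:
        if index.insert(hashlib.sha1(chunk).digest()):
            unique.append(chunk)
    return unique


if __name__ == "__main__":
    idx = LinearHashIndex(initial_buckets=2, bucket_capacity=2)
    print(deduplicate([b"alpha", b"beta", b"alpha", b"gamma", b"beta"], idx))
```

Running the script prints `[b'alpha', b'beta', b'gamma']`. Because linear hashing splits buckets one at a time in a fixed round-robin order, the index grows gradually with the stored data rather than pausing to rehash the whole table.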

Author information

Authors and Affiliations

Department of Information Science and Technology, Anna University, Chennai, Tamilnadu, India
P. Neelaveni & M. Vijayalakshmi

Corresponding author

Correspondence to P. Neelaveni.

Editor information

Editors and Affiliations

Microsoft Research, Redmond, Washington, USA
Nikolaj Bjørner
Indian Institute of Technology Delhi, New Delhi, India
Sanjiva Prasad
IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA
Laxmi Parida

About this paper

Cite this paper

Neelaveni, P., Vijayalakshmi, M. (2016). FC-LID: File Classifier Based Linear Indexing for Deduplication in Cloud Backup Services. In: Bjørner, N., Prasad, S., Parida, L. (eds) Distributed Computing and Internet Technology. ICDCIT 2016. Lecture Notes in Computer Science, vol 9581. Springer, Cham. https://doi.org/10.1007/978-3-319-28034-9_28

DOI: https://doi.org/10.1007/978-3-319-28034-9_28
Published: 25 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28033-2
Online ISBN: 978-3-319-28034-9
eBook Packages: Computer Science, Computer Science (R0)