Abstract
Data deduplication techniques are effective solutions for reducing both the bandwidth and the storage space requirements of cloud backup services in data centers. During the deduplication process, maintaining an index in RAM is a fundamental operation. A very large index needs more storage space: it is hard to hold such an index entirely in RAM, and accessing it on disk decreases throughput. To overcome this problem, an index system is developed based on File Classifier based Linear Indexing Deduplication, called FC-LID, which utilizes Linear Hashing with Representative Group (LHRG). The proposed linear index structure reduces the computational overhead of deduplication and increases deduplication efficiency.
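The abstract names linear hashing as the basis of the index but does not describe LHRG itself. The following is only a minimal sketch of textbook linear hashing used as a chunk-fingerprint index, not the paper's FC-LID or LHRG design. The class name `LinearHashIndex`, the helper `deduplicate`, the SHA-1 fingerprint, and the load threshold are all assumptions made for illustration.

```python
import hashlib


class LinearHashIndex:
    """Textbook linear hashing (illustrative, not the paper's LHRG).

    Buckets are split one at a time, so the table grows gradually
    instead of being rebuilt in a single expensive rehash.
    """

    def __init__(self, initial_buckets=4, max_load=2.0):
        self.n0 = initial_buckets   # bucket count at level 0
        self.level = 0              # completed doubling rounds
        self.split = 0              # next bucket to split in this round
        self.max_load = max_load    # average entries per bucket before a split
        self.buckets = [dict() for _ in range(initial_buckets)]
        self.count = 0

    def _address(self, fp):
        h = int.from_bytes(fp[:8], "big")
        idx = h % (self.n0 << self.level)
        if idx < self.split:
            # This bucket was already split in the current round,
            # so use the next level's hash function.
            idx = h % (self.n0 << (self.level + 1))
        return idx

    def lookup(self, fp):
        return self.buckets[self._address(fp)].get(fp)

    def insert(self, fp, location):
        """Return True if fp is new, False if it is a duplicate."""
        bucket = self.buckets[self._address(fp)]
        if fp in bucket:
            return False
        bucket[fp] = location
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_one()
        return True

    def _split_one(self):
        old = self.buckets[self.split]
        self.buckets[self.split] = {}
        self.buckets.append({})     # image bucket at split + n0 * 2**level
        self.split += 1
        if self.split == self.n0 << self.level:
            self.level += 1         # round finished: table has doubled
            self.split = 0
        for fp, loc in old.items():
            self.buckets[self._address(fp)][fp] = loc


def deduplicate(chunks, index):
    """Keep only chunks whose fingerprint is not yet in the index."""
    stored = []
    for chunk in chunks:
        fp = hashlib.sha1(chunk).digest()   # fingerprint choice is an assumption
        if index.insert(fp, len(stored)):
            stored.append(chunk)
    return stored
```

In this sketch each insertion splits at most one bucket, which keeps the cost of growing the index spread across many insertions. A file classifier, as the paper's title suggests, could route chunks to separate indexes of this kind, but how FC-LID actually does so is not stated in the abstract.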
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Neelaveni, P., Vijayalakshmi, M. (2016). FC-LID: File Classifier Based Linear Indexing for Deduplication in Cloud Backup Services. In: Bjørner, N., Prasad, S., Parida, L. (eds) Distributed Computing and Internet Technology. ICDCIT 2016. Lecture Notes in Computer Science, vol 9581. Springer, Cham. https://doi.org/10.1007/978-3-319-28034-9_28
DOI: https://doi.org/10.1007/978-3-319-28034-9_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28033-2
Online ISBN: 978-3-319-28034-9
eBook Packages: Computer Science, Computer Science (R0)