Filtering and Matching of Data Blocks to Avoid Disk Bottleneck in De-duplication File System

Zhang, Jiajia; Zhang, Xingjun; Zhao, Runting; Dong, Xiaoshe

doi:10.1007/978-3-662-44491-7_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 451)

Abstract

The growing scale of data has generated huge redundancy, so de-duplication, which eliminates redundancy and improves the space utilization of storage devices, has been widely adopted. A de-duplication filesystem can provide a unified interface to upper-layer applications and implement inline de-duplication. In this paper, we design and implement FmdFS, a kernel-space de-duplication filesystem. Because memory is limited, the metadata of FmdFS is stored on disk in groups. Meanwhile, a scale-adaptive binary tree filter is built in memory. For most new data, it avoids disk accesses to search the on-disk metadata for fingerprints, and it also records the groups in which duplicate data is stored. In addition, FmdFS uses an LRU hash cache that holds recently accessed metadata groups; by exploiting locality, it matches duplicate data without accessing the metadata on disk. FmdFS achieves higher write performance than traditional de-duplication filesystems.
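
The lookup path described in the abstract can be illustrated with a minimal user-space sketch. It is not the FmdFS implementation: the class name `DedupIndexSketch`, the parameters `group_size` and `cache_groups`, the use of MD5 as the fingerprint function, and the plain dictionary standing in for the scale-adaptive binary tree filter are all assumptions made for illustration, and the "disk" is simulated by an in-memory dictionary that only counts reads.

```python
import hashlib
from collections import OrderedDict


class DedupIndexSketch:
    """Toy fingerprint index: in-memory filter plus LRU cache of metadata groups."""

    def __init__(self, group_size=4, cache_groups=2):
        self.group_size = group_size      # fingerprints per metadata group (hypothetical)
        self.cache_groups = cache_groups  # groups the LRU cache may hold (hypothetical)
        # Simulated on-disk metadata: group id -> {fingerprint: block number}.
        self.disk_groups = {0: {}}
        # Stand-in for the in-memory filter: fingerprint -> group id.
        # A plain dict is an assumption; the paper uses a binary tree filter.
        self.group_filter = {}
        self.cache = OrderedDict()        # LRU cache: group id -> group
        self.open_gid = 0                 # group that receives new fingerprints
        self.next_block = 0
        self.disk_reads = 0               # metadata group reads from "disk"

    def _read_group(self, gid):
        """Load one metadata group from simulated disk into the LRU cache."""
        self.disk_reads += 1
        group = self.disk_groups[gid]
        self.cache[gid] = group
        self.cache.move_to_end(gid)
        while len(self.cache) > self.cache_groups:
            self.cache.popitem(last=False)  # evict the least recently used group
        return group

    def write(self, data):
        """Return (block_number, is_duplicate) for one data block."""
        fp = hashlib.md5(data).digest()

        # Step 1: locality. Look in the cached groups, no disk access.
        hit = next((g for g, grp in self.cache.items() if fp in grp), None)
        if hit is not None:
            self.cache.move_to_end(hit)
            return self.cache[hit][fp], True

        # Step 2: filter. Unknown fingerprints are new data, no disk access.
        gid = self.group_filter.get(fp)
        if gid is None:
            group = self.disk_groups[self.open_gid]
            if len(group) >= self.group_size:
                self.open_gid += 1
                group = self.disk_groups[self.open_gid] = {}
            block_no = self.next_block
            self.next_block += 1
            group[fp] = block_no
            self.group_filter[fp] = self.open_gid
            return block_no, False

        # Step 3: the filter names the group; read only that group from disk.
        return self._read_group(gid)[fp], True


if __name__ == "__main__":
    index = DedupIndexSketch(group_size=2, cache_groups=1)
    for block in (b"a", b"b", b"c", b"d", b"a", b"b"):
        print(block, index.write(block), "disk reads:", index.disk_reads)
```

In this sketch a new block never triggers a metadata read, and a duplicate triggers at most one group read; later duplicates whose fingerprints sit in the same group are then matched from the cache. How FmdFS sizes its groups, adapts the filter, or handles filter misses is not stated in the abstract and is not modeled here.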

Author information

Authors and Affiliations

Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, 710049, China
Jiajia Zhang, Xingjun Zhang, Runting Zhao & Xiaoshe Dong

Editor information

Editors and Affiliations

National University of Defense Technology, 410073, Changsha, China
Junjie Wu
Shanghai Jiao Tong University, 200240, Shanghai, China
Haibo Chen
College of Information Science and Engineering, Northeastern University Shenyang, China
Xingwei Wang

About this paper

Cite this paper

Zhang, J., Zhang, X., Zhao, R., Dong, X. (2014). Filtering and Matching of Data Blocks to Avoid Disk Bottleneck in De-duplication File System. In: Wu, J., Chen, H., Wang, X. (eds) Advanced Computer Architecture. Communications in Computer and Information Science, vol 451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44491-7_6

DOI: https://doi.org/10.1007/978-3-662-44491-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44490-0
Online ISBN: 978-3-662-44491-7
eBook Packages: Computer Science, Computer Science (R0)
