
Filtering and Matching of Data Blocks to Avoid Disk Bottleneck in De-duplication File System

  • Conference paper
Advanced Computer Architecture

Part of the book series: Communications in Computer and Information Science (CCIS, volume 451)

Abstract

As data volumes grow, so does redundancy, and de-duplication, which eliminates redundant data and improves the space utilization of storage devices, has been widely adopted. A de-duplication filesystem provides a unified interface to upper-layer applications and performs inline de-duplication. In this paper, we design and implement FmdFS, a kernel-space de-duplication filesystem. Because memory is limited, FmdFS stores its metadata on disk, organized in groups. A scale-adaptive binary tree filter is built in memory; it avoids on-disk metadata lookups when searching for the fingerprints of most new data, and it records the groups in which duplicate data is stored. In addition, FmdFS uses an LRU hash cache that holds recently accessed metadata groups, exploiting locality to match duplicate data without accessing the metadata on disk. Compared with traditional de-duplication filesystems, FmdFS achieves higher write performance.
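
The write path outlined in the abstract can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the FmdFS implementation: the abstract does not specify the filter's internal layout, so the `TreeFilter` class below is a plain binary trie over fingerprint bits, and the names `DedupStore`, `GROUP_SIZE`, and `cache_groups`, the use of MD5 fingerprints, and the in-memory stand-in for on-disk metadata groups are all hypothetical.

```python
import hashlib
from collections import OrderedDict

GROUP_SIZE = 4  # hypothetical: fingerprints per metadata group


class TreeFilter:
    """Binary trie over fingerprint bits; each leaf records a group id.

    Assumption: a stand-in for the paper's scale-adaptive filter. The
    trie only deepens where two fingerprints share a bit prefix.
    """

    def __init__(self):
        self.root = {}  # node is {"leaf": (fp, group)} or {0: node, 1: node}

    @staticmethod
    def _bit(fp, depth):
        return (fp >> (127 - depth)) & 1  # MD5 fingerprints are 128 bits

    def lookup(self, fp):
        node, depth = self.root, 0
        while "leaf" not in node:
            node = node.get(self._bit(fp, depth))
            if node is None:
                return None  # new data: no metadata access needed
            depth += 1
        stored_fp, group = node["leaf"]
        return group if stored_fp == fp else None

    def insert(self, fp, group):
        node, depth = self.root, 0
        while True:
            if "leaf" in node:
                old_fp, old_group = node.pop("leaf")
                if old_fp == fp:
                    node["leaf"] = (fp, group)
                    return
                # Split: push the existing leaf one level down.
                node[self._bit(old_fp, depth)] = {"leaf": (old_fp, old_group)}
            child = node.get(self._bit(fp, depth))
            if child is None:
                node[self._bit(fp, depth)] = {"leaf": (fp, group)}
                return
            node, depth = child, depth + 1


class DedupStore:
    def __init__(self, cache_groups=2):
        self.filter = TreeFilter()
        self.disk_groups = {}       # simulated on-disk metadata: gid -> {fp: block}
        self.cache = OrderedDict()  # LRU cache of metadata groups
        self.cache_groups = cache_groups
        self.disk_reads = 0         # counts simulated metadata reads
        self.blocks = []            # simulated data blocks
        self.open_group = 0         # group currently being filled

    def _load_group(self, gid):
        if gid in self.cache:
            self.cache.move_to_end(gid)  # mark as most recently used
        else:
            self.disk_reads += 1
            self.cache[gid] = self.disk_groups[gid]
            if len(self.cache) > self.cache_groups:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[gid]

    def write(self, data):
        fp = int.from_bytes(hashlib.md5(data).digest(), "big")
        gid = self.filter.lookup(fp)
        if gid is not None:
            # Duplicate: resolve the block through the cached metadata group.
            return self._load_group(gid)[fp]
        # New data: store the block and record its fingerprint.
        self.blocks.append(data)
        block_no = len(self.blocks) - 1
        group = self.disk_groups.setdefault(self.open_group, {})
        group[fp] = block_no
        self.filter.insert(fp, self.open_group)
        if len(group) >= GROUP_SIZE:
            self.open_group += 1
        return block_no
```

In this simulation, writing two distinct blocks performs no metadata reads, because the filter rejects both fingerprints in memory. Writing the first block again triggers one simulated read to load its metadata group, and later duplicates from the same group are served from the LRU cache. A real filter with bounded memory would likely be approximate, in which case the group lookup also serves to confirm the match.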


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, J., Zhang, X., Zhao, R., Dong, X. (2014). Filtering and Matching of Data Blocks to Avoid Disk Bottleneck in De-duplication File System. In: Wu, J., Chen, H., Wang, X. (eds) Advanced Computer Architecture. Communications in Computer and Information Science, vol 451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44491-7_6

  • DOI: https://doi.org/10.1007/978-3-662-44491-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44490-0

  • Online ISBN: 978-3-662-44491-7

  • eBook Packages: Computer Science, Computer Science (R0)
