An SSD-based accelerator for directory parsing in storage systems containing massive files

Peer-to-Peer Networking and Applications

Abstract

Data explosion introduces new challenges to storage systems. A file system for big data holds a large number of directories and files, usually organized in a large tree, and parsing directories in such a tree is difficult. In this paper, we propose an accelerator that helps file systems fetch the metadata of files rapidly. This work makes two contributions. First, we propose an accelerator for directory parsing. The accelerator is an SSD-based (solid-state-drive-based) cache that keeps the metadata of frequently or recently accessed files and directories. When a file is requested, the accelerator attempts to obtain its metadata directly from the SSD. If the metadata is in the SSD, the file system obtains it rapidly. If it is not, however, the accelerator spends a long time accessing the SSD to no avail. To avoid such non-beneficial SSD accesses, the accelerator predicts whether the metadata is in the SSD before issuing a read request, and issues the request only if the metadata has a high probability of being there. The second contribution is a new Bloom filter used to predict whether a piece of data is in the SSD. A Bloom filter is a space-efficient data structure that supports membership queries, but the standard Bloom filter does not support element deletion. Because our accelerator is a cache that evicts items periodically, the standard Bloom filter is not suitable for it. We therefore designed a new low-overhead Bloom filter that supports element deletion and suits the proposed accelerator. With the prediction of our Bloom filter, the accelerator speeds up directory parsing with almost no negative impact. We evaluated the accelerator using a prototype. Experimental results demonstrate that the accelerator speeds up directory parsing by nearly four times compared with a file system without an accelerator.
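
The lookup flow described in the abstract can be illustrated with a minimal Python sketch. It does not reproduce the authors' new Bloom filter, whose design is not given here. It substitutes a plain counting Bloom filter, a well-known variant that supports deletion at the cost of keeping a counter per slot. The class names, the LRU eviction policy, the in-memory dictionary standing in for the SSD, and the `slow_lookup` callable standing in for ordinary directory parsing are all assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict


class CountingBloomFilter:
    """Bloom filter variant with per-slot counters, so elements can be removed."""

    def __init__(self, num_slots=1024, num_hashes=3):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.counters = [0] * num_slots

    def _slots(self, key):
        # Derive num_hashes slot indexes from salted SHA-1 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_slots

    def add(self, key):
        for slot in self._slots(key):
            self.counters[slot] += 1

    def remove(self, key):
        # Only valid for keys that were previously added.
        for slot in self._slots(key):
            self.counters[slot] -= 1

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.counters[slot] > 0 for slot in self._slots(key))


class MetadataAccelerator:
    """Hypothetical LRU metadata cache gated by the filter above."""

    def __init__(self, capacity, slow_lookup):
        self.capacity = capacity
        self.slow_lookup = slow_lookup  # stand-in for normal directory parsing
        self.ssd = OrderedDict()        # stand-in for metadata stored on the SSD
        self.filter = CountingBloomFilter()
        self.ssd_reads = 0

    def get(self, path):
        # Touch the SSD only when the filter predicts the metadata is there.
        if self.filter.might_contain(path):
            self.ssd_reads += 1
            if path in self.ssd:
                self.ssd.move_to_end(path)  # mark as most recently used
                return self.ssd[path]
        meta = self.slow_lookup(path)
        self._insert(path, meta)
        return meta

    def _insert(self, path, meta):
        if len(self.ssd) >= self.capacity:
            victim, _ = self.ssd.popitem(last=False)  # evict least recently used
            self.filter.remove(victim)                # keep the filter in sync
        self.ssd[path] = meta
        self.filter.add(path)
```

In this sketch, a first `get` for a path skips the SSD because the filter is empty, and a repeated `get` for the same path is served from the simulated SSD. Evicting an entry also removes it from the filter, which is the operation a standard Bloom filter cannot perform.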

Acknowledgments

We are grateful to the anonymous reviewers for their suggestions. This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013201 and by the National Natural Science Foundation of China under Grant Nos. 61025009, 61232003, 61120106005, 61170288, and 61070198.

Author information

Corresponding author

Correspondence to Zhiguang Chen.

About this article

Cite this article

Chen, Z., Xiao, N. & Liu, F. An SSD-based accelerator for directory parsing in storage systems containing massive files. Peer-to-Peer Netw. Appl. 6, 397–408 (2013). https://doi.org/10.1007/s12083-013-0209-3

  • DOI: https://doi.org/10.1007/s12083-013-0209-3
