Two-level Hash/Table approach for metadata management in distributed file systems

The Journal of Supercomputing

Abstract

AbFS is a distributed file system that makes it possible to efficiently share the inexpensive devices attached to the commodity computers of a cluster. AbFS achieves high-performance metadata management by combining hashing and tables at several levels with hierarchical structures and caches, and by storing the attributes and the namespace in the same structure. No additional layers are needed for caching because AbFS reuses the Linux metadata caches (the inode and dentry caches). Along with a description of the proposed metadata management implementation and a comparison with other implementations, this work presents experimental results, obtained with a prototype built from scratch at kernel level, that evaluate its performance. The results show that the proposed implementation is capable of managing files and directories with high performance.
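The abstract does not give the details of the two-level scheme, so the following is only a minimal illustrative sketch of the general idea, not the AbFS implementation. It assumes that a first-level hash of the directory path indexes a table that names a metadata server, and that a second-level table on that server maps each entry name to a single record holding both the name and the attributes. All names here (`TwoLevelIndex`, `MetadataServer`, `Entry`, `bucket_table`) are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


def stable_hash(text: str) -> int:
    """Deterministic hash (CRC32), unlike Python's salted built-in hash()."""
    return zlib.crc32(text.encode("utf-8"))


@dataclass
class Entry:
    """One record holding both the name (namespace) and the attributes."""
    name: str
    mode: int
    size: int = 0


@dataclass
class MetadataServer:
    # Level 2 (assumed layout): one table per directory, keyed by entry name.
    directories: dict = field(default_factory=dict)


class TwoLevelIndex:
    def __init__(self, num_servers: int, num_buckets: int = 64):
        self.servers = [MetadataServer() for _ in range(num_servers)]
        # Level 1 (assumed layout): a table mapping hash buckets to servers.
        self.bucket_table = [b % num_servers for b in range(num_buckets)]

    def _server_for(self, directory: str) -> MetadataServer:
        bucket = stable_hash(directory) % len(self.bucket_table)
        return self.servers[self.bucket_table[bucket]]

    def create(self, directory: str, name: str, mode: int) -> Entry:
        server = self._server_for(directory)
        table = server.directories.setdefault(directory, {})
        entry = Entry(name=name, mode=mode)
        table[name] = entry
        return entry

    def lookup(self, directory: str, name: str):
        """Return the Entry, or None if the directory or name is unknown."""
        table = self._server_for(directory).directories.get(directory, {})
        return table.get(name)


if __name__ == "__main__":
    index = TwoLevelIndex(num_servers=4)
    index.create("/home/alice", "notes.txt", 0o644)
    print(index.lookup("/home/alice", "notes.txt"))
```

Because the bucket table sits between the hash and the server list, buckets could in principle be reassigned to other servers without changing the hash function. Whether AbFS does this is not stated in the abstract.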

Acknowledgements

The authors would like to thank FCSCL (Fundación Centro de Supercomputación de Castilla y León, Spain) for giving access to a cluster of its supercomputer Calendula. This work was partially funded by project IPT-2011-1728-430000.

Author information

Corresponding author

Correspondence to Mancia Anguita.

About this article

Cite this article

Díaz, A.F., Anguita, M., Camacho, H.E. et al. Two-level Hash/Table approach for metadata management in distributed file systems. J Supercomput 64, 144–155 (2013). https://doi.org/10.1007/s11227-012-0801-y

  • DOI: https://doi.org/10.1007/s11227-012-0801-y
