
Effective metadata management in exascale file system

The Journal of Supercomputing

Abstract

This paper presents an effective method for managing metadata in exascale file systems. To store exponentially growing numbers of files, numerous methods for distributing and managing metadata have been proposed and developed. However, these methods do not provide an adequate solution for managing very large amounts of metadata because they fail to overcome two significant challenges in exascale file systems: (1) nonlinear performance scalability and (2) performance degradation over time. We propose an effective metadata management model and a high-performance metadata management system that not only overcome these limitations but also provide a foundation for managing exascale metadata in a distributed file system. Our metadata management system forms the core of EEFS, an exascale distributed file system developed by the Electronics and Telecommunications Research Institute. The evaluation results show that our system overcomes the critical challenges of existing metadata management technologies; in particular, performance does not degrade even as the amount of accumulated metadata grows over time.





Acknowledgements

This work was supported by an Institute for Information and communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2015-0-00262, Management of Developing ICBMS (IoT, Cloud, Bigdata, Mobile, Security) Core Technologies and Development of Exascale Cloud Storage Technology).

Author information

Corresponding author

Correspondence to Myung-Hoon Cha.


About this article


Cite this article

Cha, MH., Lee, SM., Kim, HY. et al. Effective metadata management in exascale file system. J Supercomput 75, 7665–7689 (2019). https://doi.org/10.1007/s11227-019-02974-8


  • DOI: https://doi.org/10.1007/s11227-019-02974-8
