Definition
Modern NewSQL database systems can store fully normalized metadata for distributed hierarchical file systems while providing high throughput and low latency for file system operations.
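To make "fully normalized metadata" concrete, the following minimal sketch stores each file or directory as one row keyed by its parent's id and its own name, and resolves a path with one indexed lookup per component. It is an illustration only, not the HopsFS schema: the table name `inodes`, its columns, and the helper functions are hypothetical, and Python's built-in `sqlite3` module stands in for a distributed NewSQL database purely so the example is self-contained.

```python
import sqlite3

ROOT_ID = 1  # hypothetical id reserved for the root directory


def open_metadata_store():
    """Create an in-memory store (sqlite3 is only a stand-in for a NewSQL database)."""
    conn = sqlite3.connect(":memory:")
    # One row per inode; (parent_id, name) uniquely identifies a directory entry.
    conn.execute(
        """
        CREATE TABLE inodes (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL,
            name      TEXT    NOT NULL,
            is_dir    INTEGER NOT NULL,
            UNIQUE (parent_id, name)
        )
        """
    )
    # The root directory has an empty name and a parent id of 0 (no parent).
    conn.execute("INSERT INTO inodes VALUES (?, 0, '', 1)", (ROOT_ID,))
    return conn


def create(conn, parent_id, name, is_dir):
    """Insert a file or directory under parent_id and return its new inode id."""
    cur = conn.execute(
        "INSERT INTO inodes (parent_id, name, is_dir) VALUES (?, ?, ?)",
        (parent_id, name, int(is_dir)),
    )
    return cur.lastrowid


def resolve(conn, path):
    """Walk the path one component at a time; return the inode id or None."""
    inode_id = ROOT_ID
    for component in (c for c in path.split("/") if c):
        row = conn.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode_id, component),
        ).fetchone()
        if row is None:
            return None
        inode_id = row[0]
    return inode_id
```

In this layout a rename or move touches a single row instead of rewriting every descendant path, which is one common motivation for normalizing hierarchical metadata.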
Introduction
For many years, researchers have investigated the use of database technology to manage file system metadata, with the goal of providing extensible typed metadata and support for fast, rich metadata search. Previous attempts failed, mainly because adding database operations to the file system’s critical path reduced performance. Recent improvements in the performance of distributed in-memory online transaction processing databases (NewSQL databases), however, led us to reinvestigate the possibility of using a database to manage file system metadata, this time for a distributed, hierarchical file system, the Hadoop Distributed File System (HDFS). The single-host metadata service of HDFS is a well-known bottleneck for both the size of HDFS...
Copyright information
© 2018 Springer International Publishing AG
About this entry
Cite this entry
Niazi, S., Ismail, M., Haridi, S., Dowling, J. (2018). HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_146-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_146-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Living Reference Mathematics; Reference Module Computer Science and Engineering