Abstract
In recent years, data-intensive storage has grown, and compute-intensive workloads have come to demand high computing power and high parallelism with good performance and scalability. Many distributed filesystems have focused on how to distribute data across multiple processing nodes, but one of the main problems still to be solved is managing the ever-growing number of metadata requests. In fact, some studies have identified optimized metadata management as a key factor in achieving good performance. Applications in high performance computing usually require filesystems that can sustain a very large number of operations per second to reach the required level of performance. Although metadata storage is smaller than data storage, metadata operations consume a large number of CPU cycles, so a single metadata server is no longer sufficient. In this paper we define a completely distributed method that provides efficient metadata management and seamlessly adapts to general-purpose and scientific computing filesystem workloads. Throughput is measured with a metadata benchmark and compared with several distributed filesystems. The results show great scalability for create operations on a single directory accessed by multiple clients.
A.F. Díaz—This work has been partially supported by European Union FEDER and the Spanish Ministry of Economy and Competitiveness TIN2015-67020-P, FPA2015-65150-C3-3-P, and PROMEP/103.5/13/6475 UAEH-146. The authors would like to thank FCSCL (Fundación Centro de Supercomputación de Castilla y León) for providing access to a cluster of its supercomputer Calendula.
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Rodríguez-Quintana, C., Díaz, A.F., Ortega, J., Palacios, R.H., Ortiz, A. (2016). A New Scalable Approach for Distributed Metadata in HPC. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_8
DOI: https://doi.org/10.1007/978-3-319-49583-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49582-8
Online ISBN: 978-3-319-49583-5
eBook Packages: Computer Science, Computer Science (R0)