Abstract
With the constantly increasing number of cores in high performance computing (HPC) systems, applications produce ever more data that must eventually be stored and accessed in parallel. I/O in HPC applications is performed in a layered manner: scientific applications use standardized high-level libraries and data formats such as HDF5 and NetCDF-4 to store and manipulate data that resides in a parallel file system. In this paper, we present a performance analysis of the parallel interfaces of HDF5 and NetCDF-4 under different test configurations in order to derive best practices for choosing the right I/O configuration. Our evaluation follows a breakdown approach in which we examine the performance penalty of each layer. The tested configurations include: (i) disjoint and interleaved access patterns, (ii) aligned and unaligned accesses, (iii) collective and independent I/O, and (iv) contiguous and chunked data layouts. The main observation is that interleaved data access, in a certain configuration, achieves close to the maximum performance. We also find that NetCDF-4 does not provide the ability to align accesses to the Lustre object boundaries. To overcome this, we have developed a patch that resolves the issue and improves performance dramatically.
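To make the access-pattern and alignment terminology concrete, the following is a minimal sketch of how file offsets could be computed per process under the two patterns. It is an illustration only, not the benchmark used in the paper: the function names, the 1 MiB stripe size, the `base` parameter (standing in for any file-format header that precedes the data), and the definition of "aligned" as "starts on a stripe boundary" are all assumptions.

```python
# Assumed Lustre stripe (object) size for illustration: 1 MiB.
STRIPE_SIZE = 1 << 20


def disjoint_offsets(rank, block_size, blocks_per_proc, base=0):
    """Disjoint pattern: each process owns one contiguous region of the file."""
    start = base + rank * blocks_per_proc * block_size
    return [start + i * block_size for i in range(blocks_per_proc)]


def interleaved_offsets(rank, nprocs, block_size, blocks_per_proc, base=0):
    """Interleaved pattern: processes take turns, block by block."""
    return [base + (i * nprocs + rank) * block_size
            for i in range(blocks_per_proc)]


def is_aligned(offset, stripe_size=STRIPE_SIZE):
    """Assumed definition: an access is aligned if it starts on a stripe boundary."""
    return offset % stripe_size == 0


if __name__ == "__main__":
    # Rank 1 of 4 processes, two blocks each, block size equal to the stripe size.
    print(disjoint_offsets(1, STRIPE_SIZE, 2))
    print(interleaved_offsets(1, 4, STRIPE_SIZE, 2))
    # A hypothetical 2 KiB header shifts every access off the stripe boundaries.
    shifted = interleaved_offsets(1, 4, STRIPE_SIZE, 2, base=2048)
    print([is_aligned(o) for o in shifted])
```

In this sketch, choosing a block size equal to the stripe size with `base=0` places every access on a stripe boundary, whereas any non-zero `base` that is not a multiple of the stripe size makes every access unaligned. This is the kind of shift that an alignment setting in the I/O library is meant to remove.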
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bartz, C., Chasapis, K., Kuhn, M., Nerge, P., Ludwig, T. (2015). A Best Practice Analysis of HDF5 and NetCDF-4 Using Lustre. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_20
DOI: https://doi.org/10.1007/978-3-319-20119-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science, Computer Science (R0)