Abstract
There is a high demand for powerful storage today. In industry, this demand is met either with large storage area networks or by deploying parallel file systems on top of RAID systems or smaller storage networks. The larger the system, the more important it becomes to analyze its performance and to identify bottlenecks in the architecture and the applications.
We extended the performance monitor available in the parallel file system PVFS2 to include statistics of the server process and information about the system. Performance monitor data is available at runtime, and the server process was modified to store this data in off-line traces suitable for post-mortem analysis. These values can be used to detect bottlenecks in the system. Measured results demonstrate how they help to identify bottlenecks and may assist in ranking the servers according to their capabilities.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kunkel, J.M., Ludwig, T. (2008). Bottleneck Detection in Parallel File Systems with Trace-Based Performance Monitoring. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_23
DOI: https://doi.org/10.1007/978-3-540-85451-7_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science (R0)