research-article

Using file system counters in modelling parallel I/O architectures

Published: 30 January 2017

Abstract

Keeping compute and I/O performance balanced is a major challenge for future cost-efficient HPC systems. Several architectural concepts and new technologies make it possible to address this challenge, but at the price of higher complexity. As a result, these concepts and technologies need to be simulated to predict their impact on overall performance. In this paper we propose an approach that explores the design space using event simulation models that take I/O server-side performance counters as input. In this way, large quantities of real-life data measured over a large number of applications can be used to explore architectural modifications. We apply our approach to data collected by a GPFS file system serving a petascale Blue Gene/P installation.
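The idea of driving an event simulation with server-side counters can be illustrated with a minimal sketch. It is not the authors' model: the `CounterSample` record, its fields, and the single fixed-bandwidth server are assumptions made purely for illustration. Each counter sample becomes an event that adds bytes to a backlog, and the backlog drains at a constant rate between events.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Hypothetical counter record; field names are illustrative assumptions."""
    time_s: float        # time at which the sample was taken
    bytes_written: int   # bytes the I/O server reported for that sample


def simulate_drain(samples, bandwidth_bytes_per_s):
    """Replay counter samples against one server with a fixed drain bandwidth.

    Returns (peak_backlog_bytes, finish_time_s).
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")

    # Event queue ordered by time; the index breaks ties between equal times.
    events = [(s.time_s, i, s.bytes_written) for i, s in enumerate(samples)]
    heapq.heapify(events)

    now = events[0][0] if events else 0.0
    backlog = 0.0
    peak = 0.0
    while events:
        t, _, nbytes = heapq.heappop(events)
        # Drain the backlog over the time elapsed since the previous event.
        backlog = max(0.0, backlog - (t - now) * bandwidth_bytes_per_s)
        now = t
        # The event itself delivers a burst of new bytes.
        backlog += nbytes
        peak = max(peak, backlog)

    # Time at which the remaining backlog has been written out.
    finish = now + backlog / bandwidth_bytes_per_s
    return peak, finish


if __name__ == "__main__":
    trace = [CounterSample(0.0, 100), CounterSample(1.0, 100), CounterSample(2.0, 0)]
    for bandwidth in (50, 200):
        print(bandwidth, simulate_drain(trace, bandwidth))
```

Running the same trace with different bandwidth values is a toy version of exploring an architectural modification: the measured workload stays fixed while a parameter of the modelled system changes.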


Published In

ACM SIGOPS Operating Systems Review, Volume 50, Issue 2
Special Topics
December 2016
45 pages
ISSN:0163-5980
DOI:10.1145/3041710

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O
  2. Supercomputer
  3. performance analysis
