Skip to main content

Towards Self-optimization in HPC I/O

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7905))

Abstract

Performance analysis and optimization of high-performance I/O systems is a daunting task. Mainly, this is due to the overwhelmingly complex interplay of internal processes while executing application programs. Unfortunately, there is a lack of monitoring tools to reduce this complexity to a bearable level. For these reasons, the project Scalable I/O for Extreme Performance (SIOX) aims to provide a versatile environment for recording system activities and learning from this information. While still under development, SIOX will ultimately assist in locating and diagnosing performance problems and automatically suggest and apply performance optimizations.

The SIOX knowledge path is concerned with the analysis and utilization of data describing the cause-and-effect chain recorded via the monitoring path. In this paper, we present our refined modular design of the knowledge path. This includes a description of logical components and their interfaces, details about extracting, storing and retrieving abstract activity patterns, a concept for tying knowledge to these patterns, and the integration of machine learning. Each of these tasks is illustrated through examples. The feasibility of our design is further demonstrated with an internal component for anomaly detection, permitting intelligent monitoring to limit the SIOX system’s impact on system resources.

We want to express our gratitude to the ,,Deutsches Zentrum für Luft- und Raumfahrt e.V.“ as responsible project agency and to the ,,Bundesministerium für Bildung und Forschung“ for the financial support under grant 01 IH 11008 A-C.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Kephart, J.O., Chess, D.M.: The Vision of Autonomic Computing. Computer 36(1), 41–50 (2003)

    Article  MathSciNet  Google Scholar 

  2. Wiedemann, M.C., Kunkel, J.M., Zimmer, M., Ludwig, T., Resch, M., Bönisch, T., Wang, X., Chut, A., Aguilera, A., Nagel, W.E., Kluge, M., Mickler, H.: Towards I/O Analysis of HPC Systems and a Generic Architecture to Collect Access Patterns. Computer Science - Research and Development 1, 1–11 (2012)

    Google Scholar 

  3. Madhyastha, T.M., Reed, D.A.: Learning to Classify Parallel Input/Output Access Patterns. IEEE Transactions on Parallel and Distributed Systems 13(8), 802–813 (2002)

    Article  Google Scholar 

  4. Modani, N., Gupta, R., Lohman, G., Syeda-Mahmood, T., Mignet, L.: Automatically Identifying Known Software Problems. In: 2007 IEEE 23rd International Conference on Data Engineering Workshop, pp. 433–441 (April 2007)

    Google Scholar 

  5. Barham, P., Donnelly, A., Isaacs, R., Mortier, R.: Using Magpie for Request Extraction and Workload Modelling. In: Proceedings of the 6th Symposium on Opearting Systems Design and Implementation, vol. 6, pp. 259–272 (2004)

    Google Scholar 

  6. Yuan, C., Lao, N., Wen, J.-R., Li, J., Zhang, Z., Wang, Y.-M., Ma, W.-Y.: Automated Known Problem Diagnosis with Event Traces. In: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, EuroSys 2006, pp. 375–388. ACM, New York (2006)

    Chapter  Google Scholar 

  7. Sandeep, S.R., Swapna, M., Niranjan, T., Susarla, S., Nandi, S.: CLUEBOX: a Performance Log Analyzer for Automated Troubleshooting. In: Proceedings of the First USENIX Conference on Analysis of System Logs, WASL 2008. USENIX Association, Berkeley (2008)

    Google Scholar 

  8. Cohen, I., Zhang, S., Goldszmidt, M., Symons, J., Kelly, T., Fox, A.: Capturing, Indexing, Clustering, and Retrieving System History. SIGOPS Oper. Syst. Rev. 39(5), 105–118 (2005)

    Article  Google Scholar 

  9. Cohen, I., Goldszmidt, M., Kelly, T., Symons, J., Chase, J.S.: Correlating Instrumentation Data to System States: a Building Block for Automated Diagnosis and Control. In: Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation, OSDI 2004, vol. 6. USENIX Association, Berkeley (2004)

    Google Scholar 

  10. Duan, S.S., Babu, Munagala, K.: Fa: A System for Automating Failure Diagnosis. In: IEEE 25th International Conference on Data Engineering, ICDE 2009, March 29-April 2, pp. 1012–1023 (2009)

    Google Scholar 

  11. Bader, M., Bungartz, H.J., Gerndt, M., Hollmann, A., Weidendorfer, J.: Invasive programming as a concept for HPC. In: Proc. of the 10th IASTED Int. Conf. on Parallel and Distr. Comp. and Netw., PDCN (2011)

    Google Scholar 

  12. Kunkel, J., Ludwig, T.: IOPm – Modeling the I/O Path with a Functional Representation of Parallel File System and Hardware Architecture. In: PDP 2012, Munich Network Management Team. IEEE (2012)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zimmer, M., Kunkel, J.M., Ludwig, T. (2013). Towards Self-optimization in HPC I/O. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_32

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-38750-0_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38749-4

  • Online ISBN: 978-3-642-38750-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics