DOI: 10.1145/3624062.3624209
research-article

FROOM: A Framework of Operators for OTF2 Modification

Published: 12 November 2023

Abstract

In recent years, High Performance Computing (HPC) has become increasingly important for many industries and research areas besides ‘classic’ applications. As new domains emerge, applications, implementations and frameworks become more diverse. Generic performance analysis tools often cannot keep up with the development speed of new approaches for workload distribution, offloading, and communication. Some of the new approaches employ their own performance monitoring, which is difficult to integrate into generic tools designed for traditional HPC. Performance measurements often result in a collection of separate performance logs that logically form a unit but cannot intuitively be investigated together with established performance tools. In this paper, we present a tool library that can be used to combine separate performance logs and separately recorded metrics into one single performance log, enabling investigation of such performance data as a unit. Use cases from Big Data processing and AI show the broad applicability of our approach.

Supplemental Material

MP4 File
Recording of "FROOM: A Framework of Operators for OTF2 Modification" presentation at ProTools 2023.

Published In

SC-W '23: Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SC-W 2023
