DOI:10.1145/3588993.3597262
research-article

HPC Application Performance Prediction with Machine Learning on New Architectures

Published: 31 July 2023

Abstract

We explore a machine learning approach to modeling the performance of scientific applications on high-performance computing architectures. Multiple linear regression models and neural networks were evaluated for their effectiveness in building performance models that predict an application's execution time. Performance metrics collected at run time, together with hardware specifications, served as input features for the performance models. A two-step machine learning approach improved the R^2 score of the predictions: first, feature selection identified the subset of metrics most relevant to execution time; machine learning models were then trained to predict this subset of metrics, and their predictions served as inputs to the final performance model built in the second step. In our case study, regression models achieved an R^2 score of up to 93% and a neural network model achieved an R^2 score of over 94% when predicting execution time on an unseen computer architecture. These results are comparable to those of existing methods that require more upfront hardware and systems knowledge, which suggests that our method is more approachable for application developers without extensive performance expertise.
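The two-step idea in the abstract can be illustrated with a small, purely synthetic sketch. It is not the authors' implementation: the data generator, the two hardware numbers `h1` and `h2`, the problem size `s`, the three metrics, and the use of Pearson correlation for feature selection are all assumptions made for illustration, and plain least squares stands in for the regression and neural network models the paper evaluates. The data are linear by construction, so the high score it reports says nothing about real applications.

```python
import random

def fit_ols(X, y):
    """Least-squares fit with intercept, solved via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    for c in range(n):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def make_runs(rng, arch, n):
    """Synthetic runs on one architecture (all values are made up)."""
    h1, h2 = arch                                 # hypothetical hardware specs
    inputs, metrics, times = [], [], []
    for _ in range(n):
        s = rng.uniform(10, 100)                  # hypothetical problem size
        m_a = 3 * s + 40 * h1 + rng.gauss(0, 2)   # metric that drives run time
        m_b = 2 * s + 25 * h2 + rng.gauss(0, 2)   # metric that drives run time
        m_c = rng.uniform(0, 100)                 # metric unrelated to run time
        inputs.append([s, h1, h2])
        metrics.append([m_a, m_b, m_c])
        times.append(0.5 * m_a + 0.2 * m_b + rng.gauss(0, 1))
    return inputs, metrics, times

rng = random.Random(0)
train_archs = [(1.0, 2.0), (1.5, 1.0), (2.0, 2.5), (2.5, 1.5)]
X_in, X_met, y_time = [], [], []
for arch in train_archs:
    run_inputs, run_metrics, run_times = make_runs(rng, arch, 30)
    X_in.extend(run_inputs)
    X_met.extend(run_metrics)
    y_time.extend(run_times)

# Step 1: keep the two metrics most correlated with execution time.
scores = [abs(pearson([row[j] for row in X_met], y_time)) for j in range(3)]
selected = sorted(sorted(range(3), key=lambda j: -scores[j])[:2])

# Step 2: one model per selected metric (inputs -> metric), then a final
# model mapping the selected metrics to execution time.
metric_models = [fit_ols(X_in, [row[j] for row in X_met]) for j in selected]
time_model = fit_ols([[row[j] for j in selected] for row in X_met], y_time)

# Evaluate on an architecture that never appeared in training: only its
# specs and problem sizes are used; the metrics are predicted, not measured.
test_in, _, test_time = make_runs(rng, (3.0, 2.0), 40)
pred_time = [predict(time_model, [predict(m, x) for m in metric_models])
             for x in test_in]
r2 = r2_score(test_time, pred_time)
print("selected metric indices:", selected)
print("R^2 on unseen architecture: %.3f" % r2)
```

The point of chaining the models is that nothing has to be measured on the new machine: the final model consumes predicted metrics, so a prediction needs only the hardware description and the run configuration.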


Published In

PERMAVOST '23: Proceedings of the 2023 on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn Strategy
July 2023
13 pages
ISBN:9798400701634
DOI:10.1145/3588993

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. high performance computing
  2. machine learning
  3. performance modeling
  4. statistical modeling

Qualifiers

  • Research-article

Conference

HPDC '23

Cited By

  • (2025) Machine Learning Regression-Based Prediction for Improving Performance and Energy Consumption in HPC Platforms. High Performance Computing (186-200). DOI:10.1007/978-3-031-80084-9_13. Online publication date: 14-Feb-2025
  • (2024) Deep Configuration Performance Learning: A Systematic Survey and Taxonomy. ACM Transactions on Software Engineering and Methodology 34:1 (1-62). DOI:10.1145/3702986. Online publication date: 5-Nov-2024
  • (2024) A Hybrid Machine Learning Method for Cross-Platform Performance Prediction of Parallel Applications. Proceedings of the 53rd International Conference on Parallel Processing (669-678). DOI:10.1145/3673038.3673059. Online publication date: 12-Aug-2024
  • (2024) Evaluating Active-learning Based Performance Prediction of Parallel Applications. 2024 IEEE 20th International Conference on e-Science (e-Science) (1-10). DOI:10.1109/e-Science62913.2024.10678665. Online publication date: 16-Sep-2024
