DOI: 10.1145/3339186.3339197

Constructing Skeleton for Parallel Applications with Machine Learning Methods

Published: 05 August 2019

Abstract

Performance prediction has long been important in the domain of parallel computing. For programs executed on workstation clusters and supercomputing systems, precise prediction of execution time can aid task scheduling and resource management. A practical and effective class of prediction methods is the skeleton-based method: it extracts an executable code snippet, called a skeleton, from traces of program executions, and uses the skeleton to replay the behavior and predict the performance of the original program. However, traditional skeleton-based methods require fixed inputs to construct reliable skeletons, which limits their application scope. In this paper, we present a novel method for constructing skeletons for parallel programs. Our method combines code instrumentation with machine learning techniques, enabling skeletons to respond dynamically to varying inputs and make corresponding performance predictions. In our evaluations on three benchmarks, MCB, LULESH and STREAM, the proposed method achieves average prediction error rates of 27%, 7% and 9%, respectively.
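The core idea described above, a skeleton whose replay behavior is driven by a model learned from traced training runs rather than by one fixed input, can be illustrated with a minimal sketch. This is not the paper's actual implementation: the names (`fit_linear`, `Skeleton`) and the simple linear iterations-versus-input-size model are assumptions chosen purely for illustration.

```python
# Hypothetical sketch: a skeleton that predicts its replay length from the
# program input via a model fitted on traced runs (not the paper's code).

def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a*x + b, in pure Python."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

class Skeleton:
    """Replays a stand-in compute kernel for a model-predicted iteration count."""

    def __init__(self, input_sizes, iter_counts):
        # "Training": fit the model on (input size, traced loop count) pairs.
        self.a, self.b = fit_linear(input_sizes, iter_counts)

    def predict_iters(self, input_size):
        # Predict how many iterations the original program would run.
        return max(0, round(self.a * input_size + self.b))

    def replay(self, input_size):
        # Replay a dummy kernel body for the predicted number of iterations.
        acc = 0.0
        for i in range(self.predict_iters(input_size)):
            acc += i * 1e-9  # placeholder for the extracted kernel body
        return acc

# Training data from hypothetical traced runs: input size -> loop iterations.
sizes = [100, 200, 400, 800]
iters = [1000, 2000, 4000, 8000]
sk = Skeleton(sizes, iters)
print(sk.predict_iters(600))  # predicted iterations for an unseen input
```

In a real system the model would map richer input features to multiple skeleton parameters (loop trip counts, message sizes, and so on), but the structure is the same: instrumentation supplies training pairs, and the fitted model lets one skeleton cover varying inputs.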


Cited By

  • (2024) A Hybrid Machine Learning Method for Cross-Platform Performance Prediction of Parallel Applications. Proceedings of the 53rd International Conference on Parallel Processing, 669--678. DOI: 10.1145/3673038.3673059. Online publication date: 12 August 2024.
  • (2020) Quantum Circuits for Binary Convolution. 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), 1--5. DOI: 10.1109/ICDABI51230.2020.9325659. Online publication date: 26 October 2020.


    Published In

    ICPP Workshops '19: Workshop Proceedings of the 48th International Conference on Parallel Processing
    August 2019
    241 pages
    ISBN:9781450371964
    DOI:10.1145/3339186

    In-Cooperation

    • University of Tsukuba

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. machine learning
    2. performance modeling
    3. program skeleton

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Natural Science Foundation of China
    • Youth Innovation Promotion Association of Chinese Academy of Sciences
    • Open Research Fund of State Key Laboratory of Computer Architecture, Institute of Computing Technology, CAS

    Conference

    ICPP 2019
    ICPP 2019: Workshops
    August 5 - 8, 2019
    Kyoto, Japan

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

