Runtime prediction of parallel applications with workload-aware clustering

Park, Ju-Won; Kim, Eunhye

doi:10.1007/s11227-017-2038-2

Published: 06 April 2017

Volume 73, pages 4635–4651, (2017)

The Journal of Supercomputing

Abstract

Many scientific fields depend on large-scale workflows that use many cores simultaneously. High-capacity parallel systems such as supercomputers are widely used to support these workflows. To increase the utilization of these systems, most schedulers apply a backfilling policy based on the runtime estimated by the user. However, these estimates are extremely inaccurate because users tend to overestimate the runtime of their jobs. This paper therefore presents an efficient machine learning approach to predicting the runtime of parallel applications. The proposed method has three phases. The first identifies the important features of the history log data by factor analysis. The second clusters the parallel programs based on those features. The third builds prediction models from the pattern similarity of the parallel program log data and estimates the runtime. In the experiments, we use workload logs from parallel systems (NASA-iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques in terms of root-mean-square error, the proposed method improves accuracy by up to 69.56%.
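The cluster-then-predict idea and the root-mean-square error comparison can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering or regression algorithms, so plain k-means and a per-cluster mean runtime are swapped in here as stand-ins, and the factor analysis phase is assumed to have already selected the two features. All job data, feature choices (processor count and requested hours), and function names are hypothetical. The relative RMSE reduction is one plausible way to express an accuracy improvement as a percentage, which is also an assumption.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((x - c) ** 2 for x, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def kmeans(points, k, iterations=20):
    """Plain k-means, initialised deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def fit(features, runtimes, k):
    """Cluster historical jobs, then record the mean runtime of each cluster."""
    centroids = kmeans(features, k)
    totals, counts = [0.0] * k, [0] * k
    for f, r in zip(features, runtimes):
        i = nearest(f, centroids)
        totals[i] += r
        counts[i] += 1
    overall = sum(runtimes) / len(runtimes)
    means = [t / c if c else overall for t, c in zip(totals, counts)]
    return centroids, means

def predict(model, feature):
    """Predict the runtime of a new job from the cluster it falls into."""
    centroids, means = model
    return means[nearest(feature, centroids)]

def relative_improvement(baseline_rmse, proposed_rmse):
    """RMSE reduction relative to a baseline, in percent (assumed metric)."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse

# Hypothetical history: (processors, requested hours) and actual runtime in hours.
features = [(4, 1.0), (64, 10.0), (4, 1.2), (64, 9.0), (8, 1.1), (48, 11.0)]
runtimes = [0.3, 6.0, 0.5, 5.0, 0.4, 7.0]
model = fit(features, runtimes, k=2)

# Hypothetical new jobs: the user estimate is the requested time.
new_jobs = [(6, 1.0), (56, 10.0)]
actual = [0.5, 6.5]
user_estimates = [1.0, 10.0]
predicted = [predict(model, job) for job in new_jobs]

baseline = rmse(actual, user_estimates)
proposed = rmse(actual, predicted)
print("RMSE of user estimates:", baseline)
print("RMSE of cluster-based predictions:", proposed)
print("Relative improvement (%):", relative_improvement(baseline, proposed))
```

In this toy setting the user estimates overshoot the actual runtimes, so the cluster means give a lower error. Real workload logs would need feature scaling and a more careful choice of the number of clusters.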

Notes

In [12], the authors report that a factor with four or more loadings ${>}0.6$ is reliable regardless of sample size. Therefore, in this paper, the threshold for factor loadings is set to 0.6.
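The rule in this note can be sketched as a small check over a factor loading matrix. The loading values below are hypothetical and the function name is illustrative. Comparing the absolute value of each loading against the threshold is an assumption made here, since loadings can be negative.

```python
def reliable_factors(loadings, threshold=0.6, min_count=4):
    """Return indices of factors with at least min_count loadings whose
    absolute value exceeds threshold (absolute value is an assumption)."""
    n_factors = len(loadings[0])
    reliable = []
    for j in range(n_factors):
        strong = sum(1 for row in loadings if abs(row[j]) > threshold)
        if strong >= min_count:
            reliable.append(j)
    return reliable

# Hypothetical loadings: one row per observed variable, one column per factor.
loadings = [
    [0.82, 0.10],
    [0.75, -0.05],
    [-0.68, 0.20],
    [0.71, 0.65],
    [0.15, 0.90],
    [0.05, 0.30],
]
print(reliable_factors(loadings))  # prints [0]
```

Factor 0 has four loadings above 0.6 in absolute value and passes the rule, while factor 1 has only two and does not.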

References

1. Agarwal B, Mittal N (2014) Text classification using machine learning methods: a survey. In: Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), 28–30 Dec 2012. Springer, India, pp 701–709
2. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15
3. Chang F, Guo CY, Lin XR, Lu CJ (2010) Tree decomposition for large-scale SVM problems. J Mach Learn Res 11:2935–2972
4. Deelman E, Gannon D, Shields M, Taylor I (2009) Workflows and e-science: an overview of workflow system features and capabilities. Future Gener Comput Syst 25(5):528–540
5. Downey AB (1997) Using queue time predictions for processor allocation. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 35–57
6. Drucker H, Burges CJ, Kaufman L, Smola A, Vapnik V (1997) Support vector regression machines. Adv Neural Inf Process Syst 9:155–161
7. Feitelson DG, Tsafrir D, Krakov D (2012) Experience with the parallel workloads archive. Technical report
8. Gaussier E, Glesser D, Reis V, Trystram D (2015) Improving backfilling by using machine learning to predict running times. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, p 64
9. Gibbons R (1997) A historical application profiler for use by parallel schedulers. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 58–77
10. Gil Y, Deelman E, Ellisman M, Fahringer T, Fox G, Gannon D, Goble C, Livny M, Moreau L, Myers J (2007) Examining the challenges of scientific workflows. IEEE Comput 40(12):24–32. doi:10.1109/MC.2007.421
11. Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp 22–30
12. Guadagnoli E, Velicer WF (1988) Relation of sample size to the stability of component patterns. Psychol Bull 103(2):265
13. Härdle WK, Simar L (2012) Applied multivariate statistical analysis. Springer, Berlin
14. Jones JP, Nitzberg B (1999) Scheduling for parallel supercomputing: a historical perspective of achievable utilization. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 1–16
15. Kohonen T (1998) The self-organizing map. Neurocomputing 21(1):1–6
16. Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65
17. Liang F, Liu Y, Liu H, Ma S, Schnor B (2015) A parallel job execution time estimation approach based on user submission patterns within computational grids. Int J Parallel Program 43(3):440–454
18. Lifka DA (1995) The ANL/IBM SP scheduling system. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 295–303
19. Minh TN, Wolters L (2010) Using historical data to predict application runtimes on backfilling parallel systems. In: Proceedings of 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 246–252
20. Mu’alem A, Feitelson D (2001) Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans Parallel Distrib Syst 12(6):529–543. doi:10.1109/71.932708
21. R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
22. Senger LJ, Santana MJ, Santana RC (2004) An instance-based learning approach for predicting execution times of parallel applications. In: Proceedings of international information and telecommunication technologies symposium, pp 9–15
23. Smith W, Foster I, Taylor V (2004) Predicting application run times with historical information. J Parallel Distrib Comput 64(9):1007–1016
24. Song H, Brandt-Pearce M (2013) Range of influence and impact of physical impairments in long-haul DWDM systems. J Lightwave Technol 31(6):846–854
25. Tsafrir D, Etsion Y, Feitelson D (2007) Backfilling using system-generated predictions rather than user runtime estimates. IEEE Trans Parallel Distrib Syst 18(6):789–803. doi:10.1109/TPDS.2007.70606
26. Vapnik V (2013) The nature of statistical learning theory. Springer, Berlin
27. Wei W, Yang XL, Shen PY, Zhou B (2012) Holes detection in anisotropic sensornets: topological methods. Int J Distrib Sens Netw 8(10):135054. doi:10.1155/2012/135054
28. Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(Feb):207–244
29. Weinland D, Ronfard R, Boyer E (2011) A survey of vision-based methods for action representation, segmentation and recognition. Comput Vis Image Underst 115(2):224–241
30. Zhang Y, Franke H, Moreira J, Sivasubramaniam A (2003) An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. IEEE Trans Parallel Distrib Syst 14(3):236–247

Author information

Authors and Affiliations

Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong, Daejeon, 305-806, South Korea
Ju-Won Park
IT Convergence Technology Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, 305-700, South Korea
Eunhye Kim

Corresponding author

Correspondence to Ju-Won Park.

About this article

Cite this article

Park, JW., Kim, E. Runtime prediction of parallel applications with workload-aware clustering. J Supercomput 73, 4635–4651 (2017). https://doi.org/10.1007/s11227-017-2038-2

Published: 06 April 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11227-017-2038-2
