Runtime prediction of parallel applications with workload-aware clustering

The Journal of Supercomputing

Abstract

Traditionally, many scientific fields have required strong support for massive workflows that use many cores simultaneously. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy based on users' estimated runtimes. However, these estimates have been found to be highly inaccurate because users tend to overestimate the runtimes of their jobs. Therefore, this paper presents an efficient machine learning approach to predicting the runtime of parallel applications. The proposed method consists of three phases. First, the important features of the history log data are identified by factor analysis. Second, the parallel programs are clustered based on these important features. Third, prediction models are built from the pattern similarity of the parallel program log data and used to estimate runtimes. In the experiments, we use workload logs from parallel systems (i.e., NASA-iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques in terms of root-mean-square error, the proposed method improves accuracy by up to 69.56%.
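The three phases summarized above can be illustrated with a deliberately small cluster-then-predict sketch. It is not the authors' implementation: it assumes the features have already been selected (phase one), substitutes plain k-means for the paper's clustering step, and uses each cluster's mean runtime as a stand-in for the per-cluster prediction models. The helper names (`kmeans`, `fit_cluster_means`, `predict`, `improvement_pct`) and the toy job data are hypothetical. The RMSE helper matches the metric named in the abstract; reading "improves accuracy by X%" as a relative RMSE reduction is likewise an assumption.

```python
"""Cluster-then-predict runtime sketch (illustrative, not the paper's method).

Assumptions: job features are already selected numeric vectors; plain k-means
stands in for the paper's clustering step; each cluster "model" is just the
mean runtime of its member jobs. All names here are hypothetical.
"""
import math
import random


def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, point):
    # Index of the centroid closest to `point` (ties go to the lower index).
    return min(range(len(centroids)), key=lambda c: euclidean(point, centroids[c]))


def kmeans(points, k, iters=50, seed=0):
    # Lloyd's algorithm: assign points to the nearest centroid, then move each
    # centroid to the mean of its members; repeat a fixed number of times.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(centroids, p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def fit_cluster_means(features, runtimes, k):
    # Cluster the historical jobs, then record each cluster's mean runtime.
    centroids, labels = kmeans(features, k)
    overall = sum(runtimes) / len(runtimes)
    means = []
    for c in range(k):
        rs = [r for r, lab in zip(runtimes, labels) if lab == c]
        means.append(sum(rs) / len(rs) if rs else overall)
    return centroids, means


def predict(centroids, means, job):
    # Route a new job to its nearest cluster and return that cluster's estimate.
    return means[nearest(centroids, job)]


def rmse(actual, predicted):
    # Root-mean-square error, the metric the abstract reports.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def improvement_pct(baseline_rmse, new_rmse):
    # Relative RMSE reduction in percent (assumed reading of "improves accuracy").
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


if __name__ == "__main__":
    # Toy history: [requested processors, requested hours] -> runtime in seconds.
    features = [[4, 1], [4, 2], [5, 1], [64, 10], [60, 12], [64, 11]]
    runtimes = [100, 120, 110, 1000, 900, 950]
    centroids, means = fit_cluster_means(features, runtimes, k=2)
    print(predict(centroids, means, [5, 2]))    # → 110.0
    print(predict(centroids, means, [62, 11]))  # → 950.0
```

Clustering first means a new job is compared only with historically similar jobs, which appears to be the intuition behind workload-aware prediction; replacing the per-cluster mean with a regression model fitted inside each cluster would bring the sketch closer to the per-cluster prediction models described in the third phase.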

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. In [12], the authors show that a factor with four or more loadings \({>}0.6\) is reliable regardless of sample size. Therefore, this paper uses 0.6 as the loading threshold for the factor analysis.
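The threshold rule in this note can be sketched as follows, assuming a feature-by-factor loading matrix has already been produced by some factor-analysis routine. Comparing absolute loadings against 0.6, and keeping a feature if it loads strongly on any reliable factor, are assumptions about how the rule is applied; the function names, feature names, and loading values are hypothetical illustrations, not results from the paper.

```python
"""Factor-loading threshold sketch for phase one (illustrative only).

Assumptions: `loadings` maps each feature to its loadings on each factor, as
produced beforehand by some factor-analysis routine; absolute loadings are
compared with the threshold; the names and values below are hypothetical.
"""

LOADING_THRESHOLD = 0.6  # threshold adopted in note 1
MIN_STRONG_LOADINGS = 4  # "four or more loadings" rule attributed to [12]


def reliable_factors(loadings, threshold=LOADING_THRESHOLD, min_count=MIN_STRONG_LOADINGS):
    # A factor counts as reliable if at least `min_count` features load on it
    # with absolute loading above `threshold`.
    if not loadings:
        return []
    n_factors = len(next(iter(loadings.values())))
    keep = []
    for f in range(n_factors):
        strong = sum(1 for row in loadings.values() if abs(row[f]) > threshold)
        if strong >= min_count:
            keep.append(f)
    return keep


def select_features(loadings, threshold=LOADING_THRESHOLD, min_count=MIN_STRONG_LOADINGS):
    # Assumed selection rule: keep a feature if it loads strongly on any
    # reliable factor; dict insertion order is preserved in the result.
    factors = reliable_factors(loadings, threshold, min_count)
    return [name for name, row in loadings.items()
            if any(abs(row[f]) > threshold for f in factors)]


# Hypothetical feature-by-factor loadings (two factors), for illustration only.
EXAMPLE_LOADINGS = {
    "req_procs":  [0.82, 0.10],
    "used_procs": [0.79, 0.05],
    "req_time":   [0.71, 0.30],
    "req_memory": [0.65, 0.20],
    "user_id":    [0.12, 0.68],
    "queue_id":   [0.08, 0.55],
}

if __name__ == "__main__":
    print(reliable_factors(EXAMPLE_LOADINGS))  # → [0]
    print(select_features(EXAMPLE_LOADINGS))   # → ['req_procs', 'used_procs', 'req_time', 'req_memory']
```

In this toy matrix the second factor has only one loading above 0.6, so it fails the four-loading rule and the features that load mainly on it (`user_id`, `queue_id`) are dropped.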

References

  1. Agarwal B, Mittal N (2014) Text classification using machine learning methods—a survey. In: Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), 28–30 Dec 2012. Springer, India, pp 701–709

  2. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15

  3. Chang F, Guo CY, Lin XR, Lu CJ (2010) Tree decomposition for large-scale SVM problems. J Mach Learn Res 11:2935–2972

  4. Deelman E, Gannon D, Shields M, Taylor I (2009) Workflows and e-science: an overview of workflow system features and capabilities. Future Gener Comput Syst 25(5):528–540

  5. Downey AB (1997) Using queue time predictions for processor allocation. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 35–57

  6. Drucker H, Burges CJ, Kaufman L, Smola A, Vapnik V (1997) Support vector regression machines. Adv Neural Inf Process Syst 9:155–161

  7. Feitelson DG, Tsafrir D, Krakov D (2012) Experience with the parallel workloads archive. Technical report

  8. Gaussier E, Glesser D, Reis V, Trystram D (2015) Improving backfilling by using machine learning to predict running times. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, p 64

  9. Gibbons R (1997) A historical application profiler for use by parallel schedulers. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 58–77

  10. Gil Y, Deelman E, Ellisman M, Fahringer T, Fox G, Gannon D, Goble C, Livny M, Moreau L, Myers J (2007) Examining the challenges of scientific workflows. IEEE Comput 40(12):24–32. doi:10.1109/MC.2007.421

  11. Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp 22–30

  12. Guadagnoli E, Velicer WF (1988) Relation of sample size to the stability of component patterns. Psychol Bull 103(2):265

  13. Härdle WK, Simar L (2012) Applied multivariate statistical analysis. Springer, Berlin

  14. Jones JP, Nitzberg B (1999) Scheduling for parallel supercomputing: a historical perspective of achievable utilization. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 1–16

  15. Kohonen T (1998) The self-organizing map. Neurocomputing 21(1):1–6

  16. Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65

  17. Liang F, Liu Y, Liu H, Ma S, Schnor B (2015) A parallel job execution time estimation approach based on user submission patterns within computational grids. Int J Parallel Program 43(3):440–454

  18. Lifka DA (1995) The ANL/IBM SP scheduling system. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 295–303

  19. Minh TN, Wolters L (2010) Using historical data to predict application runtimes on backfilling parallel systems. In: Proceedings of 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 246–252

  20. Mu’alem A, Feitelson D (2001) Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans Parallel Distrib Syst 12(6):529–543. doi:10.1109/71.932708

  21. R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

  22. Senger LJ, Santana MJ, Santana RC (2004) An instance-based learning approach for predicting execution times of parallel applications. In: Proceedings of international information and telecommunication technologies symposium, pp 9–15

  23. Smith W, Foster I, Taylor V (2004) Predicting application run times with historical information. J Parallel Distrib Comput 64(9):1007–1016

  24. Song H, Brandt-Pearce M (2013) Range of influence and impact of physical impairments in long-haul DWDM systems. J Lightwave Technol 31(6):846–854

  25. Tsafrir D, Etsion Y, Feitelson D (2007) Backfilling using system-generated predictions rather than user runtime estimates. IEEE Trans Parallel Distrib Syst 18(6):789–803. doi:10.1109/TPDS.2007.70606

  26. Vapnik V (2013) The nature of statistical learning theory. Springer, Berlin

  27. Wei W, Yang XL, Shen PY, Zhou B (2012) Holes detection in anisotropic sensornets: topological methods. Int J Distrib Sens Netw 8(10):135054. doi:10.1155/2012/135054

  28. Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(Feb):207–244

  29. Weinland D, Ronfard R, Boyer E (2011) A survey of vision-based methods for action representation, segmentation and recognition. Comput Vis Image Underst 115(2):224–241

  30. Zhang Y, Franke H, Moreira J, Sivasubramaniam A (2003) An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. IEEE Trans Parallel Distrib Syst 14(3):236–247

Author information

Corresponding author

Correspondence to Ju-Won Park.

About this article

Cite this article

Park, JW., Kim, E. Runtime prediction of parallel applications with workload-aware clustering. J Supercomput 73, 4635–4651 (2017). https://doi.org/10.1007/s11227-017-2038-2

  • DOI: https://doi.org/10.1007/s11227-017-2038-2
