A multi-strategy collaborative prediction model for the runtime of online tasks in computing cluster/grid

Cluster Computing

Abstract

Operating a complex, dynamic high-performance computing environment efficiently requires the scheduler to dispatch submitted tasks according to which resources are identified as idle. A derived problem is to forecast task runtimes accurately. Such forecasts are typically needed to assist scheduling policies and fine-tune scheduling decisions, and they are also used to plan future resource allocation when making advance reservations. However, because of the characteristics of existing prediction strategies, no single strategy is appropriate for all kinds of heterogeneous tasks. To address this problem, a multi-strategy collaborative prediction model (MSCPM) for the runtime of online tasks is proposed, and a novel concept named Prediction Accuracy Assurance (PAA) is introduced as a criterion to quantitatively evaluate the precision of the runtime predicted by a specific prediction strategy.

MSCPM uses existing runtime prediction strategies to generate multiple collaborative prediction schemes and adopts the prediction result of the scheme that provides the optimal PAA. We evaluate the performance of the proposed model, which currently integrates four simple yet widely used time series prediction strategies, on traces gathered from three different tasks. The analysis shows that MSCPM can combine the strengths of the various existing prediction strategies, and that the evaluation criterion can pick out the near-optimal result among the predictions provided by the integrated strategies. MSCPM thus provides enhanced accuracy assurance for the predicted runtime of online tasks in such computing environments.
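
The selection idea described above (run several predictors, score each, keep the best-scoring result) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract neither names the four integrated strategies nor defines PAA, so the four predictors here (`last_value`, `running_mean`, `window_median`, `exp_smoothing`) and the stand-in score `accuracy_score` (one minus the mean relative error of recent one-step-ahead predictions) are assumptions chosen only for illustration, as are all function names and parameters.

```python
from statistics import mean, median

# Four simple time series predictors (hypothetical stand-ins; the abstract
# does not name the strategies that MSCPM actually integrates).
def last_value(history):
    return history[-1]

def running_mean(history):
    return mean(history)

def window_median(history, window=5):
    return median(history[-window:])

def exp_smoothing(history, alpha=0.5):
    estimate = history[0]
    for runtime in history[1:]:
        estimate = alpha * runtime + (1 - alpha) * estimate
    return estimate

STRATEGIES = {
    "last_value": last_value,
    "running_mean": running_mean,
    "window_median": window_median,
    "exp_smoothing": exp_smoothing,
}

def accuracy_score(strategy, history, lookback=5):
    """Assumed stand-in for PAA (not the paper's definition).

    Replays the strategy on the last `lookback` observations and returns
    1 - mean relative error, clipped at 0. Runtimes must be positive.
    """
    start = max(1, len(history) - lookback)
    errors = [
        abs(strategy(history[:i]) - history[i]) / history[i]
        for i in range(start, len(history))
    ]
    return max(0.0, 1.0 - mean(errors))

def predict_runtime(history):
    """Return (strategy name, predicted runtime, score) of the best scheme."""
    if len(history) < 2:
        raise ValueError("need at least two observed runtimes")
    scores = {name: accuracy_score(fn, history) for name, fn in STRATEGIES.items()}
    best = max(scores, key=scores.get)  # ties go to the first strategy listed
    return best, STRATEGIES[best](history), scores[best]

if __name__ == "__main__":
    observed = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # made-up runtimes in seconds
    print(predict_runtime(observed))
```

On a steadily rising trace such as the made-up one above, the lagging averages score worse than `last_value`, so the sketch selects it. With a different scoring rule or a different trace, another strategy could win, which is the motivation for choosing among strategies per task rather than fixing one in advance.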


Author information

Correspondence to Ming Tao.

Cite this article

Tao, M., Dong, S. & Zhang, L. A multi-strategy collaborative prediction model for the runtime of online tasks in computing cluster/grid. Cluster Comput 14, 199–210 (2011). https://doi.org/10.1007/s10586-010-0145-4

  • DOI: https://doi.org/10.1007/s10586-010-0145-4
