Adaptive Resource Allocation with Job Runtime Uncertainty

Journal of Grid Computing

Abstract

In this paper, we address the problem of dynamic resource allocation in the presence of job runtime uncertainty. We develop an execution delay model for runtime prediction and design an adaptive stochastic allocation strategy named Pareto Fractal Flow Predictor (PFFP). We conduct a comprehensive performance evaluation of the PFFP strategy on real production traces and compare it with other well-known non-clairvoyant strategies over two metrics. To choose the best strategy, we perform a bi-objective analysis based on a degradation methodology. To guard against biased conclusions, in which a small portion of the problem instances with large deviations dominates the results, we present performance profiles of the strategies. We show that PFFP performs well in different scenarios with a variety of workloads and distributed resources.
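
The two evaluation tools named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: the degradation of a strategy on an instance is its metric value relative to the best value any strategy achieved on that instance, and the performance profile at a factor tau is the fraction of instances on which the strategy is within tau of the best. The function names and the cost table are hypothetical and are not taken from the article; lower metric values are assumed to be better and strictly positive.

```python
# Hypothetical cost table: costs[p][s] is the metric value (lower is better)
# of strategy s on problem instance p. These numbers are made up for illustration.
costs = [
    [10.0, 12.0],
    [20.0, 20.0],
    [5.0, 50.0],   # one instance where strategy 1 deviates strongly
    [8.0, 6.0],
]


def performance_ratios(table):
    """Divide each strategy's value by the best value found on that instance."""
    ratios = []
    for row in table:
        best = min(row)  # assumes strictly positive metric values
        ratios.append([value / best for value in row])
    return ratios


def mean_degradation(table):
    """Average relative degradation over the best strategy, in percent."""
    ratios = performance_ratios(table)
    n_instances = len(ratios)
    n_strategies = len(ratios[0])
    return [
        100.0 * (sum(row[s] for row in ratios) / n_instances - 1.0)
        for s in range(n_strategies)
    ]


def performance_profile(table, tau):
    """Fraction of instances on which each strategy is within factor tau of the best."""
    ratios = performance_ratios(table)
    n_instances = len(ratios)
    n_strategies = len(ratios[0])
    return [
        sum(1 for row in ratios if row[s] <= tau) / n_instances
        for s in range(n_strategies)
    ]


if __name__ == "__main__":
    print("mean degradation (%):", mean_degradation(costs))
    for tau in (1.0, 1.5, 10.0):
        print("profile at tau =", tau, ":", performance_profile(costs, tau))
```

In this made-up table a single instance with a large deviation inflates the mean degradation of the second strategy, while its profile shows that it is within a factor of 1.5 of the best on three of the four instances. This is the kind of effect performance profiles are meant to expose.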

Author information

Corresponding author

Correspondence to Andrei Tchernykh.

About this article

Cite this article

Ramírez-Velarde, R., Tchernykh, A., Barba-Jimenez, C. et al. Adaptive Resource Allocation with Job Runtime Uncertainty. J Grid Computing 15, 415–434 (2017). https://doi.org/10.1007/s10723-017-9410-6

  • DOI: https://doi.org/10.1007/s10723-017-9410-6
