Dependable Grid Workflow Scheduling Based on Resource Availability

Journal of Grid Computing

Abstract

Because Grid environments are highly dynamic, dependable workflow scheduling is critical. Various scheduling algorithms have been proposed, but they seldom consider resource reliability. Current Grid systems rely mainly on fault tolerance mechanisms to guarantee dependable workflow execution, which wastes system resources. This paper proposes a dependable Grid workflow scheduling system called DGWS. It introduces a Markov chain-based resource availability prediction model and, based on that model, presents a reliability-cost-driven workflow scheduling algorithm. The performance evaluation, which includes simulation on both parametric randomly generated DAGs and two real scientific workflow applications, demonstrates that, compared with existing workflow scheduling algorithms, DGWS improves the task success ratio and reduces the workflow makespan, and thereby improves the dependability of workflow execution in dynamic Grid environments.
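The abstract does not give the details of the prediction model or the cost function, so the following is only a minimal sketch of the general idea, not the DGWS algorithm itself. It assumes a two-state (up/down) discrete-time Markov chain per resource and a hypothetical cost of execution time divided by the probability that the resource stays up for the whole task. The names `survival_probability`, `availability_after`, `pick_resource`, and all transition probabilities are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
UP = 0  # assumed state index for "resource available"


def survival_probability(P: Matrix, steps: int) -> float:
    """Probability of staying in the UP state for `steps` consecutive steps."""
    return P[UP][UP] ** steps


def availability_after(P: Matrix, start: int, steps: int) -> float:
    """Probability of being UP after `steps` transitions, starting in `start`."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(steps):
        # One step of the chain: new_dist = dist * P
        dist = [sum(dist[i] * P[i][j] for i in range(len(P)))
                for j in range(len(P))]
    return dist[UP]


def pick_resource(resources: Dict[str, Tuple[Matrix, int]]) -> str:
    """Pick the resource with the lowest hypothetical reliability cost.

    Each entry maps a resource name to (transition matrix, execution steps).
    The cost exec_steps / survival_probability is an assumed stand-in that
    penalizes resources likely to fail before the task finishes.
    """
    def cost(item: Tuple[str, Tuple[Matrix, int]]) -> float:
        _, (P, exec_steps) = item
        return exec_steps / survival_probability(P, exec_steps)

    return min(resources.items(), key=cost)[0]


# Illustrative (made-up) resources: rows are [P(up->up), P(up->down)]
# and [P(down->up), P(down->down)].
FAST = ([[0.90, 0.10], [0.50, 0.50]], 5)     # quicker but less reliable
STEADY = ([[0.99, 0.01], [0.50, 0.50]], 6)   # slower but more reliable

chosen = pick_resource({"fast": FAST, "steady": STEADY})
```

With these made-up numbers the slower resource is chosen, because the faster one is much more likely to fail mid-task. A scheduler that ranked resources by execution time alone would make the opposite choice.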

Author information

Correspondence to Yongcai Tao.

About this article

Cite this article

Tao, Y., Jin, H., Wu, S. et al. Dependable Grid Workflow Scheduling Based on Resource Availability. J Grid Computing 11, 47–61 (2013). https://doi.org/10.1007/s10723-012-9237-0

  • DOI: https://doi.org/10.1007/s10723-012-9237-0
