
A New Grid Scheduler with Failure Recovery and Rescheduling Mechanisms: Discussion and Analysis

Journal of Grid Computing

Abstract

Computational Grids (CGs) have become an appealing research area. They provide a suitable environment for developing large-scale parallel applications. CGs integrate a huge amount of distributed heterogeneous resources to form a powerful virtual supercomputer. Scheduling is the most important issue for enhancing the performance of CGs. Various strategies have been introduced, with either static or dynamic behavior: the former maps tasks to resources at submission time, while the latter operates at run time. Static scheduling is unsuitable for the dynamic Grid environment, yet scheduling in CGs remains more complex than existing dynamic solutions can handle. This paper introduces a decentralized Adaptive Grid Scheduler (AGS) based on a novel rescheduling mechanism. AGS has several salient properties: it is hybrid, adaptive, decentralized, and efficient. AGS is also robust, as it can (i) detect resource failures, (ii) continue functioning despite those failures, and then (iii) recover. Moreover, it integrates both static and dynamic scheduling behaviors. An initial static scheduling map is proposed for an input Directed Acyclic Graph (DAG). However, DAG tasks may be rescheduled if the performance of the allocated resources changes in a way that may affect the tasks’ response time. AGS overcomes drawbacks of traditional schedulers by exploiting the unique features of mobile agents to enhance resource discovery and monitoring. Experimental results show that AGS outperforms traditional Grid schedulers, achieving better scheduling efficiency.
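The abstract describes AGS only at a high level, so the following is a minimal Python sketch of the two behaviors it says are combined, written under our own assumptions rather than as the authors' algorithm. The first behavior is an initial static map of a DAG, built here with a greedy earliest-finish-time list scheduler (a simplified HEFT-style heuristic that ignores communication costs). The second is a rescheduling trigger that remaps pending tasks when a hosting resource fails or slows down beyond a tolerance. All names (`map_tasks`, `needs_rescheduling`, `tolerance`) and the speed-based cost model are hypothetical. AGS's mobile-agent discovery and monitoring are not modeled.

```python
"""Illustrative sketch only: a static DAG map plus a rescheduling trigger.

Hypothetical cost model: task t needs work[t] units; resource r runs at
speeds[r] units per time step (0 means the resource has failed).
Communication costs between tasks are ignored for brevity.
"""


def topological_order(deps):
    """Order tasks so that every parent precedes its children (deps: task -> parents)."""
    order, seen = [], set()

    def visit(t):
        if t in seen:
            return
        seen.add(t)
        for p in deps[t]:
            visit(p)
        order.append(t)

    for t in sorted(deps):
        visit(t)
    return order


def map_tasks(work, deps, speeds, done=None, now=0.0):
    """Greedy earliest-finish-time list scheduling of the not-yet-done tasks.

    Called with done=None at submission time it yields the initial static map;
    called later with the finish times of completed tasks and updated speeds,
    it remaps only the pending tasks (assumed not started, or restartable).
    """
    done = done or {}
    free = {r: now for r in speeds}   # time at which each resource becomes idle
    finish = dict(done)               # known or predicted finish times
    mapping = {}
    for t in topological_order(deps):
        if t in done:
            continue
        ready = max([now] + [finish[p] for p in deps[t]])
        best = None
        for r in sorted(speeds):
            if speeds[r] <= 0:        # skip failed resources
                continue
            end = max(ready, free[r]) + work[t] / speeds[r]
            if best is None or end < best[0]:
                best = (end, r)
        if best is None:
            raise RuntimeError("no live resource left to run " + t)
        finish[t], mapping[t] = best
        free[best[1]] = best[0]
    return mapping, finish


def needs_rescheduling(mapping, pending, old_speeds, new_speeds, tolerance=0.2):
    """True if a resource hosting a pending task failed or lost more than
    `tolerance` (a fraction) of the speed assumed when the map was built."""
    for t in pending:
        r = mapping[t]
        if new_speeds[r] <= 0 or new_speeds[r] < (1 - tolerance) * old_speeds[r]:
            return True
    return False


if __name__ == "__main__":
    work = {"A": 4, "B": 6, "C": 2, "D": 4}
    deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
    speeds = {"r1": 2.0, "r2": 1.0}

    mapping, finish = map_tasks(work, deps, speeds)
    print(mapping, finish)   # A, B, D on r1; C on r2; D finishes at 7.0

    # At t = 2, A has finished and r1 fails: remap the pending tasks B, C, D.
    degraded = {"r1": 0.0, "r2": 1.0}
    if needs_rescheduling(mapping, ["B", "C", "D"], speeds, degraded):
        mapping, finish = map_tasks(work, deps, degraded, done={"A": 2.0}, now=2.0)
    print(mapping, finish)   # B, C, D all on r2; D now finishes at 14.0
```

Reusing the same heuristic for both phases, and remapping only the unfinished tasks, keeps completed work intact. This mirrors the abstract's idea of a static initial map that is revised at run time. A practical Grid scheduler would also need to account for data-transfer costs and for tasks that are already running when a resource degrades.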



Author information


Corresponding author

Correspondence to Ahmed Ibrahim Saleh.


About this article

Cite this article

Saleh, A.I., Sarhan, A.M. & Hamed, A.M. A New Grid Scheduler with Failure Recovery and Rescheduling Mechanisms: Discussion and Analysis. J Grid Computing 10, 211–235 (2012). https://doi.org/10.1007/s10723-011-9200-5


  • DOI: https://doi.org/10.1007/s10723-011-9200-5
