
Strategies for Rescheduling Tightly-Coupled Parallel Applications in Multi-Cluster Grids

Journal of Grid Computing

Abstract

As computational Grids are increasingly used to execute long-running, multi-phase parallel applications, it is important to develop efficient rescheduling frameworks that adapt application execution in response to resource and application dynamics. In this paper, three algorithms are developed for deciding when and where to reschedule parallel applications that execute on multi-cluster Grids. The algorithms derive rescheduling plans, which consist of potential points in application execution at which rescheduling may occur, together with the schedule of resources used for application execution between each pair of consecutive rescheduling points. Using a large number of simulations, it is shown that the rescheduling plans developed by the algorithms can substantially decrease application execution times on dynamic Grid resources when compared to executions without rescheduling. The rescheduling plans generated by the algorithms are also shown to be competitive with the near-optimal plans generated by brute-force methods. Of the three algorithms, the genetic algorithm yielded the most efficient rescheduling plans, with average execution times 9–12% smaller than those of the other algorithms.
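The plan structure described in the abstract can be illustrated with a small sketch. Everything in it is hypothetical and not taken from the paper. It assumes the following:

- The application is divided into phases.
- A table `PHASE_TIME` gives a predicted execution time for each phase on each cluster. This stands in for performance predictions under resource dynamics.
- A fixed `MIGRATION_COST` is charged at every rescheduling point.

A plan assigns one cluster to each phase, and its rescheduling points are the phase boundaries where that assignment changes. `brute_force` enumerates every plan, analogous to the brute-force comparison in the abstract. `genetic` is a generic textbook genetic algorithm (tournament selection, one-point crossover, point mutation, elitism), not the authors' specific formulation.

```python
import itertools
import random

# Hypothetical model (not from the paper): PHASE_TIME[c][i] is the predicted
# execution time of application phase i on cluster c; MIGRATION_COST is the
# assumed time to checkpoint and restart the application on another cluster.
PHASE_TIME = [
    [10.0, 10.0, 30.0, 30.0, 10.0, 10.0],  # cluster 0: heavily loaded during phases 2-3
    [20.0, 20.0, 12.0, 12.0, 20.0, 20.0],  # cluster 1: slower but steadier
]
MIGRATION_COST = 5.0


def plan_time(plan):
    """Total execution time of a plan that assigns one cluster per phase.

    Each phase boundary where the cluster changes is a rescheduling point
    and adds MIGRATION_COST.
    """
    compute = sum(PHASE_TIME[c][i] for i, c in enumerate(plan))
    switches = sum(1 for a, b in zip(plan, plan[1:]) if a != b)
    return compute + switches * MIGRATION_COST


def rescheduling_points(plan):
    """Phase indices at which the plan moves the application."""
    return [i for i in range(1, len(plan)) if plan[i] != plan[i - 1]]


def brute_force(n_clusters, n_phases):
    """Evaluate all n_clusters**n_phases plans; feasible only for tiny inputs."""
    plans = itertools.product(range(n_clusters), repeat=n_phases)
    return min(plans, key=plan_time)


def genetic(n_clusters, n_phases, pop_size=30, generations=60,
            mutation_rate=0.1, seed=0):
    """Generic GA over plans (requires n_phases >= 2 for crossover).

    Uses tournament selection of size 3, one-point crossover, per-phase
    mutation, and elitism so the best plan found is never lost.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(n_clusters) for _ in range(n_phases))
           for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(pop, key=plan_time)]  # elitism
        while len(children) < pop_size:
            p1 = min(rng.sample(pop, 3), key=plan_time)
            p2 = min(rng.sample(pop, 3), key=plan_time)
            cut = rng.randrange(1, n_phases)
            child = list(p1[:cut] + p2[cut:])
            for i in range(n_phases):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_clusters)
            children.append(tuple(child))
        pop = children
    return min(pop, key=plan_time)


opt = brute_force(2, 6)
print(opt, plan_time(opt), rescheduling_points(opt))  # → (0, 0, 1, 1, 0, 0) 74.0 [2, 4]
ga = genetic(2, 6)
print(ga, plan_time(ga))
print("no rescheduling:", min(plan_time((c,) * 6) for c in range(2)))  # → no rescheduling: 100.0
```

On this toy input, moving to cluster 1 for phases 2 and 3 pays for two migrations but still beats staying on either cluster (74 versus 100 time units). Exhaustive search grows as C^N for C clusters and N phases. This is one reason heuristic searches such as a genetic algorithm are attractive for realistic plan spaces.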




Author information

Correspondence to Sathish S. Vadhiyar.

Additional information

This work is supported by the Department of Science and Technology, India (project ref. no. SR/S3/EECE/59/2005/8.6.06).


About this article

Cite this article

Sanjay, H.A., Vadhiyar, S.S. Strategies for Rescheduling Tightly-Coupled Parallel Applications in Multi-Cluster Grids. J Grid Computing 9, 379–403 (2011). https://doi.org/10.1007/s10723-010-9170-z


  • DOI: https://doi.org/10.1007/s10723-010-9170-z
