Abstract
As computational Grids are increasingly used for executing long-running, multi-phase parallel applications, it is important to develop efficient rescheduling frameworks that adapt application execution in response to resource and application dynamics. This paper develops three algorithms for deciding when and where to reschedule parallel applications that execute on multi-cluster Grids. The algorithms derive rescheduling plans that consist of potential rescheduling points in application execution and schedules of resources for application execution between two consecutive rescheduling points. Using a large number of simulations, it is shown that the rescheduling plans developed by the algorithms can lead to large decreases in application execution times when compared to executions without rescheduling on dynamic Grid resources. The rescheduling plans generated by the algorithms are also shown to be competitive with the near-optimal plans generated by brute-force methods. Of the three algorithms, the genetic algorithm yielded the most efficient rescheduling plans, with 9–12% smaller average execution times than the other algorithms.
Additional information
This work is supported by the Department of Science and Technology, India, project ref. no. SR/S3/EECE/59/2005/8.6.06.
Cite this article
Sanjay, H.A., Vadhiyar, S.S. Strategies for Rescheduling Tightly-Coupled Parallel Applications in Multi-Cluster Grids. J Grid Computing 9, 379–403 (2011). https://doi.org/10.1007/s10723-010-9170-z
DOI: https://doi.org/10.1007/s10723-010-9170-z