Strategies for Rescheduling Tightly-Coupled Parallel Applications in Multi-Cluster Grids

Sanjay, H. A.; Vadhiyar, Sathish S.

doi:10.1007/s10723-010-9170-z

Published: 09 November 2010

Volume 9, pages 379–403 (2011)

Journal of Grid Computing

H. A. Sanjay¹ &
Sathish S. Vadhiyar¹

Abstract

As computational Grids are increasingly used for executing long-running multi-phase parallel applications, it is important to develop efficient rescheduling frameworks that adapt application execution in response to resource and application dynamics. In this paper, three strategies or algorithms have been developed for deciding when and where to reschedule parallel applications that execute on multi-cluster Grids. The algorithms derive rescheduling plans that consist of potential points in application execution for rescheduling and schedules of resources for application execution between two consecutive rescheduling points. Using a large number of simulations, it is shown that the rescheduling plans developed by the algorithms can lead to a large decrease in application execution times when compared to executions without rescheduling on dynamic Grid resources. The rescheduling plans generated by the algorithms are also shown to be competitive when compared to the near-optimal plans generated by brute-force methods. Of the algorithms, the genetic algorithm yielded the most efficient rescheduling plans, with 9–12% smaller average execution times than the other algorithms.
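To make the notion of a rescheduling plan concrete, the following is a minimal Python sketch of one possible representation: a list of segments, each covering the execution between two consecutive rescheduling points and carrying the resources scheduled for it. It is not the authors' implementation and does not reproduce any of the three algorithms. The names `Segment`, `plan_time`, `toy_exec_time`, the resource labels, and the fixed rescheduling cost are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Segment:
    """Execution between two consecutive rescheduling points (hypothetical type)."""
    start: int                # first unit of work in the segment (inclusive)
    end: int                  # last unit of work in the segment (exclusive)
    resources: Sequence[str]  # resources scheduled for this segment


def plan_time(
    segments: Sequence[Segment],
    exec_time: Callable[[Segment], float],
    resched_cost: float,
) -> float:
    """Estimate the total execution time of a plan.

    Sums the predicted time of every segment and charges `resched_cost`
    whenever the resource set changes between consecutive segments.
    """
    total = 0.0
    for i, seg in enumerate(segments):
        total += exec_time(seg)
        changed = i > 0 and tuple(seg.resources) != tuple(segments[i - 1].resources)
        if changed:
            total += resched_cost
    return total


def toy_exec_time(seg: Segment) -> float:
    """Assumed toy model: 10 time units per unit of work, perfect speedup."""
    return (seg.end - seg.start) * 10.0 / len(seg.resources)


# A plan with one rescheduling point at unit 50, compared with no rescheduling.
with_resched = [
    Segment(0, 50, ("a", "b")),
    Segment(50, 100, ("a", "b", "c", "d")),
]
without_resched = [Segment(0, 100, ("a", "b"))]

t_with = plan_time(with_resched, toy_exec_time, resched_cost=20.0)
t_without = plan_time(without_resched, toy_exec_time, resched_cost=20.0)
```

Under this toy model, a search strategy would compare candidate plans by their `plan_time` value; the performance models and rescheduling costs used in the paper itself are not shown on this page, so the numbers here carry no empirical meaning.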

Author information

Authors and Affiliations

Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India
H. A. Sanjay & Sathish S. Vadhiyar

Corresponding author

Correspondence to Sathish S. Vadhiyar.

Additional information

This work is supported by the Department of Science and Technology, India, project ref. no. SR/S3/EECE/59/2005/8.6.06.

About this article

Cite this article

Sanjay, H.A., Vadhiyar, S.S. Strategies for Rescheduling Tightly-Coupled Parallel Applications in Multi-Cluster Grids. J Grid Computing 9, 379–403 (2011). https://doi.org/10.1007/s10723-010-9170-z

Received: 01 February 2010
Accepted: 26 October 2010
Published: 09 November 2010
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10723-010-9170-z
