DAGMap: efficient and dependable scheduling of DAG workflow job in Grid

The Journal of Supercomputing

Abstract

Directed acyclic graphs (DAGs) have been used extensively in Grid workflow modeling. Because Grid resources tend to be heterogeneous and dynamic, efficient and dependable scheduling of workflow jobs is essential. Achieving minimum job completion time and high resource utilization while also providing fault tolerance is a significant challenge. In this paper, we propose DAGMap, a novel scheduling heuristic based on list scheduling and group scheduling. DAGMap consists of two phases: Static Mapping and Dependable Execution. It has four salient features: (1) task grouping is based on dependency relationships and task upward priority; (2) critical tasks are scheduled first; (3) Min-Min and Max-Min selective scheduling is used for independent tasks; and (4) a checkpoint server with cooperative checkpointing is designed for dependable execution. Experimental results show that DAGMap achieves better performance than previous algorithms in terms of speedup, efficiency, and dependability.
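To make feature (3) concrete, the following is a minimal Python sketch of the classic Min-Min and Max-Min heuristics for independent tasks. It is an illustration under stated assumptions, not DAGMap's implementation: the function name `schedule_independent`, the `etc` table layout (estimated time to compute per task and machine), and the example numbers are all hypothetical, and the rule DAGMap uses to select between the two heuristics is not given in the abstract and is not modeled here.

```python
from typing import Dict, List, Tuple


def schedule_independent(
    etc: Dict[str, List[float]], use_max_min: bool = False
) -> Tuple[Dict[str, int], float]:
    """Map independent tasks to machines with Min-Min or Max-Min.

    etc[task][machine] is the estimated execution time of `task` on `machine`
    (a hypothetical input format chosen for this sketch).
    Returns (task -> machine index, makespan).
    """
    n_machines = len(next(iter(etc.values())))
    ready = [0.0] * n_machines          # time at which each machine becomes free
    unscheduled = set(etc)
    assignment: Dict[str, int] = {}

    while unscheduled:
        # For every unscheduled task, find its minimum completion time
        # and the machine that achieves it.
        best: Dict[str, Tuple[float, int]] = {}
        for task in unscheduled:
            machine = min(range(n_machines), key=lambda j: ready[j] + etc[task][j])
            best[task] = (ready[machine] + etc[task][machine], machine)

        # Min-Min picks the task with the smallest such completion time;
        # Max-Min picks the task with the largest one.
        pick = max if use_max_min else min
        chosen = pick(sorted(unscheduled), key=lambda t: best[t][0])

        completion, machine = best[chosen]
        assignment[chosen] = machine
        ready[machine] = completion
        unscheduled.remove(chosen)

    return assignment, max(ready)


# Hypothetical example: three tasks, two machines.
example_etc = {"a": [2.0, 4.0], "b": [3.0, 1.0], "c": [10.0, 12.0]}
min_min_plan, min_min_makespan = schedule_independent(example_etc)
max_min_plan, max_min_makespan = schedule_independent(example_etc, use_max_min=True)
```

In this example Max-Min places the long task `c` first and ends with a shorter makespan than Min-Min, which illustrates why a scheduler might choose between the two heuristics depending on how task lengths are distributed.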

Author information

Correspondence to Hai Jin.

About this article

Cite this article

Cao, H., Jin, H., Wu, X. et al. DAGMap: efficient and dependable scheduling of DAG workflow job in Grid. J Supercomput 51, 201–223 (2010). https://doi.org/10.1007/s11227-009-0284-7

  • DOI: https://doi.org/10.1007/s11227-009-0284-7
