DAGMap: efficient and dependable scheduling of DAG workflow job in Grid

Cao, Haijun; Jin, Hai; Wu, Xiaoxin; Wu, Song; Shi, Xuanhua

doi:10.1007/s11227-009-0284-7


Published: 06 May 2009

Volume 51, pages 201–223, (2010)

The Journal of Supercomputing

Haijun Cao¹, Hai Jin¹, Xiaoxin Wu², Song Wu¹ & Xuanhua Shi¹

864 Accesses
28 Citations

Abstract

Directed acyclic graphs (DAGs) are widely used to model Grid workflows. Because Grid resources tend to be heterogeneous and dynamic, efficient and dependable scheduling of workflow jobs is essential. Minimizing job completion time and maximizing resource utilization while also providing fault tolerance is challenging. In this paper, we propose DAGMap, a novel scheduling heuristic based on list scheduling and group scheduling. DAGMap consists of two phases: Static Mapping and Dependable Execution. It has four salient features: (1) tasks are grouped according to their dependency relationships and upward priority; (2) critical tasks are scheduled first; (3) Min-Min and Max-Min selective scheduling is used for independent tasks; and (4) a checkpoint server with cooperative checkpointing supports dependable execution. Experimental results show that DAGMap outperforms previous algorithms in terms of speedup, efficiency, and dependability.
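To make two of the building blocks named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that "upward priority" corresponds to the upward rank used in list-scheduling heuristics such as HEFT (a task's own cost plus the longest cost path to an exit task), and it shows the standard Min-Min and Max-Min rules for independent tasks. The function names, the dictionary-based inputs, and the `use_max_min` flag are hypothetical; the rule DAGMap uses to select between Min-Min and Max-Min is not given in the abstract and is not modeled here.

```python
def upward_rank(cost, succ, comm):
    """Upward rank of every task (assumed definition, as in HEFT).

    cost[t]      -- average computation cost of task t
    succ[t]      -- list of immediate successors of t (absent or empty for exit tasks)
    comm[(t, s)] -- average communication cost on edge t -> s (0 if missing)
    """
    memo = {}

    def rank(t):
        if t not in memo:
            # Own cost plus the most expensive path through any successor.
            memo[t] = cost[t] + max(
                (comm.get((t, s), 0.0) + rank(s) for s in succ.get(t, [])),
                default=0.0,
            )
        return memo[t]

    return {t: rank(t) for t in cost}


def map_independent(etc, use_max_min=False):
    """Min-Min (default) or Max-Min mapping of independent tasks.

    etc[t][m] -- expected execution time of task t on machine m
    Returns a list of (task, machine, finish_time) in scheduling order.
    """
    ready = {m: 0.0 for row in etc.values() for m in row}  # machine ready times
    unscheduled = set(etc)
    schedule = []
    while unscheduled:
        # For each task, find the machine giving the earliest completion time.
        best = {}
        for t in unscheduled:
            m = min(etc[t], key=lambda mach: ready[mach] + etc[t][mach])
            best[t] = (ready[m] + etc[t][m], m)
        # Min-Min picks the task with the smallest such time, Max-Min the largest.
        pick = max if use_max_min else min
        t = pick(best, key=lambda task: (best[task][0], task))
        finish, m = best[t]
        ready[m] = finish
        unscheduled.remove(t)
        schedule.append((t, m, finish))
    return schedule


# Illustrative data only (not taken from the paper).
cost = {"A": 2, "B": 3, "C": 1, "D": 2}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
comm = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 1}
ranks = upward_rank(cost, succ, comm)

etc = {"t1": {"m1": 4, "m2": 7}, "t2": {"m1": 2, "m2": 3}, "t3": {"m1": 8, "m2": 5}}
min_min = map_independent(etc)
max_min = map_independent(etc, use_max_min=True)
```

Sorting tasks by decreasing rank yields an order that respects the dependencies; in this example the entry task `A` receives the highest rank and the exit task `D` the lowest. Min-Min tends to favor short tasks, while Max-Min places long tasks early so they do not dominate the tail of the schedule.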



Author information

Authors and Affiliations

Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Haijun Cao, Hai Jin, Song Wu & Xuanhua Shi
Communication Technology Lab, Intel China Research Center, Beijing, 100080, China
Xiaoxin Wu


Corresponding author

Correspondence to Hai Jin.


About this article

Cite this article

Cao, H., Jin, H., Wu, X. et al. DAGMap: efficient and dependable scheduling of DAG workflow job in Grid. J Supercomput 51, 201–223 (2010). https://doi.org/10.1007/s11227-009-0284-7


Received: 07 December 2008
Accepted: 04 March 2009
Published: 06 May 2009
Issue Date: February 2010
DOI: https://doi.org/10.1007/s11227-009-0284-7
