Distributed workflow mapping algorithm for maximized reliability under end-to-end delay constraint

Cao, Fei; Zhu, Michelle M.

doi:10.1007/s11227-013-0938-3

Published: 20 April 2013

Volume 66, pages 1462–1488, (2013)

The Journal of Supercomputing

Fei Cao¹ & Michelle M. Zhu¹

Abstract

A distributed scientific workflow mapping algorithm is proposed that maximizes reliability under a given end-to-end delay (EED) bound. The problem is studied in a heterogeneous distributed computing environment, where computing node and communication link failures are inevitable. The mapping decisions and the stored table information are distributed among the nodes to achieve scalability and robustness, which are especially important for large-scale distributed systems. This Distributed Reliability Maximization workflow mapping algorithm under End-to-end Delay constraint (dis-DRMED) addresses both the maximum-reliability and the minimum-EED objectives in a two-step procedure. In the first step, a mapping algorithm that combines iterative Critical Path search with Layer-based priority assignment (CPL) minimizes the EED by focusing on the optimal allocation of tasks on the critical path. In the second step, tasks on noncritical paths are remapped to improve the overall execution reliability. Simulation results under various system setups show that dis-DRMED achieves considerably higher reliability than several representative workflow mapping algorithms under the same EED constraint.
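The two-step idea in the abstract can be illustrated with a small, centralized sketch. It is not the dis-DRMED or CPL algorithm itself: step 1 is replaced by a simple "fastest node per task" rule, transfer delays, link failures and node contention are ignored, and the reliability model (a node with failure rate λ survives a run of length t with probability exp(-λt)) is an assumption common in the scheduling literature rather than something stated in the abstract. All names (`two_step_map`, `exec_time`, `fail_rate`, `bound`) are hypothetical.

```python
import math

def end_to_end_delay(order, preds, mapping, exec_time):
    """EED = latest finish time; transfer delays are ignored in this sketch."""
    finish = {}
    for task in order:  # 'order' must be a topological order of the workflow DAG
        start = max((finish[p] for p in preds.get(task, [])), default=0.0)
        finish[task] = start + exec_time[task][mapping[task]]
    return max(finish.values())

def reliability(mapping, exec_time, fail_rate):
    """Assumed model: a run of length t on node n succeeds with prob. exp(-fail_rate[n] * t)."""
    exposure = sum(fail_rate[n] * exec_time[t][n] for t, n in mapping.items())
    return math.exp(-exposure)

def two_step_map(order, preds, exec_time, fail_rate, bound):
    # Step 1 (simplified stand-in for CPL): place every task on its fastest node,
    # which minimizes the EED in this contention-free model.
    mapping = {t: min(exec_time[t], key=exec_time[t].get) for t in order}
    if end_to_end_delay(order, preds, mapping, exec_time) > bound:
        raise ValueError("EED bound cannot be met in this model")
    # Step 2: greedily move a task to another node when that raises the
    # reliability and the EED stays within the bound. With a tight bound,
    # only tasks off the critical path have the slack to move.
    for task in order:
        for node in exec_time[task]:
            trial = {**mapping, task: node}
            if (end_to_end_delay(order, preds, trial, exec_time) <= bound
                    and reliability(trial, exec_time, fail_rate)
                    > reliability(mapping, exec_time, fail_rate)):
                mapping = trial
    return mapping

# Hypothetical diamond-shaped workflow a -> {b, c} -> d on two nodes:
# n1 is fast but failure-prone, n2 is slow but reliable.
order = ["a", "b", "c", "d"]
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
exec_time = {
    "a": {"n1": 2.0, "n2": 4.0},
    "b": {"n1": 4.0, "n2": 8.0},
    "c": {"n1": 1.0, "n2": 2.0},
    "d": {"n1": 2.0, "n2": 4.0},
}
fail_rate = {"n1": 0.01, "n2": 0.001}
mapping = two_step_map(order, preds, exec_time, fail_rate, bound=8.0)
print(mapping)  # {'a': 'n1', 'b': 'n1', 'c': 'n2', 'd': 'n1'}
```

In this example the critical path a, b, d already uses the whole bound of 8.0, so only the noncritical task c can be moved to the more reliable node.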

Notes

Partial EED of each individual task u_i is the end-to-end delay of a path from the starting task u_1 to u_i.
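As a rough illustration of this definition, the partial EED of every task can be computed in one pass over a topologically ordered workflow graph. The sketch assumes that the relevant path is the slowest (critical) one from u_1 to u_i and that the execution and transfer times under the current mapping are already known. The function and parameter names are hypothetical.

```python
from collections import defaultdict

def partial_eed(tasks, edges, exec_time, transfer_time):
    """Return {task: partial EED}, assuming the slowest path from the start task.

    tasks         -- task ids in topological order, tasks[0] is the start task u_1
    edges         -- (u, v) pairs meaning v depends on the output of u
    exec_time     -- exec_time[u]: execution time of u on its mapped node
    transfer_time -- transfer_time[(u, v)]: data transfer time along edge (u, v)
    """
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)

    eed = {}
    for t in tasks:
        # A task can start only after its slowest predecessor's data has arrived.
        ready = max((eed[p] + transfer_time[(p, t)] for p in preds[t]), default=0.0)
        eed[t] = ready + exec_time[t]
    return eed

# Hypothetical four-task workflow: u1 -> {u2, u3} -> u4
tasks = ["u1", "u2", "u3", "u4"]
edges = [("u1", "u2"), ("u1", "u3"), ("u2", "u4"), ("u3", "u4")]
exec_time = {"u1": 2.0, "u2": 3.0, "u3": 1.0, "u4": 2.0}
transfer_time = {("u1", "u2"): 1.0, ("u1", "u3"): 1.0,
                 ("u2", "u4"): 2.0, ("u3", "u4"): 5.0}
result = partial_eed(tasks, edges, exec_time, transfer_time)
print(result)  # {'u1': 2.0, 'u2': 6.0, 'u3': 4.0, 'u4': 11.0}
```

The partial EED of the final task equals the EED of the whole workflow under the given mapping.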

Author information

Authors and Affiliations

Department of Computer Science, Southern Illinois University, Carbondale, IL, 62901, USA
Fei Cao & Michelle M. Zhu

Corresponding author

Correspondence to Fei Cao.

About this article

Cite this article

Cao, F., Zhu, M.M. Distributed workflow mapping algorithm for maximized reliability under end-to-end delay constraint. J Supercomput 66, 1462–1488 (2013). https://doi.org/10.1007/s11227-013-0938-3

Issue Date: December 2013
DOI: https://doi.org/10.1007/s11227-013-0938-3
