Distributed workflow mapping algorithm for maximized reliability under end-to-end delay constraint

The Journal of Supercomputing

Abstract

A distributed scientific workflow mapping algorithm that maximizes reliability under a given end-to-end delay (EED) bound is proposed. The problem is studied in a heterogeneous distributed computing environment where failures of computing nodes and communication links are inevitable. The mapping decisions and the stored table information are distributed among the nodes to achieve scalability and robustness, which are especially important for large-scale distributed systems. This Distributed Reliability Maximization workflow mapping algorithm under End-to-end Delay constraint (dis-DRMED) addresses both objectives, maximum reliability and minimum EED, in a two-step procedure. In the first step, a mapping algorithm that combines iterative Critical Path search with Layer-based priority assignment (CPL) minimizes the EED by focusing on the optimal allocation of tasks on the critical path. In the second step, tasks on noncritical paths are remapped to improve the overall execution reliability. Simulation results under various system setups show that dis-DRMED achieves considerably higher reliability under the same EED constraint than several representative workflow mapping algorithms.
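
To make the optimization target concrete, the following is a minimal sketch of the problem statement only, not of dis-DRMED itself. It exhaustively searches a toy four-task workflow for the most reliable mapping whose EED stays within a bound. Everything in it is an assumption made for illustration: the task names, workloads, the two node types and their failure rates, the exponential failure model (reliability of a task is exp(-rate * time)), and the omission of link transfer times and link failures.

```python
import math
from itertools import product

# Hypothetical toy workflow: task -> list of predecessor tasks (a DAG).
PREDS = {"u1": [], "u2": ["u1"], "u3": ["u1"], "u4": ["u2", "u3"]}
ORDER = ["u1", "u2", "u3", "u4"]  # a topological order of the tasks

# Hypothetical node types: (speed factor, failure rate per unit time).
NODES = {"fast": (2.0, 0.02), "slow": (1.0, 0.001)}
WORK = {"u1": 4.0, "u2": 6.0, "u3": 2.0, "u4": 4.0}  # assumed workloads


def exec_time(task, node):
    """Execution time of a task on a node: workload divided by speed."""
    return WORK[task] / NODES[node][0]


def eed(mapping):
    """End-to-end delay: finish time of the last task (transfers ignored)."""
    finish = {}
    for t in ORDER:
        start = max((finish[p] for p in PREDS[t]), default=0.0)
        finish[t] = start + exec_time(t, mapping[t])
    return finish[ORDER[-1]]


def reliability(mapping):
    """Product of exp(-rate * time) over all tasks (assumed failure model)."""
    exposure = sum(NODES[n][1] * exec_time(t, n) for t, n in mapping.items())
    return math.exp(-exposure)


def best_mapping(bound):
    """Most reliable mapping with EED <= bound, or None if none exists."""
    best = None
    for choice in product(NODES, repeat=len(ORDER)):
        m = dict(zip(ORDER, choice))
        if eed(m) > bound:
            continue  # violates the delay constraint
        if best is None or reliability(m) > reliability(best):
            best = m
    return best
```

With a bound of 7.0 time units, the tasks on the critical path (u1, u2, u4) have to stay on the fast but less reliable node, while the noncritical task u3 can move to the slow, more reliable node without lengthening the EED. This mirrors the intuition behind the two-step procedure described in the abstract. Exhaustive search is exponential in the number of tasks and is used here only because the example is tiny.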

Notes

  1. Partial EED of each individual task u_i is the end-to-end delay of a path from the starting task u_1 to u_i.
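
As a rough illustration of this note, the sketch below computes a partial EED for every task in one pass over a topological order and then backtracks the critical path. It rests on assumptions not stated in the note: the path in question is taken to be the longest one from u_1 to u_i, the delay includes the execution time of u_i itself, and transfer times between tasks are ignored. The function names and the example costs are hypothetical.

```python
def partial_eeds(order, preds, cost):
    """Return (partial EED per task, predecessor on its longest path).

    order: tasks in topological order, starting task first
    preds: task -> list of predecessor tasks
    cost:  task -> execution time on its assigned node
    """
    peed, via = {}, {}
    for t in order:
        # Predecessor with the largest partial EED, or None for the start task.
        best_pred = max(preds[t], key=lambda p: peed[p], default=None)
        peed[t] = cost[t] + (peed[best_pred] if best_pred is not None else 0.0)
        via[t] = best_pred
    return peed, via


def critical_path(end_task, via):
    """Follow the stored predecessors back from the end task."""
    path = []
    while end_task is not None:
        path.append(end_task)
        end_task = via[end_task]
    return path[::-1]
```

For a fork-join workflow in which u1 precedes u2 and u3, both of which precede u4, and with assumed costs of 2, 3, 1 and 2 time units, the partial EEDs are 2, 5, 3 and 7, and the critical path is u1, u2, u4.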

Author information

Corresponding author

Correspondence to Fei Cao.

About this article

Cite this article

Cao, F., Zhu, M.M. Distributed workflow mapping algorithm for maximized reliability under end-to-end delay constraint. J Supercomput 66, 1462–1488 (2013). https://doi.org/10.1007/s11227-013-0938-3

  • DOI: https://doi.org/10.1007/s11227-013-0938-3
