A Distributed Workflow Management System with Case Study of Real-life Scientific Applications on Grids

Journal of Grid Computing

Abstract

Next-generation scientific applications feature complex workflows composed of many computing modules with intricate inter-module dependencies. Supporting such scientific workflows in wide-area networks, especially Grids, and optimizing their performance are crucial to the success of collaborative scientific discovery. We develop a Scientific Workflow Automation and Management Platform (SWAMP), which enables scientists to conveniently assemble, execute, monitor, control, and steer computing workflows in distributed environments via a unified web-based user interface. The SWAMP architecture is built entirely on a seamless composition of web services: both its own functionalities and its interactions with other tools or systems are provided through web services, which allows easy access over standard Internet protocols independent of platform and programming language. SWAMP also incorporates a class of efficient workflow mapping schemes, based on rigorous performance modeling and algorithm design, to achieve optimal end-to-end performance. The performance superiority of SWAMP's mapping schemes over existing ones is demonstrated by extensive simulations, and the system's efficacy is illustrated by large-scale experiments on real-life scientific workflows for climate modeling through system implementation, deployment, and testing on the Open Science Grid.
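To make the notion of mapping a workflow for end-to-end performance concrete, the following is a minimal sketch, not SWAMP's actual algorithm or cost model. It assumes a workflow is a directed acyclic graph of modules, that a module's run time is its workload divided by the speed of the node it is mapped to, that transfer time between different nodes is data size divided by link bandwidth, and that modules on the same node do not contend for resources. The function name `end_to_end_delay` and all parameter names are hypothetical.

```python
from graphlib import TopologicalSorter


def end_to_end_delay(deps, work, data, speed, bandwidth, mapping):
    """Finish time of the last module of a workflow DAG under a given mapping.

    deps:      {module: set of predecessor modules}
    work:      {module: computational workload}
    data:      {(pred, module): data size sent along that dependency edge}
    speed:     {node: processing speed}
    bandwidth: {(src_node, dst_node): link bandwidth}
    mapping:   {module: node the module runs on}
    All names and the cost model are illustrative assumptions.
    """
    finish = {}
    # static_order() yields every module after all of its predecessors.
    for m in TopologicalSorter(deps).static_order():
        ready = 0.0
        for p in deps.get(m, ()):
            src, dst = mapping[p], mapping[m]
            # Assumed: no transfer cost when both modules share a node.
            transfer = 0.0 if src == dst else data[(p, m)] / bandwidth[(src, dst)]
            ready = max(ready, finish[p] + transfer)
        finish[m] = ready + work[m] / speed[mapping[m]]
    return max(finish.values())


# Hypothetical four-module workflow: a -> {b, c} -> d.
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
work = {"a": 4, "b": 6, "c": 2, "d": 4}
data = {("a", "b"): 10, ("a", "c"): 10, ("b", "d"): 10, ("c", "d"): 10}
speed = {"n1": 2, "n2": 1}
bandwidth = {("n1", "n2"): 5, ("n2", "n1"): 5}

split = {"a": "n1", "b": "n1", "c": "n2", "d": "n1"}
single = {m: "n1" for m in deps}
print(end_to_end_delay(deps, work, data, speed, bandwidth, split))
print(end_to_end_delay(deps, work, data, speed, bandwidth, single))
```

Under these assumptions, a mapping scheme searches over candidate values of `mapping` for one that minimizes this delay. Comparing the two mappings above shows how moving a single module to a slower node behind a slower link lengthens the critical path.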

Author information

Corresponding author

Correspondence to Qishi Wu.

About this article

Cite this article

Wu, Q., Zhu, M., Gu, Y. et al. A Distributed Workflow Management System with Case Study of Real-life Scientific Applications on Grids. J Grid Computing 10, 367–393 (2012). https://doi.org/10.1007/s10723-012-9222-7

  • DOI: https://doi.org/10.1007/s10723-012-9222-7