Abstract
Next-generation scientific applications feature complex workflows composed of many computing modules with intricate inter-module dependencies. Supporting such scientific workflows in wide-area networks, especially Grids, and optimizing their performance are crucial to the success of collaborative scientific discovery. We develop a Scientific Workflow Automation and Management Platform (SWAMP), which enables scientists to conveniently assemble, execute, monitor, control, and steer computing workflows in distributed environments via a unified web-based user interface. The SWAMP architecture is built entirely on a seamless composition of web services: both its own functionalities and its interactions with other tools and systems are provided through web services, which are easily accessed over standard Internet protocols and are independent of platforms and programming languages. SWAMP also incorporates a class of efficient workflow mapping schemes that optimize end-to-end performance based on rigorous performance modeling and algorithm design. The performance superiority of these mapping schemes over existing ones is demonstrated by extensive simulations, and the efficacy of the system is illustrated through its implementation, deployment, and testing on the Open Science Grid with large-scale experiments on real-life scientific workflows for climate modeling.
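The abstract refers to performance modeling of workflow mappings without spelling out the model. As a minimal sketch under our own simplifying assumptions (not SWAMP's actual model), the end-to-end delay of a workflow DAG mapped onto network nodes can be estimated as the finish time of its critical path: a module's compute time is its workload divided by the processing power of its node, and a cross-node data transfer costs data size divided by link bandwidth plus a minimum link delay. The names `Network`, `end_to_end_delay`, and all example values are hypothetical, and the sketch ignores contention when several modules share a node.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter

# Hypothetical cost model (illustration only, not taken from the paper):
#   compute time of module m on node n   = work[m] / power[n]
#   transfer time of edge (u, v) between different nodes
#       = data[(u, v)] / bandwidth[(src, dst)] + min_delay[(src, dst)]
#   modules on the same node exchange data at no cost; no node contention is modeled.


@dataclass
class Network:
    power: dict[str, float]                  # node -> processing power (work units/s)
    bandwidth: dict[tuple[str, str], float]  # (src, dst) -> bandwidth (data units/s)
    min_delay: dict[tuple[str, str], float]  # (src, dst) -> minimum link delay (s)


def end_to_end_delay(work, data, mapping, net):
    """Return the finish time of the last module when each module starts
    as soon as all of its input data have arrived."""
    preds = {m: [] for m in work}
    for u, v in data:
        preds[v].append(u)
    finish = {}
    # TopologicalSorter expects a mapping from each node to its predecessors.
    for m in TopologicalSorter(preds).static_order():
        ready = 0.0
        for u in preds[m]:
            src, dst = mapping[u], mapping[m]
            transfer = 0.0 if src == dst else (
                data[(u, m)] / net.bandwidth[(src, dst)] + net.min_delay[(src, dst)]
            )
            ready = max(ready, finish[u] + transfer)
        finish[m] = ready + work[m] / net.power[mapping[m]]
    return max(finish.values())


# Toy diamond-shaped workflow: w0 -> {w1, w2} -> w3, on two nodes A and B.
work = {"w0": 2.0, "w1": 8.0, "w2": 6.0, "w3": 4.0}
data = {("w0", "w1"): 10.0, ("w0", "w2"): 10.0, ("w1", "w3"): 5.0, ("w2", "w3"): 5.0}
net = Network(
    power={"A": 2.0, "B": 4.0},
    bandwidth={("A", "B"): 5.0, ("B", "A"): 5.0},
    min_delay={("A", "B"): 0.5, ("B", "A"): 0.5},
)
split_mapping = {"w0": "A", "w1": "B", "w2": "A", "w3": "B"}
all_on_b = {m: "B" for m in work}

print(end_to_end_delay(work, data, split_mapping, net))  # 6.5
print(end_to_end_delay(work, data, all_on_b, net))       # 3.5
```

In this toy example, placing every module on the faster node B avoids all transfers and lowers the estimated delay from 6.5 s to 3.5 s. In a real deployment, bandwidth and node capacity limits make such trade-offs far less obvious, which is what a mapping algorithm has to search over.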
Cite this article
Wu, Q., Zhu, M., Gu, Y. et al. A Distributed Workflow Management System with Case Study of Real-life Scientific Applications on Grids. J Grid Computing 10, 367–393 (2012). https://doi.org/10.1007/s10723-012-9222-7