ABSTRACT
Scientific workflow management is heavily used in our organization. After six years, a large number of workflows are available and regularly used to run biomedical data analysis experiments on distributed infrastructures, mostly on grids. In this paper we present our first efforts to better understand and characterize these workflows. We start with a set of considerations previously proposed in the literature (workflow dimensions and motifs), and revise these to more closely describe what we observe in our workflows. We conclude that workflow characteristics can be categorized at two levels: first, the features that characterize the distributed application and how it is implemented as a workflow, and second, workflow motifs that depend on the features of the selected workflow management system. These characteristics could be useful in the future to understand a larger set of workflows and to identify functional requirements for the further development of workflow management systems.
Index Terms
- Understanding workflows for distributed computing: nitty-gritty details