Abstract
Business workflow management and business process modeling are mature research areas whose roots go back to the early days of office automation systems. Scientific workflow management, on the other hand, is a much more recent phenomenon, triggered by (i) a shift towards data-intensive and computational methods in the natural sciences, and (ii) the resulting need for tools that can simplify and automate recurring computational tasks. In this paper, we provide an introduction to and overview of scientific workflows, highlighting features and important concepts commonly found in scientific workflow applications. We illustrate these using simple workflow examples from the bioinformatics domain. We then discuss similarities and, more importantly, differences between scientific workflows and business workflows. While some concepts and solutions developed in one domain may be readily applicable to the other, enough differences remain to warrant a new research effort at the intersection of scientific and business workflows. We close by proposing a number of research opportunities for cross-fertilization between the scientific workflow and business workflow communities.
This research was conducted while the second author was on sabbatical leave at UC Davis.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ludäscher, B., Weske, M., McPhillips, T., Bowers, S. (2009). Scientific Workflows: Business as Usual? In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds) Business Process Management. BPM 2009. Lecture Notes in Computer Science, vol 5701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03848-8_4
DOI: https://doi.org/10.1007/978-3-642-03848-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03847-1
Online ISBN: 978-3-642-03848-8
eBook Packages: Computer Science (R0)