Definitions
A workflow is a well-defined, and possibly repeatable, pattern or systematic organization of activities designed to achieve a certain transformation of data (Talia et al. 2015).
A Workflow Management System (WMS) is a software environment providing tools to define, compose, map, and execute workflows.
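As a minimal sketch of these two definitions, the example below models a workflow as a directed acyclic graph of activities and runs them in dependency order. It assumes activities are plain Python functions. The names `run_workflow`, `TASKS`, and `DEPENDENCIES` are hypothetical and are not taken from any WMS cited in this entry. A real WMS would also map activities onto distributed resources, which this sketch does not attempt.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run each task after all of the tasks it depends on.

    tasks: maps a task name to a function that receives a dict of
           upstream results keyed by task name (hypothetical convention).
    dependencies: maps a task name to the set of task names it needs.
    """
    results = {}
    # static_order() yields the names so that predecessors come first.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Illustrative three-step data transformation: load, clean, summarize.
TASKS = {
    "load": lambda up: [4, None, 6, 8],
    "clean": lambda up: [x for x in up["load"] if x is not None],
    "summarize": lambda up: sum(up["clean"]) / len(up["clean"]),
}
DEPENDENCIES = {
    "load": set(),
    "clean": {"load"},
    "summarize": {"clean"},
}

order, results = run_workflow(TASKS, DEPENDENCIES)
print(order)
print(results["summarize"])
```

Keeping the task definitions separate from the dependency graph mirrors the split between defining activities and composing them. The execution loop is the only part that a system targeting a cluster or cloud would need to replace.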
Overview
The wide availability of high-performance computing systems has allowed scientists and engineers to implement increasingly complex applications for accessing and analyzing huge amounts of data (Big Data) on distributed and high-performance computing platforms. Given the variety of Big Data applications and types of users (from end users to skilled programmers), there is a need for scalable programming models with different levels of abstraction and design formalisms. The programming models should adapt to user needs by allowing (i) ease in defining data analysis applications, (ii) effectiveness in the analysis of large datasets, and (iii) efficiency of executing...
References
Agapito G, Cannataro M, Guzzi PH, Marozzo F, Talia D, Trunfio P (2013) Cloud4SNP: distributed analysis of SNP microarray data on the cloud. In: Proceedings of the ACM conference on bioinformatics, computational biology and biomedical informatics 2013 (ACM BCB 2013). ACM, Washington, DC, p 468. ISBN:978-1-4503-2434-2
Altomare A, Cesario E, Comito C, Marozzo F, Talia D (2017) Trajectory pattern mining for urban computing in the cloud. IEEE Trans Parallel Distrib Syst 28(2):586–599. ISSN:1045-9219
Atay M, Chebotko A, Liu D, Lu S, Fotouhi F (2007) Efficient schema-based XML-to-relational data mapping. Inf Syst 32(3):458–476
Belcastro L, Marozzo F, Talia D, Trunfio P (2015) Programming visual and script-based big data analytics workflows on clouds. In: Grandinetti L, Joubert G, Kunze M, Pascucci V (eds) Post-proceedings of the high performance computing workshop 2014. Advances in parallel computing, vol 26. IOS Press, Cetraro, pp 18–31. ISBN:978-1-61499-582-1
Belcastro L, Marozzo F, Talia D, Trunfio P (2016) Using scalable data mining for predicting flight delays. ACM Trans Intell Syst Technol 8(1):1–20
Bowers S, Ludascher B, Ngu AHH, Critchlow T (2006) Enabling scientific workflow reuse through structured composition of dataflow and control-flow. In: 22nd international conference on data engineering workshops (ICDEW’06), p 70. https://doi.org/10.1109/ICDEW.2006.55
Brown DA, Brady PR, Dietz A, Cao J, Johnson B, McNabb J (2007) A case study on the use of workflow technologies for scientific analysis: gravitational wave data analysis. In: Workflows for e-Science, pp 39–59
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Deelman E, Gannon D, Shields M, Taylor I (2009) Workflows and e-science: an overview of workflow system features and capabilities. Futur Gener Comput Syst 25(5):528–540
Deelman E, Vahi K, Juve G, Rynge M, Callaghan S, Maechling PJ, Mayani R, Chen W, da Silva RF, Livny M et al (2015) Pegasus, a workflow management system for science automation. Futur Gener Comput Syst 46:17–35
Georgakopoulos D, Hornick M, Sheth A (1995) An overview of workflow management: from process modeling to workflow automation infrastructure. Distrib Parallel Databases 3(2):119–153
Gropp W, Lusk E, Skjellum A (1999) Using MPI: portable parallel programming with the message-passing interface, vol 1. MIT Press, Cambridge
Guan Z, Hernandez F, Bangalore P, Gray J, Skjellum A, Velusamy V, Liu Y (2006) Grid-flow: a grid-enabled scientific workflow system with a Petri-net-based interface. Concurr Comput Pract Exp 18(10):1115–1140
Isard M, Budiu M, Yu Y, Birrell A, Fetterly D (2007) Dryad: distributed data-parallel programs from sequential building blocks. In: ACM SIGOPS operating systems review, vol 41. ACM, pp 59–72
Juric MB, Mathew B, Sarang PG (2006) Business process execution language for web services: an architect and developer’s guide to orchestrating web services using BPEL4WS. Packt Publishing Ltd, Birmingham
Juve G, Deelman E, Vahi K, Mehta G, Berriman B, Berman BP, Maechling P (2009) Scientific workflow applications on Amazon EC2. In: 2009 5th IEEE international conference on E-science workshops. IEEE, pp 59–66
Kranjc J, Podpečan V, Lavrač N (2012) ClowdFlows: a cloud based scientific workflow platform. In: Machine learning and knowledge discovery in databases. Springer, pp 816–819
Lee S, Park H, Shin Y (2012) Cloud computing availability: multi-clouds for big data service. In: Convergence and hybrid information technology. Springer, Heidelberg, pp 799–806
Liu L, Pu C, Ruiz DD (2004) A systematic approach to flexible specification, composition, and restructuring of workflow activities. J Database Manag 15(1):1
Lordan F, Tejedor E, Ejarque J, Rafanell R, Álvarez J, Marozzo F, Lezzi D, Sirvent R, Talia D, Badia R (2014) ServiceSs: an interoperable programming framework for the cloud. J Grid Comput 12(1):67–91
Lu Q, Hao P, Curcin V, He W, Li YY, Luo QM, Guo YK, Li YX (2006) KDE bioscience: platform for bioinformatics analysis workflows. J Biomed Inf 39(4):440–450
Ludäscher B, Altintas I, Berkley C, Higgins D, Jaeger E, Jones M, Lee EA, Tao J, Zhao Y (2006) Scientific workflow management and the Kepler system. Concurr Comput Pract Exp 18(10):1039–1065
Maheshwari K, Rodriguez A, Kelly D, Madduri R, Wozniak J, Wilde M, Foster I (2013) Enabling multi-task computation on Galaxy-based gateways using Swift. In: 2013 IEEE international conference on cluster computing (CLUSTER). IEEE, pp 1–3
Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, pp 135–146
Marin A, Wellman B (2011) Social network analysis: an introduction. In: The SAGE handbook of social network analysis. Sage Publications, Thousand Oaks, p 11
Margolis B (2007) SOA for the business developer: concepts, BPEL, and SCA. MC Press, Lewisville
Marozzo F, Talia D, Trunfio P (2011) A cloud framework for parameter sweeping data mining applications. In: Proceedings of the 3rd IEEE international conference on cloud computing technology and science (CloudCom’11). IEEE Computer Society Press, Athens, pp 367–374. ISBN:978-0-7695-4622-3
Marozzo F, Talia D, Trunfio P (2015) JS4Cloud: script-based workflow programming for scalable data analysis on cloud platforms. Concurr Comput Pract Exp 27(17):5214–5237
Marozzo F, Talia D, Trunfio P (2016) A workflow management system for scalable data mining on clouds. IEEE Trans Serv Comput PP(99):1–1
Talia D, Trunfio P, Marozzo F (2015) Data analysis in the cloud. Elsevier. ISBN:978-0-12-802881-0
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111
WFMC T (1999) Glossary, document number WFMC, issue 3.0. TC 1011
Wilde M, Hategan M, Wozniak JM, Clifford B, Katz DS, Foster I (2011) Swift: a language for distributed parallel scripting. Parallel Comput 37(9):633–652
Wolstencroft K, Haines R, Fellows D, Williams A, Withers D, Owen S, Soiland-Reyes S, Dunlop I, Nenadic A, Fisher P et al (2013) The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Res 41(W1):W557–W561
Wozniak JM, Wilde M, Foster IT (2014) Language features for scalable distributed-memory dataflow computing. In: 2014 fourth workshop on data-flow execution models for extreme scale computing (DFM). IEEE, pp 50–53
© 2018 Springer International Publishing AG
Cite this entry
Belcastro, L., Marozzo, F. (2018). Workflow Systems for Big Data Analysis. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_137-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
Chapter history
Latest: Workflow Systems for Big Data Analysis. Published: 14 February 2018. DOI: https://doi.org/10.1007/978-3-319-63962-8_137-1
Original: Workflow Systems for Big Data Analysis. Published: 24 February 2012. DOI: https://doi.org/10.1007/978-3-319-63962-8_137-2