Abstract
Many applications in scientific fields such as physics, astronomy, biology, and earth science transform a set of data by applying iterative computation steps. From a computer science perspective, these steps can be seen as a pool of tasks with data dependencies. As application complexity grows, the number of workflows also increases. Because a large variety of solutions exists for specific applications and platforms, a systematic analysis of the scheduling models, methods, and algorithms used in workflow applications is needed. This chapter provides a global picture of the existing solutions to support optimal workflow scheduling choices.
Acknowledgments
The research presented in this paper is supported by projects: DataWay: Real-time Data Processing Platform for Smart Cities: Making sense of Big Data—PN-II-RU-TE-2014-4-2731; MobiWay: Mobility Beyond Individualism: an Integrated Platform for Intelligent Transportation Systems of Tomorrow—PN-II-PT-PCCA-2013-4-0321; CyberWater grant of the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number 47/2012; clueFarm: Information system based on cloud services accessible through mobile devices, to increase product quality and business development farms—PN-II-PT-PCCA-2013-4-0870.
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Nita, MC., Vasile, M., Pop, F., Cristea, V. (2016). Workflow Scheduling Techniques for Big Data Platforms. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_2
DOI: https://doi.org/10.1007/978-3-319-44881-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer Science, Computer Science (R0)