Abstract
The paper studies the impact of data transfer strategies on the execution of scientific workflows. Five strategies are described, which define when and in what order data transfers are performed during workflow execution. The strategies are evaluated experimentally by simulation using a realistic network model. The results demonstrate that the execution time of data-intensive workflows depends significantly on the strategy used. In particular, the Eager and Lazy strategies, often used in the theory and practice of workflow scheduling, perform poorly in most cases. The alternative strategies provide up to 36% makespan improvement by overlapping communications and computations, prioritizing data transfers, and reducing network contention.
Acknowledgments
This work is supported by the Russian Science Foundation (project 16-11-10352).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sukhoroslov, O. (2019). An Experimental Study of Data Transfer Strategies for Execution of Scientific Workflows. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2019. Lecture Notes in Computer Science, vol 11657. Springer, Cham. https://doi.org/10.1007/978-3-030-25636-4_6
DOI: https://doi.org/10.1007/978-3-030-25636-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25635-7
Online ISBN: 978-3-030-25636-4
eBook Packages: Computer Science, Computer Science (R0)