An Experimental Study of Data Transfer Strategies for Execution of Scientific Workflows

  • Conference paper
Parallel Computing Technologies (PaCT 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11657)

Abstract

The paper studies the impact of data transfer strategies on the execution of scientific workflows. Five strategies are described, each defining when and in what order data transfers are performed during workflow execution. The strategies are evaluated experimentally by simulation using a realistic network model. The results demonstrate that the execution time of data-intensive workflows depends significantly on the strategy used. In particular, the Eager and Lazy strategies, often used in the theory and practice of workflow scheduling, perform poorly in most cases. The alternative strategies provide up to 36% makespan improvement by overlapping communications and computations, prioritizing data transfers, and reducing network contention.

Notes

  1. https://github.com/alexmnazarenko/pysimgrid

  2. https://github.com/osukhoroslov/pysimgrid-experiments/tree/master/pact2019

Acknowledgments

This work is supported by the Russian Science Foundation (project 16-11-10352).

Author information

Corresponding author

Correspondence to Oleg Sukhoroslov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sukhoroslov, O. (2019). An Experimental Study of Data Transfer Strategies for Execution of Scientific Workflows. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2019. Lecture Notes in Computer Science, vol 11657. Springer, Cham. https://doi.org/10.1007/978-3-030-25636-4_6

  • DOI: https://doi.org/10.1007/978-3-030-25636-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-25635-7

  • Online ISBN: 978-3-030-25636-4

  • eBook Packages: Computer Science, Computer Science (R0)
