Abstract
Scientific workflow applications involve large numbers of tasks and data sets that must be processed in a systematic manner. These applications benefit from cloud computing platforms, which offer access to virtually limitless resources provisioned elastically and on demand. Running data-intensive scientific workflows across geographically distributed data centres, however, incurs massive data transfers, which increase both the overall execution time and the monetary cost of the workflows. Existing workflow scheduling efforts concentrate on reducing makespan and budget; little attention has been paid to the dependencies between tasks and data sets. In this paper, we introduce a workflow scheduling technique that reduces data transfer and executes workflow tasks within deadline and budget constraints. The proposed technique consists of two parts: an initial data placement stage, which clusters and distributes data sets based on their dependencies, and a replication-based partial critical path (R-PCP) technique, which schedules tasks with data locality and dynamically maintains a dependency matrix for the placement of generated data sets. To reduce run-time data set movement, we use inter-data-centre task replication, together with data set replication to ensure data set availability. Simulation results with four workflow applications show that our strategy efficiently reduces data movement and executes all chosen workflows within the user-specified budget and deadline. R-PCP incurs 44.93% and 31.37% less data movement than the random and adaptive data-aware scheduling (ADAS) techniques, respectively, and consumes 26.48% less energy than ADAS.
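To make the idea of a data set dependency matrix concrete, the following is a minimal sketch rather than the paper's actual formulation. It assumes, purely for illustration, that the dependency between two data sets is the number of tasks that read both of them; the function name `dependency_matrix` and the `task_inputs` mapping are hypothetical.

```python
from itertools import combinations


def dependency_matrix(task_inputs):
    """Build a symmetric data set dependency matrix.

    task_inputs maps a task name to the set of data sets it reads.
    ASSUMPTION (not stated in the abstract): the dependency between two
    data sets is the number of tasks that use both of them, and each
    diagonal entry is the number of tasks that use that data set.
    """
    # Fix a stable ordering so matrix rows and columns are reproducible.
    datasets = sorted(set().union(*task_inputs.values()))
    index = {name: i for i, name in enumerate(datasets)}
    size = len(datasets)
    matrix = [[0] * size for _ in range(size)]

    for used in task_inputs.values():
        # Diagonal: how many tasks touch this data set at all.
        for name in used:
            matrix[index[name]][index[name]] += 1
        # Off-diagonal: one count per task that shares the pair.
        for a, b in combinations(sorted(used), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1

    return datasets, matrix
```

Under this assumption, data sets with large off-diagonal entries are natural candidates for placement in the same data centre, since separating them would force a transfer for every task that needs both.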
About this article
Cite this article
Ulabedin, Z., Nazir, B. Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform. J Supercomput 77, 10743–10772 (2021). https://doi.org/10.1007/s11227-020-03541-2
DOI: https://doi.org/10.1007/s11227-020-03541-2