Abstract
Cloud service providers offer computing resources at a reasonable price under a pay-per-use model. They have also introduced pricing models such as spot, blockspot and spotfleet instances, which are cost-effective but require users to bid in order to balance reliability against monetary cost. Consequently, Scientific Workflows (SWf), which are used to model applications involving high-throughput computation and complex large-scale data analysis, are increasingly adopting these computing resources. Nevertheless, spot instances are terminated when the market spot price exceeds the user's bid price. Moreover, failures are inevitable in such large distributed systems, which makes it challenging to design a fault-tolerant scheduling algorithm for SWf. This paper presents an efficient, low-cost and fault-tolerant scheduling algorithm and a bidding strategy that minimize the volatility and cost of resource provisioning for SWf. The proposed algorithm uses spot and blockspot instances as hybrid instances, compared against on-demand instances, to reduce the execution cost and provide fault tolerance while meeting the SWf deadline. The results of an empirical simulation study show the promising potential of the proposed scheduling algorithm, which is robust under short deadlines with minimal makespan and cost.
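The termination rule stated in the abstract (a spot instance is revoked once the market spot price exceeds the user's bid) can be illustrated with a minimal sketch. This is not the paper's bidding strategy or scheduling algorithm. The function name `first_termination_index` and the price trace are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def first_termination_index(spot_prices: Sequence[float],
                            bid_price: float) -> Optional[int]:
    """Return the index of the first interval in which the market spot
    price exceeds the bid, or None if the instance is never revoked."""
    for i, price in enumerate(spot_prices):
        # Revocation happens only when the price strictly exceeds the bid.
        if price > bid_price:
            return i
    return None


# Hypothetical hourly spot-price trace (illustrative values, not from the paper).
trace = [0.031, 0.034, 0.029, 0.052, 0.030]

low_bid_result = first_termination_index(trace, bid_price=0.040)
high_bid_result = first_termination_index(trace, bid_price=0.060)
print(low_bid_result)   # index of the interval where the low bid is outbid
print(high_bid_result)  # None when the bid stays above every price
```

A higher bid lowers the chance of revocation but raises the price the user is willing to pay, which is the reliability versus cost trade-off the abstract refers to.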
Cite this article
K., V., Kumar, S.M.D., S., R. et al. Cost and fault-tolerant aware resource management for scientific workflows using hybrid instances on clouds. Multimed Tools Appl 77, 10171–10193 (2018). https://doi.org/10.1007/s11042-017-5304-7
DOI: https://doi.org/10.1007/s11042-017-5304-7