Cost and fault-tolerant aware resource management for scientific workflows using hybrid instances on clouds

Multimedia Tools and Applications

Abstract

Cloud service providers offer computing resources at a reasonable price under a pay-per-use model. They have also introduced further pricing models, such as spot, blockspot and spotfleet instances, that are cost-effective but require users to bid in order to balance reliability against monetary cost. Consequently, Scientific Workflows (SWf), which are used to model applications involving high throughput, heavy computation and complex large-scale data analysis, are increasingly adopting these computing resources. However, spot instances are terminated when the market spot price exceeds the user's bid price. Moreover, failures are inevitable in such large distributed systems, which makes it challenging to design a fault-tolerant scheduling algorithm for SWf. This paper presents an efficient, low-cost and fault-tolerant scheduling algorithm and a bidding strategy to minimize the volatility and cost of resource provisioning for SWf. The proposed algorithm uses spot and blockspot instances as hybrid instances, in comparison with on-demand instances, to reduce the execution cost and provide fault tolerance while meeting the SWf deadline. An empirical simulation study demonstrates the promising potential of the proposed scheduling algorithm, which is robust under short deadlines with minimal makespan and cost.
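The termination rule stated in the abstract (a spot instance is lost once the market spot price exceeds the bid) can be illustrated with a minimal sketch. This is not the paper's scheduling algorithm or bidding strategy. The function names, the hourly billing at the market price, and the price trace are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def first_termination_hour(prices: Sequence[float], bid: float) -> Optional[int]:
    """Return the index of the first hour whose spot price exceeds the bid.

    Returns None if the bid is never exceeded over the whole trace.
    """
    for hour, price in enumerate(prices):
        if price > bid:
            return hour
    return None


def spot_cost_until_termination(prices: Sequence[float], bid: float) -> float:
    """Sum the hourly market price for the hours before termination.

    Assumption: the user is billed the market price for each full hour the
    instance runs, and pays nothing for the hour in which it is terminated.
    """
    end = first_termination_hour(prices, bid)
    running_hours = prices if end is None else prices[:end]
    return sum(running_hours)


# Hypothetical hourly spot prices in USD, invented for illustration only.
trace = [0.25, 0.5, 0.75, 1.0, 0.5]

low_bid_end = first_termination_hour(trace, bid=0.6)
high_bid_end = first_termination_hour(trace, bid=1.0)
```

With this trace, a bid of 0.6 loses the instance in the third hour, while a bid of 1.0 survives the whole trace at a higher total cost. That is the reliability-versus-cost trade-off a bidding strategy has to manage.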

Corresponding author

Correspondence to Vinay K.

Cite this article

K., V., Kumar, S.M.D., S., R. et al. Cost and fault-tolerant aware resource management for scientific workflows using hybrid instances on clouds. Multimed Tools Appl 77, 10171–10193 (2018). https://doi.org/10.1007/s11042-017-5304-7

  • DOI: https://doi.org/10.1007/s11042-017-5304-7
