Fault tolerance and quality of service aware virtual machine scheduling algorithm in cloud data centers

The Journal of Supercomputing

Abstract

Improving the resource utilization of cloud data centers (CDCs) while ensuring users’ quality of service (QoS) through efficient virtual machine (VM) scheduling is an urgent problem, and it becomes more challenging when service reliability is taken into account. Most existing studies, however, ignore reliability factors such as failures and recoveries of computing nodes (CNs), and therefore do not reflect the conditions of real-life CDCs. This paper investigates fault tolerance-aware VM scheduling and formulates it as a multi-objective optimization model with multiple QoS constraints. The model seeks to minimize users’ total expenditure while maximizing the successful execution rate of their businesses. To solve the model, a greedy-based best fit decreasing (GBFD) algorithm is developed. GBFD uses a cost efficiency factor, defined according to the characteristics of CNs, to select a suitable CN for each VM request. Extensive experiments on both real-world CDC cluster data sets and simulated ones verify the feasibility of the proposed models and algorithm. The results show that, first, as expected, fault tolerance significantly influences the performance criteria of VM scheduling and, second, in most cases the developed algorithm decreases users’ expenditure, increases the success rate of executing their businesses and improves their overall satisfaction. Specifically, under the real-world CDC cluster scenario, the GBFD algorithm increases the overall satisfaction of all cloud users by 38.3%, 20.9% and 14.6% compared with the three other algorithms evaluated, respectively. The developed algorithm thus performs better in fault tolerance-aware cloud environments.
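The general shape of a greedy best fit decreasing placement can be illustrated with a minimal Python sketch. It is not the authors' GBFD algorithm: the abstract does not define the cost efficiency factor, so the sketch assumes a hypothetical one (node reliability divided by unit price). The `Node` and `VMRequest` fields, the single resource dimension and the tie-breaking rule are likewise illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A computing node (CN). All fields are illustrative assumptions."""
    name: str
    capacity: int       # free resource units, e.g. CPU cores
    price: float        # cost per resource unit per time unit
    reliability: float  # probability of running without failure, in (0, 1]


@dataclass
class VMRequest:
    """A VM request with a single, assumed resource dimension."""
    name: str
    demand: int         # resource units requested


def cost_efficiency(node: Node) -> float:
    """Hypothetical factor (not from the paper): reliability per unit price."""
    return node.reliability / node.price


def greedy_bfd(requests: List[VMRequest], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Place each request on a node; unplaceable requests map to None."""
    placement: Dict[str, Optional[str]] = {}
    # "Decreasing": handle the largest requests first.
    for vm in sorted(requests, key=lambda r: r.demand, reverse=True):
        feasible = [n for n in nodes if n.capacity >= vm.demand]
        if not feasible:
            placement[vm.name] = None
            continue
        # "Greedy": prefer the highest cost efficiency.
        # "Best fit": break ties by the smallest leftover capacity.
        best = max(
            feasible,
            key=lambda n: (cost_efficiency(n), -(n.capacity - vm.demand)),
        )
        best.capacity -= vm.demand  # reserve the resources on the chosen node
        placement[vm.name] = best.name
    return placement
```

Calling `greedy_bfd(requests, nodes)` returns a mapping from request name to node name and reduces the remaining capacity of the chosen nodes in place. A fault tolerance-aware scheduler as described in the abstract would also need to model node failures, recoveries and the QoS constraints, which this sketch leaves out.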

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (No. 62076215), the Natural Science Foundation of the Jiangsu Higher Education Institutions (No. 21KJD520006), the Future Network Scientific Research Fund Project (No. FNSRFP-2021-YB-46), the Funding for School-Level Research Projects of Yancheng Institute of Technology (No. xjr2021047 and No. xjr2022028) and the Natural Science Project of the Zhengzhou Science and Technology Bureau (No. 21ZZXTCX20).

Author information

Corresponding author

Correspondence to Heyang Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, H., Xu, S., Wei, W. et al. Fault tolerance and quality of service aware virtual machine scheduling algorithm in cloud data centers. J Supercomput 79, 2603–2625 (2023). https://doi.org/10.1007/s11227-022-04760-5

  • DOI: https://doi.org/10.1007/s11227-022-04760-5
