
CAMIRA: a consolidation-aware migration avoidance job scheduling strategy for virtualized parallel computing clusters

The Journal of Supercomputing

Abstract

Server virtualization and consolidation techniques have been widely adopted in modern large-scale computing systems to reduce energy consumption and increase resource utilization. In these systems, physical servers are turned on and off dynamically as the workload varies, and the load from computing tasks is balanced among active servers through virtual machine (VM) migration. The downside of this approach is that the overhead of VM migration can harm both the system and its users: application performance degradation, service interruption, prolonged job execution time, extra network bandwidth consumption, and risk of failure. Existing work in the literature attempts to reduce VM migration cost for persistently running web servers in a reactive manner. In contrast, we tackle the problem for parallel computing jobs in batch processing systems. Our approach proactively avoids VM migrations through the co-design of job scheduling and VM consolidation strategies, and it minimizes the communication overhead of jobs by considering the traffic pattern between the tasks of a job. Our evaluations, which use a real parallel job workload trace and a synthetically generated workload, show that our approach reduces the number of VM migrations by 35%–50% and communication cost by up to 25% compared to traditional job scheduling approaches.
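To make the notion of communication overhead concrete, the following is a minimal sketch of one way the communication cost of a task placement could be measured: the total traffic between task pairs that land on different servers. This is an illustrative assumption, not the cost model or algorithm of the paper; the function name `communication_cost`, the traffic volumes, and the task and server names are all hypothetical.

```python
from typing import Dict, Tuple


def communication_cost(traffic: Dict[Tuple[str, str], float],
                       placement: Dict[str, str]) -> float:
    """Sum the traffic between task pairs placed on different servers.

    Assumption: traffic between tasks on the same server is free.
    """
    return sum(volume
               for (a, b), volume in traffic.items()
               if placement[a] != placement[b])


# Hypothetical 4-task job: t0-t1 and t2-t3 exchange a lot of data,
# while t1-t2 and t0-t3 exchange very little.
traffic = {
    ("t0", "t1"): 10.0,
    ("t2", "t3"): 8.0,
    ("t1", "t2"): 1.0,
    ("t0", "t3"): 1.0,
}

# Placement that keeps heavily communicating tasks together.
aware = {"t0": "s1", "t1": "s1", "t2": "s2", "t3": "s2"}
# Placement that ignores the traffic pattern.
oblivious = {"t0": "s1", "t1": "s2", "t2": "s1", "t3": "s2"}

cost_aware = communication_cost(traffic, aware)
cost_oblivious = communication_cost(traffic, oblivious)
print(cost_aware, cost_oblivious)
```

In this toy case the traffic-aware placement pays only for the two light links, while the oblivious placement pays for every link. A scheduler that considers the traffic pattern would prefer the former when both fit on the available servers.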



Author information


Corresponding author

Correspondence to Jerry Chou.


About this article


Cite this article

Padhy, S., Tsai, MH., Sharma, S. et al. CAMIRA: a consolidation-aware migration avoidance job scheduling strategy for virtualized parallel computing clusters. J Supercomput 78, 11921–11948 (2022). https://doi.org/10.1007/s11227-022-04337-2


  • DOI: https://doi.org/10.1007/s11227-022-04337-2
