CAMIRA: a consolidation-aware migration avoidance job scheduling strategy for virtualized parallel computing clusters

Padhy, Satyajit; Tsai, Ming-Han; Sharma, Shalini; Chou, Jerry

doi:10.1007/s11227-022-04337-2

Published: 21 February 2022

Volume 78, pages 11921–11948 (2022)

The Journal of Supercomputing

Satyajit Padhy¹, Ming-Han Tsai², Shalini Sharma² & Jerry Chou²

Abstract

Server virtualization and consolidation techniques have been widely adopted in modern large-scale computing systems to reduce energy consumption and increase resource utilization. In these systems, physical servers are turned on and off dynamically according to workload variation, and the load from computing tasks is balanced among active servers through virtual machine (VM) migration. The downside of this approach is that the overhead of VM migration can negatively affect both the system and its users, causing application performance degradation, service interruption, prolonged job execution time, extra network bandwidth consumption, and risk of failure. Existing work in the literature attempts to reduce VM migration cost for persistently running web servers in a reactive manner. In contrast, we tackle the problem for parallel computing jobs in batch processing systems. Our approach proactively avoids VM migrations by co-designing the job scheduling and VM consolidation strategies, and it minimizes the communication overhead of jobs by considering the traffic pattern between the tasks of a job. Our evaluations, which use a real parallel job workload trace and a synthetically generated workload, show that our approach reduces the number of VM migrations by 35%–50% and communication cost by up to 25% compared to traditional job scheduling approaches.
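
To make the idea of traffic-aware placement concrete, the following is a minimal sketch, not the CAMIRA algorithm itself. The abstract does not specify a cost model, so the sketch assumes one: communication cost is the total traffic exchanged between tasks placed on different servers. The function name `inter_server_traffic`, the task and server labels, and the traffic volumes are all hypothetical.

```python
from itertools import combinations


def inter_server_traffic(traffic, placement):
    """Sum the traffic between task pairs placed on different servers.

    traffic:   dict mapping (task_a, task_b) to an assumed traffic volume
    placement: dict mapping each task to the server that hosts it
    """
    total = 0
    for a, b in combinations(sorted(placement), 2):
        if placement[a] != placement[b]:
            # Count traffic in both directions; missing pairs exchange nothing.
            total += traffic.get((a, b), 0) + traffic.get((b, a), 0)
    return total


# Hypothetical job with four tasks; the volumes are made up for illustration.
traffic = {("t0", "t1"): 10, ("t2", "t3"): 8, ("t1", "t2"): 1}

# Placement that separates the heavily communicating pairs.
scattered = {"t0": "s0", "t1": "s1", "t2": "s0", "t3": "s1"}
# Placement that keeps the heavily communicating pairs on the same server.
grouped = {"t0": "s0", "t1": "s0", "t2": "s1", "t3": "s1"}

print(inter_server_traffic(traffic, scattered))
print(inter_server_traffic(traffic, grouped))
```

Under this assumed model, the grouped placement leaves only the light `t1`–`t2` link crossing servers, which is the kind of saving a traffic-aware scheduler aims for.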

Author information

Authors and Affiliations

Institute of Information System and Applications, National Tsing Hua University, Hsinchu, Taiwan
Satyajit Padhy
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
Ming-Han Tsai, Shalini Sharma & Jerry Chou

Corresponding author

Correspondence to Jerry Chou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Padhy, S., Tsai, MH., Sharma, S. et al. CAMIRA: a consolidation-aware migration avoidance job scheduling strategy for virtualized parallel computing clusters. J Supercomput 78, 11921–11948 (2022). https://doi.org/10.1007/s11227-022-04337-2

Accepted: 20 December 2021
Published: 21 February 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11227-022-04337-2
