Live virtual machine migration: A survey, research challenges, and future directions

https://doi.org/10.1016/j.compeleceng.2022.108297

Highlights

  • Overview of cloud computing, data centers, virtualization, and VM migration.

  • Conventional and AI-based Live VM migration schemes categorization.

  • Live VM migration survey of load, energy, SLA, and network schemes.

  • VM migration analysis based on migration cost and AI benefits and drawbacks.

  • Live VM migration research challenges and future directions.

Abstract

In recent years, cloud computing has emerged as a promising paradigm providing various resources in the Cloud Data Center (CDC), including computation, storage, and even platform as a service. Virtualization technology plays a pivotal role in CDC resource management and provides an attractive feature, virtual machine (VM) migration, which offers several benefits in terms of VM scheduling, fault tolerance, load balancing, energy efficiency, power management, and security. To achieve efficient VM migration, a plethora of VM migration schemes have been proposed in the literature, aiming to serve quality of service-driven user requirements. This paper surveys the most recent and state-of-the-art Artificial Intelligence (AI) based and conventional load balancing, energy-aware, SLA-aware, and network-aware live VM migration schemes. Through an extensive literature review, this paper investigates the most critical aspects of conventional and AI-driven live VM migration schemes. Finally, a few open research challenges that require further consideration from the research community are highlighted.

Introduction

Cloud computing (CC) has revolutionized both academia and industry due to its intrinsic characteristics, such as the economical provisioning of hardware resources and software services with guaranteed reliability and availability. CC provides a high degree of flexibility in the deployment, termination, migration, and replication of applications and services. A cloud data center (CDC) is composed of a wide range of physical cloud servers connected through high-speed links, and offers a variety of cloud computing services in the form of Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). In addition, a CDC offers a variety of customizable settings and add-on features to meet the computation and storage requirements of the end-user. The efficient management and utilization of cloud resources is important for seamlessly meeting application demands while ensuring the Quality of Service (QoS). However, the literature shows that most of the time a major percentage of cloud servers (i.e., 30%) in a CDC remain idle and employ only a small proportion (i.e., 10% to 15%) of their available resources. This huge underutilization of cloud resources magnifies the operational cost and energy consumption of the cloud [1].

A cornerstone technology named virtualization plays a significant role in effectively handling the aforementioned constraints. Virtualization is a software layer between the operating system and the hardware (also known as a hypervisor or virtual machine monitor (VMM)) that hosts multiple virtual machines (VMs) running on the same hardware. The VMM maps each incoming operating system (OS) request to a specific hardware resource and ensures that the failure of a single VM does not affect the rest of the VMs running on a physical server. However, the execution of multiple VMs on a single server without considering the available resources may result in server overutilization and application performance degradation. This means that an application requiring a specific amount of resources must wait until those resources become available. Therefore, the efficient management of VMs in a CDC is important for resolving problems such as task distribution and VM protection from hardware failure [2].

The advent of VM migration technology resolves server overutilization and performance degradation problems by enabling the migration of VMs between servers residing within or across data centers. In VM migration, the hypervisor relieves an overutilized server by migrating its workload to an underutilized or normally utilized server. VM migration may require additional resources (such as energy, network bandwidth, and computational resources) and may affect the applications within the migrating VM until the migration process completes. Therefore, to maintain application performance, it is important to complete the migration process within a minimal time while using minimal network and server resources [1].
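
The relief step described above can be illustrated with a minimal sketch. The `Host` type, the thresholds (0.2 and 0.8), and the rule of sending the smallest VM to the least-utilized host are illustrative assumptions, not a scheme taken from the surveyed literature.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real schemes tune or learn these values.
UNDER, OVER = 0.2, 0.8


@dataclass
class Host:
    name: str
    capacity: float  # e.g., CPU cores
    vm_loads: dict   # VM name -> demand, in the same unit as capacity

    @property
    def utilization(self) -> float:
        return sum(self.vm_loads.values()) / self.capacity


def classify(host: Host) -> str:
    """Label a host as under-, normally, or overutilized."""
    u = host.utilization
    if u > OVER:
        return "over"
    if u < UNDER:
        return "under"
    return "normal"


def plan_migrations(hosts):
    """Move the smallest VMs off each overutilized host until it is relieved.

    Each VM goes to the least-utilized host that stays at or below OVER
    after receiving it. Returns a list of (vm, source, destination) names.
    """
    plan = []
    for src in [h for h in hosts if classify(h) == "over"]:
        # sorted() returns a new list, so popping from vm_loads is safe here.
        for vm, load in sorted(src.vm_loads.items(), key=lambda kv: kv[1]):
            if classify(src) != "over":
                break
            targets = [
                h for h in hosts
                if h is not src
                and (sum(h.vm_loads.values()) + load) / h.capacity <= OVER
            ]
            if not targets:
                continue  # no host can take this VM; try the next one
            dst = min(targets, key=lambda h: h.utilization)
            dst.vm_loads[vm] = src.vm_loads.pop(vm)
            plan.append((vm, src.name, dst.name))
    return plan
```

The sketch ignores the cost of the migration itself (energy, bandwidth, and the impact on the migrating VM), which is exactly what the schemes surveyed in this paper try to minimize.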

To achieve efficient VM migration, various techniques have been proposed in the literature; they can be categorized as non-live and live VM migration techniques. As the names suggest, live VM migration continues serving the users during the migration to attain seamless connectivity and optimal resource utilization, whereas non-live VM migration halts VM services during migration and resumes them at the destination once the migration completes. The VM migration process (either live or non-live) is controlled by the VM migration controller. Depending on the current condition of the underlying server, the VM migration controller can migrate a single VM or multiple VMs across a LAN or WAN to attain efficient resource management. VM migration within a LAN is easier to manage than VM migration across a WAN. The rationale is that a LAN does not require storage migration because of Network Attached Storage (NAS), whereas VM migration across a WAN faces several issues such as storage migration, network congestion, limited bandwidth, and error-prone WAN links. All these factors prolong the entire VM migration process.
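
As a rough illustration of why storage migration and limited bandwidth prolong WAN migration, the sketch below estimates the service interruption of a non-live (stop-and-copy) migration. The function name and all numbers are hypothetical, and the estimate ignores protocol overhead, congestion, and retransmissions on lossy links.

```python
def non_live_downtime_s(mem_gb, disk_gb, bandwidth_gbps, shared_storage):
    """Rough interruption time for a stop-and-copy (non-live) migration.

    With shared storage (the LAN/NAS case) only memory is copied; without
    it (the WAN case) the disk image must cross the link as well.
    Sizes are in gigabytes, bandwidth in gigabits per second.
    """
    payload_gb = mem_gb if shared_storage else mem_gb + disk_gb
    return payload_gb * 8 / bandwidth_gbps  # 8 bits per byte


# Hypothetical VM: 4 GB of memory and a 40 GB disk image.
lan = non_live_downtime_s(4, 40, bandwidth_gbps=10, shared_storage=True)
wan = non_live_downtime_s(4, 40, bandwidth_gbps=1, shared_storage=False)
print(f"LAN: {lan:.1f} s, WAN: {wan:.1f} s")
```

Under these assumed numbers the WAN case is roughly two orders of magnitude slower, because it moves eleven times more data over a link with one tenth of the bandwidth.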

To resolve the aforementioned migration issues in CDC, various research efforts have been devoted to both non-live and live VM migration. A non-live VM migration scheme suspends the operation(s) of the currently executing VM before migrating it to the destination. Because non-live VM migration stops serving the users during the migration process, it increases the overall response/execution time and degrades the QoS. Live VM migration, in contrast, continues serving the users without halting the VM services during the migration process, which significantly lowers the overall execution time and effectively maintains the QoS. In this paper, we shed light on live VM migration schemes and categorize them into (1) conventional VM migration schemes and (2) AI-based migration schemes, in the context of load balancing, energy efficiency, SLA violation mitigation, and network and bandwidth utilization. A comprehensive description of the underlying working principles of these categories, accompanied by a detailed survey of existing schemes in each category, is presented. We critically analyze these schemes and, based on our analysis, highlight both their contributions and their limitations. Moreover, given the importance of VM migration schemes, we provide a dedicated section composed of various sub-sections highlighting the unresolved research challenges and future directions.

In summary, the main contributions of our survey paper are as follows:

  • First, we provide a brief overview of cloud computing, data centers, virtualization, and VM migration.

  • Second, this paper categorizes the VM migration schemes (live and non-live VM migration schemes) and highlights the benefits based on certain performance evaluation metrics.

  • Third, this paper provides a detailed survey and analysis of conventional and AI-based live VM migration schemes in the context of load balancing, energy efficiency, SLA violations, and network and bandwidth utilization.

  • Fourth, we compare the conventional and AI-based live VM migration based on several performance metrics such as downtime, total migration time, network traffic, and QoS.

  • Finally, several research challenges and future directions in the domain of live VM migration are presented.

A plethora of research efforts aiming to achieve efficient intra- and inter-CDC VM migration have been proposed in the literature. In the majority of the proposed schemes, the authors mainly focused on (1) load balancing, (2) server consolidation, (3) energy efficiency, (4) efficient resource utilization, (5) maximum bandwidth utilization, (6) fault tolerance, (7) reducing SLA violations, and (8) maximizing cloud service providers’ revenue. The main themes of the existing survey articles are briefly described as follows.

The authors of [2] reviewed the state-of-the-art live and non-live VM migration schemes, provided a detailed analysis of migration in terms of, e.g., network performance optimization and the performance of running and co-hosted applications of VMs, and categorized the migration schemes into pre-copy, post-copy, and hybrid methods. In [3], the authors provided a detailed survey mainly focused on bandwidth optimization, Dynamic Voltage and Frequency Scaling (DVFS) enabled optimization, server consolidation frameworks, and storage migration in Wide Area Networks (WAN). The survey in [4] comprehensively reviewed live VM migration techniques, mainly targeting VM migration optimization schemes in the context of memory migration. The authors in [1] surveyed VM migration techniques, their potential benefits, and challenges, and categorized them into (1) granularity-based migration (single or multiple), (2) manner-based migration (i.e., non-live and live migration), and (3) distance-based migration (i.e., either LAN or WAN). Moreover, the authors also analyzed these migration schemes based on memory migration, storage migration, and network connection continuity. The authors in [5] presented a comprehensive survey on live virtual machine migration. The survey mainly targeted the strengths, weaknesses, and workload impact on the suitability of live VM migration techniques. Moreover, they also provided a comparison of pre-copy and post-copy VM migration based on several performance metrics. In [6], a review of efficient VM migration techniques based on forecasting methods was provided, where the authors highlighted the key issues in prediction-based migrations and classified them based on the applied prediction algorithms. They also presented a detailed discussion regarding the effectiveness of prediction-based VM migration schemes. The authors of [7] surveyed VM consolidation schemes and presented various performance metrics, optimization methods, their objectives, and evaluation approaches in cloud computing. Moreover, they also reviewed the software and hardware metrics, and VM consolidation architecture in cloud computing systems.

Given the existing works, a survey paper is still lacking that provides concrete and comprehensive discussions on AI-based as well as conventional live VM migration schemes and at the same time provides insight into the role of live VM migration schemes in modern CDC, which motivates the current work. Our survey paper differs from existing works in the following aspects:

  • 1. the current survey presents a detailed discussion of conventional and AI-based (e.g., supervised, unsupervised, reinforcement, and Q-learning) live VM migration schemes and provides a quick reference for both researchers and industry experts,

  • 2. we further classify both AI-based and conventional live VM migration strategies in terms of load balancing, energy efficiency, SLA violation, and network and bandwidth utilization,

  • 3. we critically analyze and compare the surveyed schemes based on different performance metrics such as downtime, migration time, QoS, bandwidth, and resource utilization, and

  • 4. we provide futuristic insights in the context of AI-based live VM migration, which are lacking in almost all of the prior works.

The literature review in our paper provides a thorough explanation of the underlying working mechanisms of the aforementioned classifications and will therefore be a valuable addition to the existing survey literature. Readers can benefit from our survey paper, as we mostly cover recent state-of-the-art schemes. To the best of our knowledge, this is the first detailed survey on both AI-based and conventional live VM migration schemes. Furthermore, in Table 2 we summarize the existing surveys in terms of live migration, non-live migration, storage migration, memory migration, energy consumption, load balancing, service level agreement, bandwidth, open research issues, and artificial intelligence.

The rest of the paper is structured as follows. Section 2 presents some basic knowledge of cloud computing, cloud data centers, and virtualization. An overview of VM migration, migration types, performance metrics, live migration, and VM migration techniques is presented in Section 3. In Section 4, we classify the conventional and AI-based load balancing, energy-aware, SLA-aware, and network-aware live VM migration schemes. We provide the analysis of VM migrations in Section 5 and the VM migration challenges and future research directions in Section 6; finally, the conclusion is presented in Section 7. Furthermore, Table 1 provides the complete list of acronyms that are used in this survey paper.

Basic knowledge and background

Before diving deep into a detailed explanation, we first shed light on the basics of cloud computing, data center, virtualization, and VM migration.

VM migration: An overview

This section highlights the VM migration, VM migration benefits, types of VM migration, and migration performance metrics.

IoT devices have limited computation and storage resources, while smart applications (e.g., smart farming, remote patient monitoring, autonomous driving, smart home appliances, smart factories) utilizing such devices generate a huge amount of data. Due to limited resources, the generated data is transferred to the cloud DC for processing and storage. Consider a smart farming

Classification of live VM migration schemes

A multitude of schemes has been developed to attain efficient and seamless live VM migration. This section reviews the most recent state-of-the-art live VM migration schemes and classifies them into load balancing, energy-aware, SLA-aware, and network-aware schemes, as shown in Fig. 3. These schemes are further divided into conventional and AI-based live VM migration.

Live VM migration cost

Live VM migration is a costly process due to (i) the amount of CPU resources it takes at the source host (i.e., computational cost), (ii) energy consumption for migration preparation and migration completion (i.e., energy cost), (iii) network bandwidth between the source and destination host to perform migration (i.e., network cost), (iv) the VM memory content size and the memory content update rate, (v) the number of VM migrations, (vi) available network bandwidth for migration, and (vii) the
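
To make the interplay of memory size, memory update rate, and available bandwidth concrete, the following sketch uses a common first-order model of pre-copy live migration: each round resends the pages dirtied during the previous round, and the VM is paused only for the final round. This is an illustrative assumption, not the cost model used in this survey; the function name, the stop threshold, and the round limit are hypothetical.

```python
def precopy_estimate(mem_mb, dirty_mb_s, bw_mb_s, stop_mb=50.0, max_rounds=30):
    """Estimate (total_time_s, downtime_s, rounds) for pre-copy migration.

    mem_mb     -- VM memory content size (MB)
    dirty_mb_s -- memory content update (dirty) rate (MB/s)
    bw_mb_s    -- network bandwidth available for migration (MB/s)
    stop_mb    -- remaining data small enough to pause the VM (assumed)
    max_rounds -- cap on iterative copy rounds (assumed)
    """
    remaining = float(mem_mb)
    total = 0.0
    rounds = 0
    while remaining > stop_mb and rounds < max_rounds:
        t = remaining / bw_mb_s          # time to send the current dirty set
        total += t
        # Pages dirtied while sending; cannot exceed the whole memory.
        remaining = min(float(mem_mb), dirty_mb_s * t)
        rounds += 1
    downtime = remaining / bw_mb_s       # final copy with the VM paused
    return total + downtime, downtime, rounds


# Hypothetical VM: 1024 MB memory, 100 MB/s dirty rate, 1000 MB/s link.
total_s, downtime_s, rounds = precopy_estimate(1024, 100, 1000)
print(f"total={total_s:.3f} s, downtime={downtime_s:.4f} s, rounds={rounds}")
```

In this model the copy rounds shrink only when the dirty rate is below the bandwidth; when the two are equal or the dirty rate is higher, the loop runs until `max_rounds` and the downtime approaches that of a non-live migration.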

Live VM migration challenges and future research directions

Even though conventional and AI-based VM migration strategies in CDC have been extensively studied, there exist several unsolved issues which require immediate attention. In this section, we provide futuristic insights about the challenges associated with live VM migration schemes. We organize this section into seven categories, outline some of the most prominent research challenges and issues, and provide futuristic insight into each category. A detailed description of each

Conclusion

The cloud computing model has attracted significant attention due to its on-demand resource provisioning over the Internet. In parallel, cloud resource management has become more complex due to the increased demand for cloud services. To handle this constraint, VM migration is an effective mechanism that attempts to resolve resource management issues. VM migration offers various advantages to cloud service providers and cloud consumers. A lot of research efforts in VM migration have been carried

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.compeleceng.2022.108297.

Muhammad Imran completed his Master's degree in computer science at the Virtual University of Pakistan in 2019. He is currently pursuing a Ph.D. degree in Computer Engineering with the Department of Software and Communication Engineering at Hongik University, South Korea. His research interests are in the fields of Cloud and Edge Computing, the Internet of Things, Information-Centric Networking, and Named Data Networking.

References (30)

  • Zhang F. et al. A survey on virtual machine migration: Challenges, techniques, and open issues. IEEE Commun Surv Tutor (2018)

  • Ahmad R.W. et al. Virtual machine migration in cloud data centers: A review, taxonomy, and open research issues. J Supercomput (2015)

  • Masdari M. et al. Efficient VM migrations using forecasting techniques in cloud computing: A comprehensive review. Cluster Comput (2020)

  • Zolfaghari R. et al. Virtual machine consolidation in cloud computing systems: Challenges and future trends. Wirel Pers Commun (2020)

  • Gamal M. et al. Osmotic bio-inspired load balancing algorithm in cloud computing. IEEE Access (2019)

    Dr. Muhammad Ibrahim completed his Ph.D. in Computer Science at Capital University of Science and Technology, Islamabad, in 2019. Currently, he is a postdoctoral researcher at Jeju National University, South Korea, and an Assistant Professor at the University of Haripur. His research areas include cloud computing, VM migration, and task scheduling in cloud computing.

    Muhammad Salahuddin received his M.S. degree in Computer Science from University, Islamabad, Pakistan in 2016. He is currently pursuing a Ph.D. degree in Computer Engineering at the Broadband Convergence Networks Lab, Hongik University, South Korea. His major interests are in the fields of wireless sensor networks (WSNs/UWSNs), Named Data Networking (NDN), NDN-enabled vehicular edge/fog computing, and the Internet of Things.

    Muhammad Atif Ur Rehman has been a Lecturer (Assistant Professor) in the Department of Computing & Mathematics at Manchester Metropolitan University, UK, since May 2022. He received a Ph.D. degree in Electronics and Computer Engineering from Hongik University, South Korea, in Feb 2022. His research interests are in the broader areas of edge cloud computing, intelligent communication protocol design, and the Metaverse.

    Byung-Seo Kim received the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Florida, in 2001 and 2004, respectively. From 2005 to 2007, he was with Motorola Inc., Schaumburg, IL, USA, as a Senior Software Engineer in networks and enterprises. He is a Professor at the Department of Software and Communications Engineering, Hongik University, South Korea.

    This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A2C1003549) and in part by the 2022 Hongik University Innovation Support Program Fund.

    This paper is for regular issues of CAEE. Reviews were processed by Associate Editor Dr. Chaker A. Kerrache and recommended for publication.
