Computer Communications

Volume 53, 1 November 2014, Pages 1-12

Virtual machine placement with two-path traffic routing for reduced congestion in data center networks

https://doi.org/10.1016/j.comcom.2014.07.009

Abstract

Virtualization-based Data Centers are increasingly becoming the hosting platform for a wide range of applications. Communication patterns in Data Center networks show a trend towards increasing bandwidth usage between virtual machines (VMs) within the Data Center, resulting in a higher chance of network congestion. Thus, VM placement and routing algorithms are increasingly important to maximize application performance, provide fault tolerance, and reduce network load. A less-than-optimal placement of communicating VMs can cause inter-VM traffic to traverse bottlenecked network paths, leading to large amounts of cross-network traffic. Core network oversubscription and unbalanced workload placement can lead to long-lived congestion in Data Center networks. Multipath routing with traffic distributed in an appropriate proportion helps balance the load and decrease the possibility of congestion. Furthermore, by routing traffic on multiple link-disjoint paths, traffic can be protected against failures: the use of link-disjoint paths ensures the availability of at least one path upon a link failure, thus guaranteeing a certain bandwidth (associated with the surviving paths). In this paper, we study the problem of VM placement with traffic routing on multiple paths to reduce the occurrence of congestion while satisfying a certain protection grade, defined as the fraction of the rate (or bandwidth) guaranteed to be available in the event of a single link failure. We develop an efficient greedy algorithm for placing VMs onto servers that satisfies the computing and memory resource requirements while taking into account the amount of inter-VM traffic and the network load. In addition, we develop a two-path routing algorithm that satisfies the bandwidth and protection grade requirements so as to reduce network congestion. Our simulation results show the effectiveness of the proposed algorithms in balancing the load and providing resilience compared to first-fit and random algorithms.

Introduction

Data Centers host a wide variety of applications including web hosting, video services, e-commerce and social networking. In recent years, the traffic pattern has been changing from “North–South” to “East–West”. This is due to the increased use of Data Center resources to run large-scale data-intensive tasks, such as indexing web pages or analysing large data-sets, often using variations of the MapReduce paradigm. To support a large number of applications, Data Centers require a high-performance network interconnect that connects tens of thousands of servers. Conventional Data Centers largely follow a common 2/3-tier single-rooted tree topology. Single-path routing in this type of network cannot fully utilize the network capacity, leading to congestion on the oversubscribed links while underutilizing the resources on other available paths. In such networks, it is critical to employ effective load-balancing schemes so that the bandwidth resources are efficiently utilized.

To overcome problems such as poor bisection bandwidth and poor performance isolation, inherent in traditional Data Centers, new network architectures such as VL2 [1], Portland [2], and BCube [3] have been proposed. These topologies take the form of multi-rooted trees with one or multiple paths between hosts. Optimized routing over multiple paths helps improve link utilization and decrease congestion. Load balancing can be achieved with multipath routing by splitting the traffic between a source–destination pair across multiple disjoint paths. While splitting traffic among different paths makes the network more reliable and better load-balanced, it requires intelligent mechanisms to choose good paths and traffic-splitting ratios among the paths.

Virtualization is being deployed in Data Centers at a rapid pace to consolidate workloads for improved server utilization, for ease of provisioning and configuration management, and, more generally, for efficient and flexible use of Data Center resources. It is highly desirable that VMs be placed so as to maximize application performance, power efficiency, and fault tolerance, and to reduce network bandwidth usage. Bisection bandwidth is a critical resource in today’s Data Centers because of the high cost and limited bandwidth of higher-level network switches and routers. This problem is aggravated in virtualized environments, where a set of virtual machines jointly implementing some service may run across multiple layer-2 (L2) hops. Poor placement of communicating VMs can cause inter-VM traffic to traverse bottlenecked network paths, leading to unnecessary cross-network traffic. In recent years, the VM placement problem has received much attention from researchers. Some works considered the placement of VMs under the constraints of limited server resources. The works in [20], [21], [22] addressed the VM placement problem considering only server resources such as CPU and memory, while leaving out networking aspects. In [19], [23], network-aware VM placement has been studied. In [24], VM placement and its impact on network traffic have been considered, but with the main goal of reducing the total energy consumption of a Data Center.

Since Data Centers carry huge amounts of traffic, component faults have severe consequences. Fault tolerance, or survivability, is thus critically important for providing reliable services. With advance resource reservation and sufficient backup bandwidth, we can provide 100% traffic protection. Since reserving backup resources is expensive, for a cost-effective solution, traffic can be protected partially, wherein it receives reduced bandwidth in the event of failures. Partial protection ensures service availability in the event of failures, but with reduced bandwidth and lower performance, which could be acceptable for most applications. We use the protection grade as a measure of partial protection. We define the protection grade as the fraction of bandwidth guaranteed to be available in the event of a single link failure. For example, if b units of bandwidth are guaranteed for a flow under normal working conditions and only b′ (⩽ b) units are available upon a link failure, the protection grade provided is said to be b′/b. We note that, unlike failures in wide-area networks (where cables are usually laid under the ground and sea), failures inside Data Centers can be repaired quickly. Therefore, partial protection is a cost-effective solution for providing reliable services (with little or no extra bandwidth), even though full bandwidth is not guaranteed in the event of faults.
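
As a minimal illustration of this definition (a hypothetical helper, not code from the paper), the following Python snippet computes the protection grade from the working bandwidth and the bandwidth that survives a single link failure:

```python
def protection_grade(working_bw: float, surviving_bw: float) -> float:
    """Fraction of the working bandwidth still available after a single link failure."""
    if working_bw <= 0:
        raise ValueError("working bandwidth must be positive")
    return surviving_bw / working_bw

# For instance, with b = 100 units working and b' = 60 units surviving, the grade is 0.6.
assert protection_grade(100, 60) == 0.6
```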

Traffic splitting helps reduce congestion and ensures protection in the event of failures. Traffic flows with protection requirements can be split and sent across two or more paths. It is more desirable to split large flows than short flows: large flows are more likely to cause link congestion, and they are fewer in number, implying a smaller implementation overhead due to traffic splitting. As the number of subflows into which a flow is split increases, the protection grade increases, but the algorithmic complexity and the implementation overhead due to traffic splitting also increase. Suppose a flow with a certain protection grade requirement and a bandwidth requirement of b units is split into two subflows routed through two link-disjoint paths carrying b1 and b2 units, respectively. The use of link-disjoint paths guarantees that at least one of the two paths is available in the event of a single link failure. This guarantees the availability of min(b1, b2) units of bandwidth upon a link failure, implying a protection grade of min(b1, b2)/b. If the flow is split and sent across n link-disjoint paths and bmax is the maximum bandwidth used among these n paths, then the minimum protection grade guaranteed is (b − bmax)/b, which occurs when a link on the path carrying bmax units fails. We can provide a 100% protection guarantee by reserving spare bandwidth of bmax units on a backup path. However, this requires excessive resources, as the total bandwidth used is b + bmax units.
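
To make these relationships concrete, here is an illustrative sketch (our own example, not the paper’s code) that computes the minimum guaranteed protection grade of a split flow; with shares on link-disjoint paths, a single link failure removes at most the largest share:

```python
def split_protection_grade(shares):
    """Minimum protection grade of a flow of b = sum(shares) units split over
    link-disjoint paths: a single link failure removes at most the largest
    share, leaving (b - max(shares)) / b of the bandwidth guaranteed."""
    b = sum(shares)
    return (b - max(shares)) / b

# Two paths carrying b1 = 40 and b2 = 60 units: grade = min(40, 60) / 100 = 0.4.
assert abs(split_protection_grade([40, 60]) - 0.4) < 1e-9
# An equal three-way split of the same 100 units raises the grade to 2/3.
assert abs(split_protection_grade([100 / 3] * 3) - 2 / 3) < 1e-9
```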

In this paper, we study the problem of VM placement with traffic splitting for reduced congestion and partial traffic protection. We consider flow demands with specified computing, memory, and bandwidth resource requirements, as well as a protection grade requirement. We develop an efficient greedy algorithm called Greedy VM placement with Two Path Routing (GVMTPR). The algorithm chooses appropriate servers for placing VMs from a set of candidate servers that can satisfy the server resource requirements. It splits each flow into two subflows and routes them through two link-disjoint paths so as to reduce congestion while satisfying the bandwidth and protection grade requirements. We use the maximum load on any link as a measure of congestion. We note that path lengths are short in Data Center networks and many paths of the same length exist, and therefore the impact of splitting traffic across two paths is less pronounced. Further, our work considers the current network state for multipath routing, unlike the Equal Cost Multipath (ECMP) algorithm, which chooses a path from among multiple paths without considering the existing utilization of the links. We demonstrate the effectiveness of our proposed algorithm through simulation results.
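
As a hedged sketch of what network-state-aware two-path selection can look like (illustrative names and data structures only; this is not the paper’s exact GVMTPR routing procedure, and candidate-path enumeration is assumed to be done elsewhere), one could pick, among link-disjoint pairs of candidate paths, the assignment of the two shares that minimizes the resulting maximum link utilization:

```python
from itertools import combinations

def max_util(path, load, cap, extra):
    # Highest link utilization on `path` after adding `extra` units of traffic.
    return max((load[l] + extra) / cap[l] for l in path)

def route_two_paths(candidate_paths, load, cap, b1, b2):
    """Choose a link-disjoint pair of paths for the shares (b1, b2) so that the
    resulting maximum link utilization is minimized.  Paths are lists of link
    ids; `load` and `cap` map link ids to current load and capacity."""
    best, best_util = None, float("inf")
    for p, q in combinations(candidate_paths, 2):
        if set(p) & set(q):                       # keep only link-disjoint pairs
            continue
        # Try both assignments of the two shares to the two paths.
        for (pa, sa), (pb, sb) in (((p, b1), (q, b2)), ((p, b2), (q, b1))):
            u = max(max_util(pa, load, cap, sa), max_util(pb, load, cap, sb))
            if u < best_util:
                best, best_util = ((pa, sa), (pb, sb)), u
    return best, best_util
```

Unlike ECMP hashing, such a choice depends on the current link loads, which is the behaviour contrasted with above.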

The rest of the paper is organized as follows. In Section 2, we describe the background and related works. In Section 3, we present the VM placement and routing framework and explain its various functional units. We formulate the problem in Section 4. We present and discuss the proposed algorithm in Section 5. We carry out a performance study through simulations in Section 6. Finally, we make concluding remarks in Section 7.

Section snippets

Background and related works

In this section, we briefly review previous work related to ours, namely Data Center networking, and then explain various VM placement heuristics.

VM placement and routing framework

In this section, we describe our framework for VM placement and traffic routing. Fig. 1 shows the framework, which uses a multi-rooted tree-based Data Center network with three layers of switches: Top-of-Rack (ToR) switches connect to a layer of aggregation switches, which in turn connect to the core switches. A physical host runs many VMs, which are connected to the physical network interfaces through a virtual switch inside the hypervisor. Physical hosts are directly connected to a

Problem definition and formulation

We are given a set of n servers S = {S1, S2, …, Sn} and a set of m VM types V = {V1, V2, …, Vm}. A server Si has a certain amount of resources, represented as Si = {S_com_i, S_mem_i, S_bw_i}, where S_com_i is the computing capacity, S_mem_i is the memory capacity, and S_bw_i is the bandwidth (ingress/egress) capacity. A VM flow (or simply a flow) originates at one VM and ends at another VM. A job’s resource requirement is specified as a vector J = {x1, x2, …, xm}, where xk is the number of VMs of type Vk. The

Proposed GVMTPR algorithm

As we stated earlier, the VM placement problem is complex and has been shown to be NP-hard. The consideration of protection guarantees makes it more complex. We therefore develop a fast heuristic algorithm to solve the problem. We develop a greedy-method-based algorithm called Greedy VM placement with Two Path Routing (GVMTPR), which works in two phases.
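
The exact phase-1 heuristic is given later in the paper; purely as a hedged illustration of a greedy, network-aware placement step (all names such as free_cpu, allocate and dist are assumptions for this sketch, not the paper’s interface), one could place the most traffic-intensive VMs first, each on the feasible server that adds the least estimated network load:

```python
def greedy_place(vms, servers, traffic, dist):
    """Illustrative greedy placement sketch (not the paper's exact heuristic).

    vms: objects with cpu/mem/bw demands; servers: objects with free_cpu,
    free_mem, free_bw, an id and an allocate() method; traffic[v] maps peer
    VMs to traffic rates; dist[i][j] is a server-to-server distance (hops)."""
    placement = {}
    # Place the most traffic-intensive VMs first.
    for v in sorted(vms, key=lambda v: sum(traffic[v].values()), reverse=True):
        feasible = [s for s in servers
                    if s.free_cpu >= v.cpu and s.free_mem >= v.mem and s.free_bw >= v.bw]
        if not feasible:
            return None                           # reject the request
        # Estimated network load: traffic to already-placed peers, weighted by distance.
        def net_cost(s):
            return sum(rate * dist[s.id][placement[u].id]
                       for u, rate in traffic[v].items() if u in placement)
        best = min(feasible, key=net_cost)
        best.allocate(v)
        placement[v] = best
    return placement
```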

In phase 1, VMs are placed taking into consideration resource efficiency and the impact of placement on network traffic, with the assumption

Performance evaluation

This section studies the performance of the proposed GVMTPR algorithm and other placement heuristics on a tiered Data Center network topology. The simulated Data Center is implemented using the ns-3 simulator [25], [26]. Each server has resources to host multiple VMs, and the applications running on the VMs generate network traffic to other VMs. This traffic is routed through the switches based on the routes and traffic proportions

Conclusions

In this paper, we addressed the problem of VM placement for reducing congestion with specified protection guarantees. We developed an algorithm called GVMTPR for VM placement and two-path traffic routing so as to minimize congestion while providing the specified protection grade. The algorithm uses a greedy method to intelligently place VMs onto physical servers in such a way as to reduce the traffic load on the network. The traffic is routed on two link-disjoint paths with a certain proportion

References (35)

  • A. Greenberg, J.R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D.A. Maltz, P. Patel, S. Sengupta, VL2: a scalable...
  • R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, A. Vahdat,...
  • C. Guo, H. Wu, K. Tan, L. Shiy, Y. Zhang, S. Lu, Bcube: a high performance, server-centric network architecture for...
  • M. Al-Fares et al., A scalable, commodity data center network architecture
  • C. Clos, A study of non-blocking switching networks, Bell Syst. Tech. J. (1953)
  • C. Leiserson, Fat-trees: universal networks for hardware efficient supercomputing, IEEE Trans. Comp. (1985)
  • C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, S. Lu, Dcell: a scalable and fault-tolerant network structure for data...
  • Joe Wenjie Jiang, Tian Lan, Sangtae Ha, Minghua Chen, Mung Chiang, Joint VM placement and routing for data center...
  • Fei Song, Application-aware virtual machine placement in data centers, in: Proceedings of Innovative Mobile and...
  • J. Xu, J. Fortes, Multi-objective virtual machine placement in virtualized data center environments, in: Proceedings of...
  • U. Bellur, C. Rao, M. Kumar, Optimal placement algorithms for virtual machines, in: Proceedings of CoRR,...
  • F. Machida, M. Kawato, Y. Maeno, Redundant virtual machine placement for fault-tolerant consolidated server clusters,...
  • A. Verma, P. Ahuja, A. Neogi, pMapper: power and migration cost aware application placement in virtualized systems, in:...
  • S.S. Seiden et al., New bounds for variable-sized online bin packing, SIAM J. Comput. (2003)
  • E. Coffman et al., Approximation Algorithms for Bin Packing: A Survey (1997)
  • D. Ersoz, M.S. Yousif, C.R. Das, Characterizing network traffic in a cluster-based, multi-tier data center, in:...
  • S. Kandula, S. Sengupta, A. Greenberg, P. Patel, The nature of data center traffic: measurements and analysis, in:...