1 Introduction

Cloud computing offers features such as on-demand services, virtualization and automatic scaling, which have led to a remarkable increase in cloud usage. Data centers that host major cloud services contain enormous numbers of servers, which results in very high energy consumption [1]. With the growth in the number of cloud users and powerful data centers, the energy consumption of cloud systems has gained significant importance over the last decade [2, 3].

The growing importance of energy efficiency in cloud systems has led recent academic studies to concentrate on it as well. In this context, virtual machine (VM) consolidation is a widely used method to promote more efficient energy consumption and utilization. On the other hand, service performance in cloud computing is maintained mostly through automatic scaling, which may frequently lead to over-provisioned infrastructure and high energy consumption.

Automatic scaling can be achieved through vertical scaling or horizontal scaling. This paper focuses on horizontal scaling, where VM creation and VM consolidation policies become important aspects of the energy consumption trade-offs.

VM consolidation can be used to reduce energy consumption and optimize VM utilization in the horizontal scaling process. However, consolidating applications from multiple VMs into a single VM may cause high utilization and reduced fault tolerance. Fault tolerance can be achieved using either a reactive or a proactive approach. Reactive fault tolerance policies aim to reduce the damage after a failure has occurred, whereas proactive fault tolerance policies try to predict faults and react before the failure [4].

In modern cloud systems, three aspects form a trade-off: fault tolerance, energy efficiency and performance requirements. High-performance and fault-tolerant cloud environments require horizontal scaling, while energy-efficient cloud systems need VM consolidation to prevent unnecessary energy consumption. An approach is therefore necessary to keep fault tolerance, energy efficiency and performance requirements in an optimized balance.

In this paper, a reactive fault tolerance approach is proposed for cloud environments that tries to minimize energy consumption by using VM consolidation. Three cases are considered in the study: no-consolidation, consolidation by threshold, and consolidation according to the fault tolerant consolidation (FTC) algorithm. The no-consolidation case serves as the control experiment for the proposed approaches: once a VM is created, it is never consolidated, even if it runs no applications. In the second case, consolidation by threshold, VMs are consolidated when their utilization is less than a defined threshold value. In the last case, the FTC algorithm is used to accept or reject consolidation-by-threshold requests using the measure defined later in the paper.

Major contributions of this paper are:

  1. A novel measure to evaluate fault tolerance vulnerabilities.

  2. An algorithm (FTC) to orchestrate energy efficiency and fault tolerance together.

  3. Simulation of application placement, VM consolidation and fault tolerance together to analyze the trade-offs among them.

  4. A reduced number of application migrations and reduced energy consumption in horizontal scaling.

The rest of this paper is organized as follows. Related work in this research field is presented in Sect. 2. The simulation system features and structural details are described in Sect. 3. Experiment details and results are presented in Sect. 4, followed by the conclusion and planned future work in the last section.

2 Related Work

VM consolidation has become a significant technique for data center energy and resource management. Several methods have already been applied to VM consolidation, such as Ant Colony Optimization, K-Nearest Neighbor and greedy heuristics [5,6,7]. On the other hand, VM consolidation requires live migration of the applications hosted by the corresponding VM [8]. In addition, the authors of [9] analyzed the CO\(_2\) emissions of data centers by considering their energy consumption at the network level. In [3], the authors proposed a model, based on the firefly algorithm, that migrates highly utilized virtual machines to less utilized hosts while preserving the energy efficiency of the data center. However, these techniques do not consider the fault tolerance of the VMs.

In the real world, cloud systems may encounter system failures due to their dynamic nature [10], and these must be prevented to achieve guaranteed Quality of Service (QoS). Hence, various fault-tolerant cloud environment approaches have been investigated. According to [11], fault occurrences most closely resemble a Weibull Distribution (WD).

An active replication model to provide fault tolerance was proposed in [12]. In [13], the replication approach was further enhanced by using Byzantine fault tolerance gain to optimize the overall cost. However, all of these studies require user experience and knowledge in the configuration and application preparation phases. While most approaches use VM-level live migration among data centers to create a fault-tolerant cloud environment, the techniques presented in [14] differ by using application-level live migration.

In [15], the authors emphasize the gap in fault-tolerant, energy-efficient cloud environment approaches that use VM consolidation.

3 Simulation System Structure of the Proposed Approach

The simulation software (Footnote 1) used in this study was developed in Java 8 and tested on a PC with an Intel Core i7 7700HQ @ 2.80 GHz, 16 GB DDR4-2400 and a 4 GB NVidia GTX1050, running 64-bit Windows 10. A simulation run typically takes 15 s. The application creation time and resource usage data are generated randomly from a uniform distribution. The simulation framework models only one physical machine, and all VMs are created and consolidated on that same physical machine.

Fig. 1. Simulation flow chart

The main structure of the simulation is shown in Fig. 1. The simulation consists of two main phases: application assignment to VMs and VM consolidation on the physical machine. Application placement [16] is a broad and diverse area that is beyond the scope of this paper; hence, the Round Robin algorithm is used to simplify the implementation of this part. It is therefore the same for all three test cases (no-consolidation, consolidation by threshold, consolidation by FTC). The second phase of the simulation is the main focus of the enhancement.

In the first phase, the simulation engine checks whether the simulation end period has been reached and continues if periods remain. It then generates a certain number of applications according to a Gaussian Distribution and searches for an available VM for each one. If an available VM is found, the application is assigned to it and control returns to the simulation counter. If no available VM can be found, a new VM is created and the application is assigned to the newly created VM.
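The assignment step of the first phase can be sketched as follows. This is a minimal illustration in Python rather than the Java simulator used in the study; the `VM` class, the `fits` check and the `assign_round_robin` helper are hypothetical names, and the default capacities of 20 are taken from the simulation variables reported in Sect. 4.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

App = Tuple[float, float]  # (cpu, memory) demand of one application


@dataclass
class VM:
    """Hypothetical VM model: capacities plus the list of hosted applications."""
    cpu_capacity: float = 20.0
    mem_capacity: float = 20.0
    apps: List[App] = field(default_factory=list)

    def fits(self, app: App) -> bool:
        used_cpu = sum(a[0] for a in self.apps)
        used_mem = sum(a[1] for a in self.apps)
        return (used_cpu + app[0] <= self.cpu_capacity
                and used_mem + app[1] <= self.mem_capacity)


def assign_round_robin(vms: List[VM], app: App, start: int) -> int:
    """Place `app` on the first VM with room, scanning cyclically from `start`.

    If no existing VM can host the application, a new VM is created.
    Returns the index at which the next scan should begin.
    """
    n = len(vms)
    for offset in range(n):
        idx = (start + offset) % n
        if vms[idx].fits(app):
            vms[idx].apps.append(app)
            return (idx + 1) % n
    vms.append(VM(apps=[app]))  # no available VM: create one and assign
    return 0
```

Calling `assign_round_robin` once per generated application, and feeding the returned index back in as `start`, spreads applications across the existing VMs before any new VM is created.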

In the second phase, after all application assignments for the current period are completed, the simulation framework checks whether the utilization of each VM is less than the defined threshold. In the consolidation-by-threshold case, a VM below the threshold is consolidated directly. In the consolidation-by-FTC case, a consolidation request is sent to the FTC algorithm instead, and the algorithm determines whether the request is accepted or rejected. In the no-consolidation case, this step is skipped because that scenario performs no consolidation checks.
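The way the three test cases diverge in the second phase can be illustrated with a short Python sketch. The mode names, the `select_for_consolidation` function and the `ftc_accepts` callback are assumptions made for illustration; utilization is modeled as a plain fraction per VM.

```python
from typing import Callable, List, Optional


def select_for_consolidation(
    utilizations: List[float],
    threshold: float,
    mode: str,
    ftc_accepts: Optional[Callable[[int], bool]] = None,
) -> List[int]:
    """Return the indices of the VMs to consolidate in the current period.

    mode is one of "none", "threshold" or "ftc" (hypothetical labels for the
    three test cases). `ftc_accepts` stands in for the FTC decision.
    """
    if mode == "none":
        return []  # no-consolidation: the check is skipped entirely
    candidates = [i for i, u in enumerate(utilizations) if u < threshold]
    if mode == "threshold":
        return candidates  # every VM below the threshold is consolidated
    if mode == "ftc":
        if ftc_accepts is None:
            raise ValueError("the ftc mode needs a decision callback")
        # each below-threshold VM becomes a request that FTC may reject
        return [i for i in candidates if ftc_accepts(i)]
    raise ValueError(f"unknown mode: {mode}")
```

For example, with utilizations `[0.2, 0.7, 0.4]` and a threshold of `0.5`, the threshold mode selects VMs 0 and 2, while the FTC mode selects only those of the two that the callback approves.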

3.1 Fault Tolerant Consolidation Algorithm

The FTC algorithm runs in every period of the simulation lifetime. As shown in Algorithm 1, it checks every VM running on the physical machine and compares its utilization with the defined threshold value.

Algorithm 1. FTC algorithm

After the threshold check on line 3, the findByPolicy function finds the guaranteed VM according to its policy value, which can be defined as minimum utilized, median utilized or maximum utilized. The effects of these policies are demonstrated in Sect. 4. On line 5, the isMigratable function checks all VMs except the one that is below the threshold and was selected on the previous line, to guarantee that all applications of the selected VM can be migrated to the remaining VMs.
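A possible reading of the policy selection is sketched below in Python. Only the function name mirrors findByPolicy from the text; the signature, the string policy labels and the choice of the lower median for an even number of VMs are assumptions, not details confirmed by the paper.

```python
from typing import List


def find_by_policy(utilizations: List[float], policy: str) -> int:
    """Return the index of the guaranteed VM under the given policy.

    policy is "min", "median" or "max" (hypothetical labels). For an even
    number of VMs the lower median is returned, which is an assumption.
    """
    if not utilizations:
        raise ValueError("no VMs to choose from")
    # VM indices ordered from least to most utilized
    order = sorted(range(len(utilizations)), key=lambda i: utilizations[i])
    if policy == "min":
        return order[0]
    if policy == "max":
        return order[-1]
    if policy == "median":
        return order[(len(order) - 1) // 2]
    raise ValueError(f"unknown policy: {policy}")
```

With utilizations `[0.3, 0.9, 0.1, 0.6, 0.5]`, the three policies pick VM 2 (minimum), VM 4 (median) and VM 1 (maximum).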

In the consolidate function of the algorithm, the applications hosted by the VM must be migrated to other VMs before the VM is consolidated. However, live migration of applications adds overhead to the resource consumption of the VMs, which can easily increase energy consumption by 10% [17]. Therefore, the key factor in consolidation is to reduce the number of application migrations in order to reduce energy consumption; which VM an application is migrated to is not significant.

In general, the main purpose of the algorithm is to prevent the creation of a new VM when a failure occurs on any of the VMs. Therefore, it decides whether to approve a consolidation request by checking whether the other VMs are capable of managing all of its applications. If they are, the algorithm accepts the consolidation request and the VM is consolidated. Otherwise, the request is rejected and the consolidation does not take place.
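The accept-or-reject decision might look like the following Python sketch. It rests on one interpretation of the text: a request to consolidate a candidate VM is accepted only if the applications of the guaranteed VM would still fit on the remaining VMs, so that a failure of the guaranteed VM would not force the creation of a new VM. The first-fit packing, the function signatures and the default capacities are assumptions; first-fit is a heuristic and can reject placements that an exhaustive search would find.

```python
from typing import List, Tuple

App = Tuple[float, float]  # (cpu, memory) demand of one application


def is_migratable(apps: List[App], hosts: List[List[App]],
                  capacity: Tuple[float, float] = (20.0, 20.0)) -> bool:
    """First-fit check: can every app in `apps` be placed on some host?"""
    free = [[capacity[0] - sum(a[0] for a in h),
             capacity[1] - sum(a[1] for a in h)] for h in hosts]
    for cpu, mem in sorted(apps, reverse=True):  # largest demands first
        for slot in free:
            if cpu <= slot[0] and mem <= slot[1]:
                slot[0] -= cpu
                slot[1] -= mem
                break
        else:
            return False  # no host has room for this application
    return True


def ftc_accepts(vms: List[List[App]], candidate: int, guaranteed: int,
                capacity: Tuple[float, float] = (20.0, 20.0)) -> bool:
    """Accept consolidation of `candidate` if the guaranteed VM's apps
    still fit on the VMs that would remain (assumed interpretation)."""
    others = [apps for i, apps in enumerate(vms)
              if i not in (candidate, guaranteed)]
    return is_migratable(vms[guaranteed], others, capacity)
```

Under this reading, when the guaranteed VM coincides with the candidate itself, the check reduces to asking whether the candidate's own applications can be rehosted, which is the plain feasibility condition for consolidation.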

Energy consumption of VM\(_i\) is calculated in Eq. 1.

$$\begin{aligned} E_{i} = k + \sum _{j\,=\,1}^{m} \left( cpu(a_{j}) + memory(a_{j}) \right) \end{aligned}$$
(1)

where k represents the idle energy consumption, a\(_j\) denotes the j-th application and m is the total number of applications hosted by VM\(_i\). The cpu and memory functions return the CPU and memory consumption of a\(_j\), respectively. The equation sums the idle energy consumption and all resources (CPU and memory in the suggested model) used by the applications.
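As a small worked illustration of Eq. 1, the Python sketch below computes the energy of one VM from its hosted applications. The function name `vm_energy` and the idle energy value used in the example are assumptions; the paper does not state a value for k in this section.

```python
from typing import List, Tuple

App = Tuple[float, float]  # (cpu, memory) consumption of one application


def vm_energy(apps: List[App], idle_energy: float) -> float:
    """Eq. 1: idle energy k plus the CPU and memory use of every hosted app."""
    return idle_energy + sum(cpu + mem for cpu, mem in apps)


# Example with an assumed idle energy of 1.0 and two applications:
# 1.0 + (0.1 + 0.2) + (0.3 + 0.1), which is 1.7 up to floating-point rounding.
example = vm_energy([(0.1, 0.2), (0.3, 0.1)], idle_energy=1.0)
```

A VM with no applications still consumes its idle energy, which is why keeping under-utilized VMs alive is costly in the no-consolidation case.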

In Eq. 2, the total energy consumption is calculated from the VM energy consumption and the energy overhead of application migrations.

$$\begin{aligned} TE = \sum _{i\,=\,1}^{n} (1+\frac{s(p)}{10}) \times E_i \end{aligned}$$
(2)

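where s(p) represents the number of application migrations among the VMs in period p and n is the number of VMs. Each migration therefore adds a further 10% to the VM energy term, consistent with the migration overhead discussed above.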
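Eq. 2 can be evaluated for a single period as in the following Python sketch. The function name `total_energy` and the sample values are hypothetical; the per-VM energies are assumed to have been computed beforehand with Eq. 1.

```python
from typing import List


def total_energy(vm_energies: List[float], migrations: int) -> float:
    """Eq. 2: scale every VM's energy by (1 + s(p) / 10) and sum over VMs."""
    factor = 1.0 + migrations / 10.0
    return sum(factor * e for e in vm_energies)


# Two VMs with energies 2.0 and 3.0 and two migrations in the period:
# (1 + 2/10) * (2.0 + 3.0), which is 6.0 up to floating-point rounding.
example_total = total_energy([2.0, 3.0], migrations=2)
```

With zero migrations the factor is 1 and the total is simply the sum of the VM energies, so the migration count is the only term that separates the consolidation strategies for a fixed set of VMs.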

4 Simulation Results

In the simulation, energy consumption and the number of application migrations are compared for the three cases (no-consolidation, consolidation by threshold and FTC) introduced in Sect. 1.

Fig. 2. Application generation distribution

The number of generated applications is shown in Fig. 2. To test the model in an environment with fluctuating application generation, several Gaussian Distributions are overlapped, independently of the fault occurrence distribution.
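One way to produce such a fluctuating workload is to superpose several Gaussian-shaped bumps over the simulation periods, as in the Python sketch below. This is only an assumed interpretation of "overlapped" distributions; the function name and every amplitude, mean and standard deviation are hypothetical and are not the parameters used in the paper.

```python
import math
from typing import List, Tuple

Bump = Tuple[float, float, float]  # (amplitude, mean period, standard deviation)


def apps_in_period(period: int, bumps: List[Bump]) -> int:
    """Sum the Gaussian bumps at `period` and round to an application count."""
    rate = sum(
        amp * math.exp(-((period - mean) ** 2) / (2.0 * std ** 2))
        for amp, mean, std in bumps
    )
    return round(rate)


# Hypothetical bumps spread over a 1000-period run.
BUMPS = [(10.0, 200.0, 40.0), (6.0, 500.0, 80.0), (12.0, 800.0, 30.0)]
profile = [apps_in_period(p, BUMPS) for p in range(1000)]
```

Where two bumps overlap their contributions add up, which yields peaks and valleys in the application arrival count without tying them to the fault schedule.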

In the first experiment, energy consumption is examined at various threshold values for all cases. The threshold values are defined as a utilization percentage of the VM and are shown in Table 1. In the other experiments, the number of application migrations and the change in energy consumption over simulation time are analyzed using a constant 50% threshold value.

Table 1. Simulation variables

The defined simulation variables are shown in Table 1. CPU and memory are used as the resource types, and application resource consumption is drawn from a uniform distribution on the interval [0.1, 0.3]. The variables are defined according to Google Cluster Data [18, 19]. The VM capacities are defined as 20 CPU cores and a normalized 20 units of memory. Fault occurrences are drawn from a Weibull Distribution, and faults occur 10 times in a simulation lifetime. The simulation runs for 1000 periods to collect results.
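The random inputs described above can be sampled with the Python standard library, as sketched below. The uniform interval, the fault count and the number of periods come from the text; the Weibull scale and shape values, the clamping of fault times to the simulation horizon and the helper names are assumptions, since the paper does not report them here.

```python
import random
from typing import List, Tuple


def sample_app(rng: random.Random) -> Tuple[float, float]:
    """Draw (cpu, memory) consumption uniformly from the interval [0.1, 0.3]."""
    return (rng.uniform(0.1, 0.3), rng.uniform(0.1, 0.3))


def sample_fault_periods(rng: random.Random, n_faults: int = 10,
                         periods: int = 1000, scale: float = 500.0,
                         shape: float = 1.5) -> List[int]:
    """Draw fault times from a Weibull distribution (assumed scale and shape)
    and clamp them to the last simulated period. Duplicates are possible."""
    times = (rng.weibullvariate(scale, shape) for _ in range(n_faults))
    return sorted(min(periods - 1, int(t)) for t in times)


rng = random.Random(42)  # fixed seed so that a run can be repeated
app = sample_app(rng)
faults = sample_fault_periods(rng)
```

Using a dedicated `random.Random` instance keeps the workload and fault streams reproducible across the three test cases, which matters when their energy figures are compared.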

Fig. 3. Energy consumption comparison for different threshold values

In Fig. 3, the total energy consumption over the simulation lifetime is calculated using Eq. 2 for different threshold values. In the no-consolidation case, the model does not consolidate any VM and creates a new one whenever necessary, so its energy consumption values are higher than in the other two cases. Under normal conditions, changes in the threshold value would not affect the energy consumption of the no-consolidation case; in this simulation model, however, the faulty VM is selected randomly, so the energy consumption can increase when a fault occurs on a highly utilized VM, and vice versa. In the consolidation case, keeping the threshold value low gives the advantage of reduced energy consumption, because consolidating highly utilized VMs causes costly application migrations owing to their larger number of applications. Fault occurrences also affect the consolidation case more severely than the other cases. The FTC model is largely unaffected by threshold value changes thanks to its rejection capability, but it is slightly more robust at higher threshold values. When fault possibility is considered, consolidating VMs at a high threshold results in better energy efficiency than consolidating VMs at a low threshold. Overall, the FTC approach reduces energy consumption by 30% to 50%.

Fig. 4. Threshold value is selected as 50%

In Fig. 4, the number of application migrations and the energy consumption of the whole system are compared for all three test cases. The FTC algorithm has the lowest number of application migrations and also 50% less energy consumption than the consolidation-by-threshold case. When a fault occurs, in all cases applications are migrated from the faulty VM to other VMs, which increases energy consumption at the fault occurrence times of the simulation. The FTC model reduces the impact of fault occurrences and prevents unnecessary application migrations for fault tolerance, so it keeps energy consumption more stable.

Fig. 5. Max, Min, Median FTC Algorithm policies on 50% threshold value

In Fig. 5, three different guaranteed-VM finder policies are compared: maximum, minimum and median utilized, as mentioned in Sect. 3. The results show that if the minimum is selected, the behavior converges to the consolidation case, whereas if two times the maximum is selected, it converges to the no-consolidation case. The best results are obtained when the maximum utilized VM is selected and the other VMs are checked for their ability to host the applications of the maximum utilized VM.

5 Conclusion and Future Works

In this paper, a novel approach is proposed to satisfy fault tolerance and energy efficiency together. Two different experiments were created to demonstrate the FTC model and its advantages. Energy consumption and the number of application migrations were compared under various threshold values. The experiments show that with the FTC approach energy consumption can be reduced by 30% to 50% and the number of application migrations can be reduced by 10% in a faulty cloud environment. The proposed approach also reduces the impact of faults on VMs.

In the future, the proposed approach is planned to be validated using different real-world data sets. A further contribution to the proposed approach would be to adopt machine learning techniques to perform time series analysis and smart prediction of future fluctuations in VM resource demands.