1 Introduction

Cloud computing, as an emerging technology, is characterized by the elastic provisioning of on-demand computing resources, ranging from applications to storage, over the Internet in a pay-per-use manner. Cloud computing brings numerous benefits to companies and end customers. For example, end customers can invoke computational resources for almost any type of workload whenever resources are reachable, which removes the conventional need for IT administrators to build, provision, and maintain resources. Moreover, companies can dynamically scale up or down to meet fluctuating demand. Investments in physical computing infrastructures, which are substituted by virtual environments connected to cloud data centers, are thus no longer necessary. In this way, computational resources are priced based on the actual amount of resources and services invoked, thereby allowing customers to pay only for what they actually use.

By granting end customers access to hardware, software, communication, and storage components, clouds deliver services at multiple levels, i.e., the infrastructure, platform, and software levels. Infrastructure-as-a-Service (IaaS) clouds provision customers or tenant users with virtual machine (VM) instances as computational resources. These instances are created and executed in the provider's data center. In contrast, Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) deliver services in terms of platforms and applications, which are maintained by a third-party vendor with their interfaces exposed on the customers' side.

Cloud users are consistently concerned about the quality of the cloud service delivered. The performance [1] promised by cloud providers is therefore crucial to the success of the cloud service offered. Cloud users usually expect responses to their requests as quickly as possible. We thus employ the expected instantiation response delay as the most important quantitative metric, which determines system responsiveness. Moreover, the last thing cloud users want to see is the rejection of their requests. We therefore consider the request rejection rate as another important performance metric, which directly determines users' satisfaction. This rate is expressed by the probability that a submitted request receives no response or a negative response (the cloud system being unable to instantiate the requested VM). Note that these two performance metrics are determined by various system factors and parameters, e.g., service rates, buffer scale, multiplexing ability, and fault rates. As will be shown later in this work, a successful VM instantiation on an IaaS cloud has to go through several provisioning steps, each of which can be subject to unexpected faults. Fault-handling activities are thus inevitable, and such activities can strongly affect overall cloud performance. However, a careful investigation into related works (discussed in the next section) suggests that, for model simplicity, most existing approaches assume reliable and fault-free cloud provisioning and thus have less difficulty in the theoretical derivation of performance metrics. Although some recent performance/QoS models consider faults and fault handling, they rely on measurement/test data to analyze cloud performance. Measurement-based approaches have limited value for optimization and bottleneck analysis. In contrast, comprehensive theoretical performance models capable of modeling unreliable cloud provisioning with efficient calculation of performance metrics are in high demand.

For the abovementioned purpose, we propose a comprehensive analytical framework for performance estimation of fault-prone IaaS clouds with faulty virtual machine (VM) instantiation and fault handling. A stochastic queuing model with retrials is developed, and closed-form results for two performance metrics are obtained. To validate the effectiveness and accuracy of the proposed stochastic model, we employ runtime performance data from an actual campus cloud built on OpenStack to carry out a confidence-interval check. The performance estimated by our stochastic model falls well within the corresponding 90% experimental confidence intervals, suggesting the validity and accuracy of the proposed theoretical model.

To optimize performance and reduce cost, we translate the stochastic process into an optimal-responsiveness-determination problem, in which we seek the best system responsiveness achievable under a rejection-rate constraint and a cost constraint determined by system capacity, measured by the number of physical machines (PMs) and the size of the admission buffer. We show that an intelligent algorithm based on the simulated-annealing method can effectively tackle this optimization problem.

2 Related studies

Recently, performance estimation of cloud computing systems has been attracting increasing attention. Various model-driven methods have been developed for analytical performance prediction of cloud computing systems and cloud datacenters. For example, Xiong et al. [2] work on percentiles of job execution times and response delays and present an M/M/1 queue model for QoS prediction. It assumes reliable and fault-free cloud provisioning to simplify the derivation of performance results. Our recent work in [3, 4], however, shows that faults exist at different levels of IaaS clouds and that the overhead required to counter such faults intensively affects performance, especially when the cloud is heavily loaded. Wang et al. [5] introduce a comprehensive performance model that considers machine error and repair. They also obtain the theoretical distribution functions of cloud task execution durations. He et al. [6] consider a comprehensive QoS model to analyze the efficiency of VM allocation strategies with reliable cloud provisioning. Bruneo et al. [7, 8] model the aging process of cloud physical machines and derive performance results under different request load intensities. Our work differs in the following ways: 1) [7, 8] consider reliable provisioning, whereas our work captures fault and fault-handling activities; and 2) [7, 8] ignore waiting periods, which are actually non-negligible. Ghosh et al. [9] develop an interacting performance model describing resource placement, dynamic speed scaling, and job failure. To reduce model complexity, their model decomposes the provisioning process into sub-phases and obtains performance results for each phase separately; an integration method then composes the per-phase results together. In exchange for simplicity and tractability, the interacting performance model is subject to accuracy loss because the separate phases of IaaS clouds actually have interdependencies with each other.
Our framework, instead, considers a monolithic provisioning control-flow model where all provisioning activities and phases jointly determine final performance. Khazaei et al. [10, 11] develop a more refined work using the CTMC model. They derive closed-form expressions of execution times and failure rate as the QoS metrics but use the fault-free assumption.

3 System model

The IaaS cloud paradigm is characterized by elastic provisioning of virtualized computing entities over the Web. In a typical IaaS cloud, a third-party provider owns hardware, software, communication, storage, system maintenance, backup, resiliency planning, and other infrastructure entities for its customers. An IaaS cloud offers highly dynamic resources that can be invoked and used on demand at run-time. Such elasticity is well suited for highly fluctuating workloads and unexpected increases in request intensity.

In a typical IaaS cloud architecture, a cloud management unit is responsible for admitting consecutively arriving requests. The request input flow can usually be captured by the input rate, λ. We use c to represent the size of the admission buffer. This size can be determined before use (e.g., it can be declared through the FRAME_SIZE field in OpenStack). If the admission buffer has no more space, an incoming customer departs with rate θ or enters the resubmission step with rate 1 − θ. The virtual-machine-manager (VMM) is responsible for processing requests out of the admission buffer in a first-in-first-out manner and trying to instantiate the corresponding VMs on PMs. For the performance-estimation purpose, we need to evaluate the responsiveness of VM instantiation, i.e., the expected delay a request takes to see its corresponding VM created and ready for invocation.

In OpenStack, the time-stamp of VM creation can be obtained from the INSTANCE_SPAWNED property. As suggested by Fig. 1, the above process involves multiple phases and interactions. The last phase is the time required for the VMM to create a VM. Its average speed (or processing rate), denoted by μ, is calculated as the reciprocal of the average instantiation time. In OpenStack, this time corresponds to the interval between INSTANCE_BUILDING and INSTANCE_SPAWNED. With the support of the VM multiplexing [12] mechanism, more than one virtual machine can be mounted on a single PM. However, this ability is usually bounded, and we use m to denote the bound. VM multiplexing helps achieve considerable resource saving in comparison with individual-VM-based resource provisioning. However, a high multiplexing level is not necessarily profitable because process interference can lead to performance and stability deterioration.

Fig. 1

Process of VM instantiation on OpenStack

$$ Q=\left[\begin{array}{cccccc}{A}_0 & C & & & & \\ {B}_1 & {A}_1 & C & & & \\ & {B}_2 & {A}_2 & C & & \\ & & \ddots & \ddots & \ddots & \\ & & & {B}_{k-1} & {A}_{k-1} & \overset{\sim }{C} \\ & & & & {\overset{\sim }{B}}_k & {\overset{\sim }{A}}_k \end{array}\right],\quad \begin{array}{l}{A}_i=\operatorname{diag}\left({a}_{i,0},{a}_{i,1},\dots ,{a}_{i,g}\right),\quad e=n\times m\\[4pt] {a}_{i,j}=\left\{\begin{array}{ll}-j\mu'-i\mu -\lambda & \text{if } 0\le i\le e\\ -j\mu'-e\mu -\lambda & \text{if } e<i\le k\end{array}\right.\\[4pt] {a}_{i,g}=\left\{\begin{array}{ll}-g\mu'-i\left(1-f\right)\mu -\lambda & \text{if } 0\le i\le e\\ -g\mu'-e\left(1-f\right)\mu -\lambda & \text{if } e<i\le k\end{array}\right.\end{array} $$
(1)
$$ {\overset{\sim }{A}}_k=\left[\begin{array}{ccccc}{a}_{k,0} & \left(1-\theta \right)\lambda & & & \\ {d}_1 & {a}_{k,1} & \left(1-\theta \right)\lambda & & \\ & \ddots & \ddots & \ddots & \\ & & {d}_g & {a}_{k,g} & {q}_c \\ & & & \ddots & \ddots \end{array}\right],\quad \begin{array}{l}{a}_{k,j}=-e\mu -\left(1-\theta \right)\lambda -j\theta \mu'\\[2pt] {d}_j=\left\{\begin{array}{l}j\theta \mu'\\ j\theta \mu'+e\mu \left(1-f\right)\end{array}\right.\\[2pt] {q}_c=\left(1-\theta \right)\lambda +ef\mu \end{array} $$
(2)
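The diagonal blocks A_i of Q can be assembled directly from the entry definitions in (1). The following sketch is illustrative only: the function and parameter names are our own, not from any cloud toolkit, and the parameter values are assumptions loosely based on the later case study.

```python
def a_entry(i, j, g, e, lam, mu, mu_r, f):
    """Diagonal entry a_{i,j} of block A_i in Eq. (1).

    e = n*m caps the number of VMs served in parallel; mu_r stands
    for the retrial (resubmission) rate mu'.
    """
    srv = min(i, e) * mu            # effective instantiation capacity
    if j < g:
        return -j * mu_r - srv - lam
    # j == g: only fault-free completions (factor 1 - f) contribute
    return -g * mu_r - srv * (1 - f) - lam

def block_A(i, g, e, lam, mu, mu_r, f):
    """A_i = diag(a_{i,0}, ..., a_{i,g}) as a list of rows."""
    size = g + 1
    return [[a_entry(i, j, g, e, lam, mu, mu_r, f) if j == r else 0.0
             for j in range(size)] for r in range(size)]
```

For example, with λ = 0.01, μ = 0.00125, μ′ = 0.01, f = 0.08, g = 4, and e = 8, the entry a_{2,1} evaluates to −1·μ′ − 2·μ − λ = −0.0225, matching the first branch of (1).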

As shown in Fig. 1, all provisioning phases are subject to faults, and unsuccessful VM instantiations are thus inevitable since internal and external communications are sometimes fault-prone. Such faults are typically caused by, for instance, temporary connection failure to a remote database, communication congestion, inappropriate input/output sequences, gateway failure, SQL activity failure, and unexpected user quit. The overhead required to counter such faults, e.g., compensation, transactional rollback, and re-instantiation, can have an intensive impact on system performance. This is especially evident when the system is heavily loaded, as discussed later in the case study. The control-flow view of cloud provisioning on a fault-prone IaaS cloud, in terms of a queuing network, is shown in Fig. 2. This model abstracts away the implementation details of cloud infrastructures and focuses on the control-flow content adequate for stochastic modeling. Based on this stochastic control-flow model, the following section shows how closed-form expressions of performance metrics can be derived and how changing input loads (request load, fault load, capacity load) theoretically affect them.

Fig. 2

The queuing model of unreliable IaaS cloud provisioning

4 Stochastic analysis

Let N(t) = n denote that the number of requests in the waiting or instantiation phase is n at time t, let M(t) = m denote that the number of requests in the resubmission phase is m, and let X(t) = (N(t), M(t)) denote the system state at time t. The resulting state space is E = {0, 1, …, k} × {0, 1, …, ∞}. X(t) can be seen as a Markov process with space E, with its transition rules illustrated in Fig. 3. Its transition-rate matrix, Q, is consequently derived as Eq. 1.

Fig. 3

The stochastic state space

It is suggested by Fig. 3 that X(t) is irreducible and aperiodic. Let π i , j (t) indicate the probability that the system resides in state (i, j) and \( {\pi}_{i, j}=\underset{t\to \infty }{ \lim }{\pi}_{i, j}(t) \); then π i , j can be calculated as below, given that its distribution is stationary:

$$ {B}_i=\left[\begin{array}{cccc}{b}_i & {e}_i & & \\ & \ddots & \ddots & \\ & & {b}_i & {e}_i\\ & & & {b}_i\end{array}\right],\quad {b}_i=\left\{\begin{array}{ll}i\mu \left(1-f\right) & \text{if } 0\le i\le e\\ e\mu \left(1-f\right) & \text{if } e<i\le k\end{array}\right.,\quad {e}_i=\left\{\begin{array}{ll}i\mu f & \text{if } 0\le i\le e\\ e\mu f & \text{if } e<i\le k\end{array}\right. $$
(3)
$$ {\overset{\sim }{B}}_k=\left[\begin{array}{ccccc}{b}_k & {e}_k & & & \\ & \ddots & \ddots & & \\ & 0 & {b}_k & & \\ & & 0 & 0 & \\ & & & \ddots & \end{array}\right],\quad {b}_k=e\mu \left(1-f\right),\quad {e}_k=e\mu f $$
(4)
$$ C=\left[\begin{array}{cccc}\lambda & & & \\ {c}_1 & \lambda & & \\ & \ddots & \ddots & \\ & & {c}_g & \lambda \end{array}\right],\quad {c}_j=\mu'\times \lambda $$
(5)
$$ \overset{\sim }{C}=\left[\begin{array}{cccccc}\lambda & & & & & \\ {c}_1 & \lambda & & & & \\ & \ddots & \ddots & & & \\ & & {c}_g & \lambda & 0 & \cdots \end{array}\right],\quad {c}_j=\mu'\times \lambda $$
(6)
$$ {\pi}_{k, j}={\pi}_{k, g}\prod_{l= g+1}^j{\rho}_l $$
(7)

where

$$ {\rho}_l=\frac{e\times f\times \mu +\left(1-\theta \right)\lambda}{l\times \theta \times {\mu}^{\hbox{'}}+ e\times \left(1- f\right)\mu} $$
(8)

It easily follows that ρ l decreases with l and that there exists u ∈ N + such that ρ u  < 1. Consequently:

$$ \begin{array}{l}\sum_{j= g+1}^{\infty}\prod_{l= g+1}^j{\rho}_l=\sum_{j= g+1}^{\infty}\prod_{l= g+1}^j\frac{e f\mu +\left(1-\theta \right)\lambda}{l\theta \mu'+ e\left(1- f\right)\mu}\\[4pt] \quad <\sum_{j= g+1}^u{\left({\rho}_{g+1}\right)}^{j- g}+{\left({\rho}_{g+1}\right)}^{u- g}\sum_{j= u+1}^{\infty}\prod_{l= u+1}^j{\rho}_l\\[4pt] \quad <\sum_{j= g+1}^u{\left({\rho}_{g+1}\right)}^{j- g}+{\left({\rho}_{g+1}\right)}^{u- g}\sum_{j= u+1}^{\infty }{\left({\rho}_u\right)}^{j- u}\\[4pt] \quad =\sum_{j= g+1}^u{\left({\rho}_{g+1}\right)}^{j- g}+{\left({\rho}_{g+1}\right)}^{u- g}\frac{\rho_u}{1-{\rho}_u}<\infty \end{array} $$
(9)

The above derivation shows that the sum in (9) is finite. By the properties of birth-death processes, the chain in Fig. 3 is therefore positive recurrent and a stationary distribution exists. The steady-state probabilities can therefore be obtained from:
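The geometric-bound argument in (8) and (9) can be checked numerically. The sketch below uses illustrative parameter values loosely based on the case-study settings (the choices of e and the truncation tolerance are our own assumptions); it computes ρ_l and accumulates the tail sum of products until the terms become negligible:

```python
lam, mu, mu_r = 0.01, 0.00125, 0.01   # lambda, mu, mu' (illustrative)
f, theta, e, g = 0.08, 0.1, 8, 4      # e = n*m; values assumed for the demo

def rho(l):
    """rho_l of Eq. (8)."""
    return (e * f * mu + (1 - theta) * lam) / (l * theta * mu_r + e * (1 - f) * mu)

# Accumulate sum_{j > g} prod_{l = g+1}^{j} rho_l until terms are negligible.
tail, term, j = 0.0, 1.0, g
while term > 1e-15 and j < 10_000:
    j += 1
    term *= rho(j)
    tail += term
```

Because ρ_l is decreasing in l and eventually below 1, the loop terminates far before the iteration cap, confirming numerically that the tail sum in (9) is finite.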

$$ \pi Q=0,\sum_{i=0}^{k-1}\sum_{j=0}^g{\pi}_{i, j}+\sum_{j=0}^{\infty }{\pi}_{k, j}=1 $$
(10)

Since the stationary distribution exists, it easily follows that:

$$ \begin{array}{l}{\pi}_{k-1, g}\lambda +\left(1-\theta \right){\pi}_{k, g-1}\lambda +{\pi}_{k, g+1}\, e\left(1- f\right)\mu \\ \quad ={\pi}_{k, g}\left( e\mu +\lambda \left(1-\theta \right)+ g\theta \mu'\right)-{\pi}_{k, g+1}\left( g+1\right)\theta \mu'\end{array} $$
(11)

From (7) we have:

$$ {\pi}_{k, g+1}={\pi}_{k, g}\frac{\lambda \left(1-\theta \right)+ e f\mu}{\left( g+1\right)\theta \mu'+ e\left(1- f\right)\mu} $$
(12)

Combining the above equation with (11), we have:

$$ {\pi}_{k, g-1}\lambda \left(1-\theta \right)={\pi}_{k, g}\left( e\times \left(1- f\right)\mu + g\times \theta \times {\mu}^{\hbox{'}}\right) $$
(13)

which indicates that \( {A}_k={\overset{\sim }{A}}_k \) and \( {B}_k={\overset{\sim }{B}}_k \).

According to (10) and (13), it easily follows that:

$$ \begin{array}{l}{T}_0\frac{1}{W}{A}_0+{T}_1\frac{1}{W}{B}_1=0\\ {T}_i\frac{1}{W} C+{T}_{i+1}\frac{1}{W}{A}_{i+1}+{T}_{i+2}\frac{1}{W}{B}_{i+2}=0,\quad 0\le i\le k-2\\ {T}_{k-1}\frac{1}{W} C+{T}_k\frac{1}{W}{A}_k=0\end{array} $$
(14)

where T 0 is a basic solution of T 0(V k A k  + V k − 1 C) = 0 and π i , j is subject to:

$$ \begin{array}{l}\left({\pi}_{0,0},\dots, {\pi}_{0, g}\right)={T}_0\frac{1}{W}\\ {}\left({\pi}_{i,0},\dots, {\pi}_{i, g}\right)=\left({\pi}_{0,0},\dots, {\pi}_{0, g}\right){V}_i,0< i\le k\\ {}{\pi}_{k, j}={T}_0\frac{1}{W}{V}_k{\omega}_1\prod_{l= g+1}^j\frac{e\times f\times \mu +\left(1-\theta \right)\lambda}{l\times \theta \times {\mu}^{\hbox{'}}+ e\times \left(1- f\right)\mu}\end{array} $$
(15)

where ω 1 is a column vector of dimension g + 1. V i is subject to:

$$ \begin{array}{l}{V}_0= I\\ {}{V}_1=-{A}_0{\left({B}_1\right)}^{-1}\\ {}{V}_2=-\left({V}_0 C+{V}_1{A}_1\right){\left({B}_2\right)}^{-1}\\ {}{V}_i=-\left({V}_{i-2} C+{V}_{i-1}{A}_{i-1}\right){\left({B}_i\right)}^{-1},2< i\le k\end{array} $$
(16)

W can be obtained as:

$$ W={T}_0\left(\sum_{i=0}^k{V}_i\right)\omega +{T}_0{V}_k{\omega}_1\sum_{j= g+1}^{\infty}\prod_{l= g+1}^j\frac{e f\mu +\left(1-\theta \right)\lambda}{l\theta \mu'+ e\left(1- f\right)\mu} $$
(17)

From (14), it follows that:

$$ {T}_1\frac{1}{W}=-{T}_0\frac{1}{W}{A}_0{\left({B}_1\right)}^{-1}={T}_0\frac{1}{W}{V}_1 $$
(18)

and similarly:

$$ {T}_2\frac{1}{W}=-{T}_0\frac{1}{W}\left({V}_1{A}_1+ C\right){\left({B}_2\right)}^{-1}={T}_0\frac{1}{W}{V}_2 $$
(19)

We can therefore finally have:

$$ {T}_i={T}_0{V}_i $$
(20)

which gives the closed-form expression of T i .

Based on (20) and (7), we obtain the steady-state probabilities, π i , j . Note that a similar derivation is presented in [13].
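For a truncated state space (finitely many resubmission levels), the balance equations (10) can also be solved numerically instead of through the recursions (14)–(20). The sketch below is a generic CTMC stationary-distribution solver, not specific to the Q of Eq. (1): it solves πQ = 0 with Σπ = 1 by replacing the last balance equation with the normalization condition.

```python
def stationary(Q):
    """Stationary distribution pi of a finite CTMC generator Q
    (list of row-lists): solves pi Q = 0 subject to sum(pi) = 1."""
    n = len(Q)
    # Build A x = b with A = Q transposed, last row replaced by all-ones
    # (normalization), so that x = pi^T.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[n - 1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            fac = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= fac * A[col][c]
            b[r] -= fac * b[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / A[r][r]
    return x
```

As a sanity check, a two-state chain with rates 1 (state 0 → 1) and 3 (state 1 → 0) has the stationary distribution (0.75, 0.25).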

5 Performance estimation

As mentioned earlier, we are interested in: 1) the expected responsiveness of VM instantiation, T; and 2) the request rejection rate, R.

According to earlier observations, T denotes the expectation of the interval between request arrival and VM creation. Response delay is an often-used performance measure for system efficiency and responsiveness. A time-based performance guarantee usually prefers a low response delay, allowing requested jobs to be accomplished quickly. A low response delay also implies high system reliability, since in a fault-prone system the probability of encountering faults increases with response delay.

To derive the expectation of the instantiation response delay, we first need to evaluate the probability that a cloud request needs a retrial, P r :

$$ {P}_r=\left(1-\theta \right)\left(\sum_{j=0}^{\infty }{\pi}_{k, j}\right)+ f\left(1-\sum_{j=0}^{\infty }{\pi}_{k, j}\right) $$
(21)

The expected number of retrials of a request, N r , is therefore calculated as:

$$ {N}_r=\frac{1}{1-{P}_r}-1 $$
(22)
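Equations (21) and (22) are cheap to evaluate once the blocking mass Σ_j π k , j is known. A small numeric sketch follows; the value of P_b below is an assumption purely for illustration, not a result of the model:

```python
theta, f = 0.113, 0.08   # leaving rate and fault rate (case-study values)
P_b = 0.2                # assumed value of sum_j pi_{k,j}, illustration only

P_r = (1 - theta) * P_b + f * (1 - P_b)   # retrial probability, Eq. (21)
N_r = 1 / (1 - P_r) - 1                   # expected retrials, Eq. (22)
```

Note that (22) is simply the mean of a geometrically distributed number of retrials: N r = P r /(1 − P r ).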

The expectation of the time a request stays before it re-enters the request input flow, T r , is consequently obtained as:

$$ {T}_r=\frac{1}{\lambda'}\left[\frac{\lambda'}{\mu'}+\frac{{P}_0\left(\lambda'/\left( g\mu'\right)\right){\left(\lambda'/\mu'\right)}^g}{g!\left(1-\lambda'/\left( g\mu'\right)\right)}\right] $$
(23)

where P 0 denotes the probability that no request is being resubmitted:

$$ {P}_0={\left[\sum_{l=0}^{g-1}\frac{{\left({\lambda}'/{\mu}'\right)}^l}{l!}+\frac{{\left({\lambda}'/{\mu}'\right)}^g}{g!}\cdot \frac{1}{1-{\lambda}'/\left( g{\mu}'\right)}\right]}^{-1} $$
(24)

and λ ' is the intensity of the retrial flow into the input flow, given implicitly by:

$$ {\lambda}^{\hbox{'}}=\left(\lambda +{\lambda}^{\hbox{'}}\right)\left[\left(\sum_{j=0}^{\infty }{\pi}_{k, j}\right)\left(1-\theta \right)+\left(1-\sum_{j=0}^{\infty }{\pi}_{k, j}\right) f\right] $$
(25)

Solving (25) for λ ' yields:

$$ {\lambda}^{\hbox{'}}=\lambda \frac{\sum_{j=0}^{\infty }{\pi}_{k, j}\left(1-\theta \right)+ f-{\sum}_{j=0}^{\infty }{\pi}_{k, j}\times f}{1-{\sum}_{j=0}^{\infty }{\pi}_{k, j}\left(1-\theta \right)- f+{\sum}_{j=0}^{\infty }{\pi}_{k, j}\times f} $$
(26)
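The quantities in (23)–(26) can be evaluated directly. The sketch below first obtains λ′ from the closed form (26), which by construction satisfies the fixed-point equation (25), and then computes P_0 and T_r per (24) and (23). All parameter values, including the blocking mass P_b, are illustrative assumptions:

```python
import math

lam, mu_r, g = 0.01, 0.01, 4      # lambda, mu', g (illustrative)
theta, f, P_b = 0.113, 0.08, 0.2  # P_b: assumed value of sum_j pi_{k,j}

q = P_b * (1 - theta) + (1 - P_b) * f
lam_r = lam * q / (1 - q)         # closed form (26); solves lam' = (lam+lam')q

def p_0(a, g):
    """P_0 of Eq. (24), with a = lambda'/mu' and a/g < 1."""
    return 1.0 / (sum(a**l / math.factorial(l) for l in range(g))
                  + a**g / math.factorial(g) / (1.0 - a / g))

def t_r(lam_r, mu_r, g):
    """Expected resubmission-phase delay T_r of Eq. (23)."""
    a = lam_r / mu_r
    rho = a / g
    wait = p_0(a, g) * rho * a**g / (math.factorial(g) * (1.0 - rho))
    return (a + wait) / lam_r
```

For g = 1 and a = 0.5, P_0 evaluates to 0.5, a quick way to sanity-check the normalization in (24).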

The expectation of the aggregate time a request resides until its successful instantiation, given that no rejection happens, T b , is consequently:

$$ {T}_b={N}_r\left({T}_r\sum_{j=0}^{\infty }{\pi}_{k, j}+\left({T}_r+{T}_{\upsilon}\right)\left(1-\sum_{j=0}^{\infty }{\pi}_{k, j}\right)\right) $$
(27)

where T υ is calculated as:

$$ {T}_{\upsilon}=\frac{1}{\mu}+\sum_{j=0}^g{\pi}_{0, j}\frac{\rho {\left(\rho e\right)}^e}{{\lambda}''\left(1-{\rho}^2\right) e!} $$
(28)

where \( \rho =\frac{\lambda^{\hbox{'}\hbox{'}}}{e\times \mu} \) and

$$ {\lambda}^{\hbox{'}\hbox{'}}=\left(\lambda +{\lambda}^{\hbox{'}}\right)\left(1-\sum_{j=0}^{\infty }{\pi}_{k, j}\right) $$
(29)

Finally, T can be expressed as:

$$ T={T}_b+{T}_{\upsilon} $$
(30)

The request rejection rate, R, stands for the ratio of rejected requests, due to the capacity constraint or faulty instantiation, to the number of submitted ones (Table 1).

$$ R=\theta \sum_{j=0}^{\infty }{\pi}_{k, j} $$
(31)
Table 1 Analytical performance vs. confidence intervals (CIs)

6 Experimental study

To validate the effectiveness and accuracy of the proposed work, we carry out an experimental study on an actual campus IaaS cloud at ChongQing University (CQU). The IaaS cloud has 6 identical Intel I450 servers, each with 3 CPUs, 8 GB RAM, and 4 TB RAID assigned as customer space. Each PM concurrently supports fewer than 32 VMs. The admission buffer maintains no more than 16 requests, while the fault rate ranges from 0.13 to 0.79%. The request input rates and instantiation rates vary over time. The resubmission control has 4 parallel threads processing the faulty instantiations, which means g = 4. The leaving rate of blocked requests is 11.3%, which means θ = 0.113. A VMM unit interprets each request into a corresponding VM instance. It is developed based on the XenServer and OpenStack toolkits. Its physical architecture is given in Fig. 4.

Fig. 4

The physical architecture of the campus cloud

As suggested, the log file records request events in consecutive intervals of 1 h. We calculate the 90% CIs from the measured response delays. A normal distribution is used as the fitting function to derive the CI of T as:

$$ intv(T)=\left[\overline{ct}-{z}_{1-\alpha /2}\frac{sdv}{\sqrt{\widehat{s}}},\ \overline{ct}+{z}_{1-\alpha /2}\frac{sdv}{\sqrt{\widehat{s}}}\right] $$
(32)

where \( \overline{ct} \) denotes the average measured instantiation response delay, sdv the standard deviation, \( \widehat{s} \) the sample size, z 1 − α/2 the corresponding quantile of the standard normal distribution, and α the significance level. The reason for using the normal distribution is twofold: (a) the empirical distribution of the experimental performance results obtained from the log file highly resembles the normal distribution; and (b) according to the central limit theorem, a linear combination of independent random variables is asymptotically normally distributed regardless of how the original variables themselves are distributed, especially when the number of variables is large enough. As suggested by the derivations in the 'Stochastic analysis' and 'Performance estimation' sections, the response delay can be viewed as a combination of multiple task-waiting times, VM-instantiation times, and resubmission-handling times. Its empirical distribution therefore shows a high resemblance to the normal distribution.

Finally, we obtain the confidence interval of R:

$$ intv(R)=\left[\overline{r}-{z}_{1-\alpha /2}\sqrt{\frac{\overline{r}-{\overline{r}}^2}{\widehat{s}}},\ \overline{r}+{z}_{1-\alpha /2}\sqrt{\frac{\overline{r}-{\overline{r}}^2}{\widehat{s}}}\right] $$
(33)

where \( \overline{r} \) stands for the experimental rejection rate.
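Both interval checks can be computed with a few lines of standard-library code. The sketch below implements (32) and (33); the default z value 1.6449 is the z 1 − α/2 quantile for a two-sided 90% interval (α = 0.10), and the function names are our own:

```python
import math

def mean_ci(samples, z=1.6449):
    """CI of the mean response delay per Eq. (32), normal approximation."""
    n = len(samples)
    m = sum(samples) / n
    sdv = math.sqrt(sum((x - m) ** 2 for x in samples) / (n - 1))
    h = z * sdv / math.sqrt(n)          # half-width z * sdv / sqrt(s_hat)
    return m - h, m + h

def rate_ci(r, n, z=1.6449):
    """CI of the rejection rate per Eq. (33): r is the observed rate,
    n the sample size; half-width uses the binomial variance r - r^2."""
    h = z * math.sqrt((r - r * r) / n)
    return r - h, r + h
```

The model validation then reduces to checking that each analytically computed T and R falls inside the corresponding interval.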

As shown in Figs. 5 and 6, all performance results calculated from the theoretical framework fall within their 90% CIs, implying model accuracy. The x-axis gives the test index and the y-axis the corresponding averaged response delays and request rejection rates recorded in the log file. Figs. 7 and 8 show performance versus request input rate when m = 2, c = 8, μ = 0.00125, μ′ = 0.01, g = 4, f = 0.08, θ = 0.1. An increase of the input rate results in increased expected response delays and rejection probability. This is more obvious when the cloud has fewer PMs. It is also observed that clouds with more PMs are less sensitive to performance degradation as the input rate increases. Figs. 9 and 10 illustrate performance variations with changes of the VM instantiation speed when m = 2, c = 8, μ′ = 0.01, g = 4, f = 0.08, θ = 0.1, λ = 0.01. An increase of the VM instantiation rate results in reduced instantiation delays and rejection rates. In Figs. 11 and 12, an increase of the buffer size results in reduced expected response delays and rejection rates, where the effect is most obvious when the buffer is small. It can also be seen that increasing the number of PMs improves performance.

Fig. 5

Validation of response delay through confidence interval check

Fig. 6

Validation of rejection rate through confidence interval check

Fig. 7

Analytical response delay vs. input rate

Fig. 8

Analytical rejection rate vs. input rate

Fig. 9

Analytical response delay vs. VM instantiation rate

Fig. 10

Analytical rejection rate vs. VM instantiation rate

Fig. 11

Analytical response delay vs. request buffer size

Fig. 12

Analytical rejection rate vs. request buffer size

7 Optimal responsiveness determination

As discussed in the sections above, we have worked out a comprehensive performance-estimation model for unreliable IaaS clouds with faulty instantiations and retrials. In this section, we focus on identifying the optimal system capacity [14] with the best performance while complying with given cost constraints [15]. Specifically, we need to know the best system responsiveness possible, in terms of the expected instantiation response delay, under the constraint of the rejection rate and the cost determined by the number of PMs and the size of the request buffer. This aim is formulated as:

$$ \begin{array}{ll}\min & T\left( n, c, m, g\right)\\ \text{s.t.} & R\left( n, c, m, g\right)< RJ\\ & n\times cpm< CPM\\ & cbf(c)< CBUF\\ & cm(m)\times n< CM\\ & GLO< g< GUP\end{array} $$
(34)

where n, c, m, and g (i.e., the number of PMs, the size of the request buffer, the multiplexing level, and the number of concurrent resubmitted requests that the cloud can support) are the decision variables, cpm is the cost of a single PM, cbf : N + → Real is the cost function of the request buffer, and cm : N + → Real is the cost function of multiplexing (multiplexing more jobs on a single PM is simply more expensive).

The problem to be solved is an instance of a nonlinear, discrete programming formulation. The non-linearity arises because all performance measures obtained from the stochastic model are nonlinear functions of the decision variables; the objective function is therefore nonlinear as well. As n, c, m, g are integers, the formulation is a nonlinear integer programming problem, which is usually NP-hard.

We consider an intelligent algorithm based on the simulated-annealing mechanism to solve the optimization problem. Simulated annealing is a randomized algorithm that provides near-optimal global solutions. Annealing can be viewed as a thermal process for reaching low-energy states by heating a solid to a high temperature and then cooling it under a controlled schedule. The resulting algorithm generates a series of states of the solid as follows. Given a current state i with energy E i , a next state i + 1 with energy E i + 1 is generated through perturbation. According to the Metropolis criterion, if the energy gap between the two neighboring states is non-positive, state i + 1 is taken as the updated current state. If the energy gap is larger than zero, state i + 1 is chosen with probability exp((E i  − E i + 1)/(k B  × TP)), where k B denotes the Boltzmann constant and TP the temperature. In this way, the simulated-annealing algorithm resembles the Metropolis algorithm. Final solutions are identified in a way similar to the physical annealing process, where solutions resemble states and the cost of a solution resembles its energy. When the cost of the current solution i is F(i), a next solution i + 1 with cost F(i + 1) is generated. The next solution is deterministically chosen if F(i + 1) ≤ F(i); otherwise, it is chosen with probability exp((F(i) − F(i + 1))/TP), where TP stands for the controlling temperature. Simulated annealing therefore works like the Metropolis algorithm evaluated under a decreasing temperature. Its solution is numerical and asymptotically optimal.
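The acceptance rule described above can be sketched as follows. This is a generic simulated-annealing skeleton with a geometric cooling schedule, not the exact algorithm used to produce Table 2; the toy objective, neighborhood, and cooling parameters in the usage example are illustrative assumptions:

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, tp0=1.0, cooling=0.95,
                        steps=2000, seed=0):
    """Minimize cost(x) starting from x0 via the Metropolis criterion."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    tp = tp0
    for _ in range(steps):
        y = neighbor(x, rng)
        fy = cost(y)
        # Always accept improvements; accept worse moves with
        # probability exp((F(i) - F(i+1)) / TP).
        if fy <= fx or rng.random() < math.exp((fx - fy) / tp):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        tp *= cooling  # geometric cooling schedule
    return best, fbest

# Toy usage: minimize (n - 5)^2 over the integers 1..20 with +/-1 moves.
b, fb = simulated_annealing(
    lambda n: (n - 5) ** 2,
    lambda n, rng: max(1, min(20, n + rng.choice([-1, 1]))),
    15)
```

For the formulation (34), `cost` would evaluate T(n, c, m, g) (returning an infinite penalty for points violating the constraints) and `neighbor` would perturb one of the integer decision variables.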

For instance, let us consider the following cost functions cbf and cm:

$$ cbf(c)=\left\{\begin{array}{ll}1 & \text{if } 0< c\le 8\\ 1.5 & \text{if } 9< c\le 32\\ 2.2 & \text{if } 33< c\le 64\end{array}\right. $$
(35)
$$ cm(m)=\left\{\begin{array}{ll}1 & \text{if } 0< m\le 8\\ 1.6 & \text{if } 9< m\le 16\\ 4.3 & \text{if } 17< m\le 32\end{array}\right. $$
(36)
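The piecewise cost functions (35)–(36) and the capacity constraints of (34) translate directly into code. In this sketch the boundary values follow the equations as printed; the rejection-rate constraint R(n, c, m, g) < RJ is omitted because it requires the stochastic model of Section 4, and all constraint bounds in the example call are assumed values:

```python
def cbf(c):
    """Buffer cost, Eq. (35) as printed."""
    if 0 < c <= 8:
        return 1.0
    if 9 < c <= 32:
        return 1.5
    if 33 < c <= 64:
        return 2.2
    raise ValueError("c outside modeled range")

def cm(m):
    """Multiplexing cost, Eq. (36) as printed."""
    if 0 < m <= 8:
        return 1.0
    if 9 < m <= 16:
        return 1.6
    if 17 < m <= 32:
        return 4.3
    raise ValueError("m outside modeled range")

def feasible(n, c, m, g, *, cpm, CPM, CBUF, CM, GLO, GUP):
    """Capacity constraints of formulation (34), excluding R < RJ."""
    return (n * cpm < CPM and cbf(c) < CBUF
            and cm(m) * n < CM and GLO < g < GUP)
```

A candidate solution for the annealing search is then kept only if `feasible(...)` holds (and, in the full problem, if its rejection rate stays below RJ).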

With the cost and parameter settings above, we invoke the simulated-annealing-based algorithm to work out the optimal solutions shown in Table 2.

Table 2 Solutions generated by the simulated-annealing-based algorithm

8 Conclusions

In this manuscript, we have developed a stochastic performance-estimation model for fault-prone IaaS clouds with faulty instantiations and retrials. We employ the expected instantiation response delay and the request rejection rate as the fundamental performance measures and evaluate the effects of changing capacity load, work load, and fault load on cloud responsiveness and request rejection rate. To validate the effectiveness and accuracy of the proposed model, we compare experimental performance results from a campus cloud with theoretical ones through a confidence-interval validation. In the interest of knowing the best system responsiveness under constraints on the request rejection rate and capacity cost, we also formulate the theoretical performance model into an optimization problem and solve it using an intelligent algorithm.