1 Introduction

For reasons of cost, expertise and management effort, more and more users outsource their job executions to professional service providers, which has promoted a new type of cloud offering: service clouds. A service cloud operates a set of resources to provide professional services to users. Taking a rendering service cloud [2] as an example, it supplies rendering job execution services, which accept rendering jobs from users and execute the computation to produce pictures or videos as service results. For service providers, the cost of setting up, maintaining and operating data centers is generally high. Besides, the number of user arrivals and their demands for services are uncertain. In this situation, service providers can reduce their costs and risk by purchasing on-demand instances (paid per unit instance per unit time) from IaaS clouds such as Amazon [1].

Fig. 1. A working scenario of service clouds.

For service clouds, this paper targets the goal of maximizing the social welfare (i.e., the total utility of all individuals in the system, including the service provider and all users). Figure 1 shows the working scenario of a service cloud. Users are generally job-oriented with soft deadline constraints, which means they only care about the completion time of their jobs and have different valuations for different completion times. To achieve the optimal social welfare, the service provider purchases instances and schedules jobs according to the valuations of users. Thus users are required to report the valuations they are willing to pay to the service provider when submitting jobs. The service provider then purchases on-demand instances from IaaS clouds, schedules users’ jobs on these instances to produce service results and charges users the corresponding payments. During this process, the service provider must decide on IaaS instance purchasing, scheduling and pricing simultaneously, because the instance purchasing and scheduling schemes affect each other and jointly determine the social welfare.

The above decision-making process for arbitrary types of jobs in a service cloud is often a multifaceted puzzle, and thus is sometimes intractable. In this work we focus on pleasingly parallel jobs. They have a flexible degree of parallelism and can be divided into an arbitrary but reasonable number of identical tasks to be executed in parallel on different instances, without any extra effort. Thus different scheduling schemes of a job can result in different completion times. This type of job accounts for a large proportion of the cloud market [7].

For optimally purchasing IaaS instances and scheduling parallel jobs with the goal of social welfare maximization, users’ private valuations for their jobs are needed. Since users are rational and may increase their utilities by misreporting, well-designed auction mechanisms are needed to extract users’ real valuations and make efficient decisions for service providers. However, there are many challenges when designing proper auction mechanisms. First, achieving optimal social welfare is NP-hard, even considering the parallel job scheduling problem only. Second, soft deadline constraints increase the difficulty of decision-making and truthfulness. Finally, the challenge further escalates when our mechanism makes decisions on instance purchasing and scheduling problems simultaneously.

In this paper, we propose a randomized auction mechanism for optimally purchasing IaaS instances and scheduling parallel jobs in service clouds. The main building blocks of this mechanism include an instance purchasing algorithm, a scheduling algorithm and a pricing algorithm. Our proposed randomized mechanism can achieve an approximately optimal social welfare and guarantee truthfulness in expectation, by which users will report their valuations truthfully. This mechanism is computationally efficient and individually rational, which means it has polynomial-time complexity and makes the utilities of all users non-negative. It can also schedule jobs while guaranteeing the non-preemption of tasks (i.e., an ongoing task cannot be interrupted before its completion). Through theoretical analysis and extensive simulations based on both synthetic and real data, we show that our mechanism can approximately maximize the social welfare.

The rest of the paper is organized as follows. Section 2 introduces the related work and Sect. 3 formulates the optimal instance purchasing and parallel job scheduling problem in service clouds as an integer program. In Sect. 4, we propose a randomized mechanism. We then verify its performance through simulations in Sect. 5. Finally, we state concluding remarks in Sect. 6.

2 Related Work

Using mechanism design to schedule jobs has also been investigated in previous studies. Chen et al. [4] design a copula-based generic randomized truthful mechanism for scheduling on two unrelated machines. The goal of such work is makespan minimization, and the pricing problem of jobs is not considered. Varakantham et al. [8] consider the strategic variant of resource constrained project scheduling problems. They provide practical truthful mechanisms in which agents report the durations and costs of their tasks as bids.

The mechanisms mentioned above assume that cloud providers or schedulers have their own resources, while in our problem service providers prefer to purchase resources from IaaS clouds. Because of the uncertainty of user demands, service providers should decide the instance purchasing scheme according to real-time situations to achieve a higher social welfare, rather than purchasing a fixed number of instances. Thus an instance purchasing algorithm is also needed.

The resource purchasing problem has also been studied extensively. In [11], Wang et al. propose two practical online algorithms that dynamically combine on-demand and reserved instances without any knowledge of the future. This work considers the resource purchasing problem from the perspective of end users, while in our problem the instance purchasers are also service providers who will provide services to end users. Zhao et al. [12] develop two resource purchasing models with the goal of minimizing purchase cost while meeting all the service demands. However, when providing services to users with the aim of social welfare maximization, service providers need to provide services to users selectively.

Based on these discussions, our main contribution lies in that we propose a randomized auction mechanism for optimally purchasing IaaS instances and scheduling parallel jobs in service clouds. In this mechanism, with the aim of social welfare maximization, instance purchasing as well as job scheduling and pricing schemes are determined simultaneously.

3 System Model

3.1 Fundamental Notations

Since the whole time axis is infinite and users arrive constantly, we propose that the service provider makes decisions on instance purchasing, scheduling and pricing through round-by-round auctions held at regular time intervals, i.e., intervals of the same duration. At the start of each interval, the service provider purchases instances from IaaS clouds and processes all the job requests that have arrived so far. These jobs will either be completed in the next interval or be rejected. Rejected jobs need to be re-submitted.

We assume that there are m types of on-demand instances which a service provider can purchase from an IaaS cloud platform, denoted by \(\mathcal {M}\!=\!\{1, 2,\ldots , m\}\). The purchase price of instance j is \( price_{j} \), which is fixed and set by IaaS providers. The number of units of instance j purchased by the service provider is \(r_{j}\). In each interval, the time axis is divided into T discrete slots, denoted by \(\mathcal {T}\! =\!\{1, 2, \ldots , T\}\). There are n users in total whose requests will be processed in an interval, denoted by \(\mathcal {U}\! =\!\{1, 2,\ldots , n\}\). We assume that each user has one parallel job to execute and that jobs are scheduled according to time slots. The runtime of a job varies across instance types, depending on the performance of the instances. We use \(l_{i,j}\) to denote the estimated runtime of job i on instance j, i.e., the number of time slots needed to complete job i on one unit of instance j. This information can be obtained from historical data collected from previous execution records. Estimating job runtime is a complex issue that has been extensively researched (e.g., [9]) and falls beyond the scope of this paper.

Considering the practical situations of job execution, we use a threshold \(k_{i}\) to limit the degree of parallelism, i.e., the number of tasks that job i can be divided into. The valuations of user i are denoted by \(v_{i}\). Specifically, \(v_{i}^{e}\) means user i is willing to pay \(v_{i}^{e}\) if its job is completed at time slot e. Let \(b_{i}\) and \(b_{i}^{e}\) represent the corresponding bids reported by users, which may be different from \(v_{i}\) and \(v_{i}^{e}\). Since jobs are scheduled according to time slots, each job has T possible completion times. Thus each user can submit at most T bids to represent discrete deadline options. Let \(p_{i}\) denote the payment charged to user i, which is calculated by the service provider. Let the binary variable \(x_{i}^{e}\) indicate whether job i is fully completed at time slot e. Then the utility \(u_{i}\) of user i is:

$$u_{i}=\left\{ \begin{array}{lcl} \mathop {\sum }\nolimits _{e\in \mathcal {T}} v_{i}^{e}x_{i}^{e}-p_{i} &{} &{} \text {if job}\,\, i \,\,\text {is fully completed}\\ 0 &{} &{} \text {otherwise} \end{array} \right. $$
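As a minimal sketch of the utility definition above, the following Python function evaluates \(u_{i}\) for one user. The function name and the list-based encoding (index e holds \(v_{i}^{e}\) and \(x_{i}^{e}\)) are illustrative assumptions, not part of the paper.

```python
from typing import Sequence


def user_utility(valuations: Sequence[float], x: Sequence[int], payment: float) -> float:
    """Utility u_i of a single user.

    valuations[e] plays the role of v_i^e, x[e] the role of x_i^e and
    payment the role of p_i. The list encoding is an assumption made
    for this sketch.
    """
    if sum(x) > 1:
        raise ValueError("a job can be completed at no more than one slot")
    if sum(x) == 0:
        # Job not completed: the utility is 0 by definition.
        return 0.0
    return sum(v * xe for v, xe in zip(valuations, x)) - payment
```

For example, a user who values completion in the three slots at 10, 8 and 5, whose job completes in the second slot and who pays 6, obtains utility 2.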

3.2 Problem Formulation

We now formulate the optimal IaaS instance purchasing and parallel job scheduling problem in service clouds. Let \(y_{i,j}^{e}(t)\) represent the number of units of instance j assigned to job i at time slot t, when job i will be completed at time slot e. Under the assumption of \(b_{i}\!= \!v_{i}\), the problem can be formulated in more formal and mathematical terms:

$$\begin{aligned}
\max \quad&\sum _{i\in \mathcal {U}}\sum _{e\in \mathcal {T}}b_{i}^{e} x_{i}^{e} - \sum _{j\in \mathcal {M}} price_{j} \cdot r_{j}&&&&\text {(IP)}\\
\text {s.t.}\quad&\sum _{t\le e}\sum _{j\in \mathcal {M}}y_{i,j}^{e}(t) \cdot \frac{1}{l_{i,j}}=x_{i}^{e}&\quad&\forall \ i\in \mathcal {U}, e\in \mathcal {T}&\quad&\text {(1)}\\
&\sum _{e\in \mathcal {T}} x_{i}^{e} \le 1&\quad&\forall \ i\in \mathcal {U}&\quad&\text {(2)}\\
&\sum _{i\in \mathcal {U}}\sum _{e\in \mathcal {T}}y_{i,j}^{e}(t)\le r_{j}&\quad&\forall \ t\in \mathcal {T}, j\in \mathcal {M}&\quad&\text {(3)}\\
&\sum _{t\le e}\sum _{j\in \mathcal {M}}y_{i,j}^{e}(t) \le k_{i}&\quad&\forall \ i\in \mathcal {U}, e\in \mathcal {T}&\quad&\text {(4)}\\
&y_{i,j}^{e}(t) \in \mathbb {Z}&\quad&\forall \ i\in \mathcal {U}, e\in \mathcal {T}, j\in \mathcal {M}, t\le e&\quad&\text {(5)}\\
&x_{i}^{e}\in \{0,1\}&\quad&\forall \ i\in \mathcal {U}, e\in \mathcal {T}&\quad&\text {(6)}\\
&r_{j}\in \mathbb {Z}&\quad&\forall \ j\in \mathcal {M}&\quad&\text {(7)}
\end{aligned}$$

Constraint (5) means that a unit of an instance can run only one job per time slot, which guarantees the non-preemption of tasks. Constraint (6) means that partial completion of a job is not allowed. Besides, since in practice IaaS clouds only sell integer units of instances to users, Constraint (7) is needed.
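To make the formulation concrete, the sketch below checks a candidate solution against Constraints (1)-(4) and evaluates the objective of (IP). The dictionary-based encoding, the 0-based user and instance indices and the function names are assumptions chosen for illustration; integrality (Constraints (5)-(7)) is taken for granted from the input types and is not verified.

```python
from fractions import Fraction
from itertools import product


def check_ip(n, m, T, y, x, r, l, k):
    """Return True if (x, y, r) satisfies Constraints (1)-(4) of (IP).

    y[(i, j, e, t)]: units of instance j assigned to job i at slot t when
                     the job is to be completed at slot e
    x[(i, e)]:       1 if job i is completed at slot e
    r[j]:            purchased units of instance j
    l[(i, j)]:       runtime of job i on one unit of instance j
    k[i]:            parallelism threshold of job i
    Missing keys of x and y count as 0; slots are numbered 1..T.
    """
    users, types, slots = range(n), range(m), range(1, T + 1)
    for i, e in product(users, slots):
        keys = [(i, j, e, t) for j in types for t in slots if t <= e]
        # Exact rational arithmetic avoids floating-point error in (1).
        work = sum(Fraction(y.get(key, 0), l[(i, key[1])]) for key in keys)
        units = sum(y.get(key, 0) for key in keys)
        if work != x.get((i, e), 0):  # Constraint (1)
            return False
        if units > k[i]:  # Constraint (4)
            return False
    for i in users:  # Constraint (2)
        if sum(x.get((i, e), 0) for e in slots) > 1:
            return False
    for t, j in product(slots, types):  # Constraint (3)
        if sum(y.get((i, j, e, t), 0) for i in users for e in slots) > r[j]:
            return False
    return True


def objective(x, b, price, r):
    """Objective of (IP): accepted bids minus the instance purchase cost."""
    return sum(b[key] * val for key, val in x.items()) - sum(
        p * units for p, units in zip(price, r)
    )
```

For instance, a single job with \(l_{0,0}=4\) that receives two units in each of slots 1 and 2 is completed at slot 2 with \(r_{0}=2\), whereas the same schedule is infeasible with \(r_{0}=1\).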

By solving (IP) we can get the optimal instance purchasing and job scheduling schemes. However, finding the exact solution of (IP) is NP-hard. Besides, to satisfy the assumption of \(b_{i}\,=\,v_{i}\), the truthful reports of users should be guaranteed. Considering these challenges, we design a randomized auction mechanism.

4 A Randomized Auction Mechanism

The main process of the mechanism is as follows.

  • Formulate the optimal IaaS instance purchasing and parallel job scheduling problem in service clouds as an integer program (IP), and relax its integer constraints to get the corresponding linear program (LP).

  • Calculate the optimal fractional solution \( r^{*} \) and \( x^{*} \) (as well as the corresponding y) by solving (LP). This fractional solution is not itself a feasible scheme.

  • Transform the optimal fractional \( r^{*} \) by rounding it up to the nearest integer, and purchase instances from IaaS clouds according to it.

  • Decompose \( x^{*} \) into a series of feasible integer scheduling schemes, through a coloring decomposition algorithm. Select one feasible integer solution randomly and schedule jobs on the purchased instances according to it.

  • If user i’s job can be completed, calculate the payment \(p_{i}\,=\, V _{\mathcal {U}\backslash i}\!-\! V_{\mathcal {U}\backslash i}^{*} \).

Here \(p_{i}\) is the marginal harm that the participation of i causes to the other users. \( V _{\mathcal {U}\backslash i}\) is the optimal social welfare without i’s participation. \( V_{\mathcal {U}\backslash i}^{*} \) is the social welfare achieved under \( x^{*} \) minus the corresponding valuation of i.
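The purchasing and payment steps of the list above can be sketched as follows. Both helpers are hypothetical illustrations: the function names and argument layout are assumptions, and the welfare values are taken as given inputs rather than computed by solving (LP).

```python
import math


def round_up_purchase(r_frac):
    """Round every fractional r_j^* up to the nearest integer.

    Note that a solver may return 2.0000001 for a value that is 2 in
    exact arithmetic; handling such noise is left out of this sketch.
    """
    return [math.ceil(rj) for rj in r_frac]


def payment(welfare_without_i, welfare_under_x_star, value_of_i):
    """Payment of user i as the marginal harm caused to the other users.

    welfare_without_i:    optimal social welfare when i does not participate
    welfare_under_x_star: social welfare achieved under x^*
    value_of_i:           the part of that welfare contributed by i's valuation
    """
    welfare_of_others = welfare_under_x_star - value_of_i
    return welfare_without_i - welfare_of_others
```

With a welfare of 50 without user i, a welfare of 58 under \( x^{*} \) and a valuation of 12 for user i, the others obtain 46 when i participates, so the payment is 4.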

The coloring decomposition algorithm consists of two main steps. The first step is constructing a color set \(\mathcal {I}\). Let \(a_{i}^{e}\) denote a full allocation, meaning if we schedule job i according to \(a_{i}^{e}\), job i can be fully completed at time slot e. Let \(a_{i,j}^{e}(t)=y_{i,j}^{e}(t)/x_{i}^{e}\). Then we round fractional \(a_{i,j}^{e}(t)\) up to the nearest integer. Assume that \(x_{i}^{e}\!=\!\frac{q}{N}\!+\!z\), where \(q, N \!\in \!\mathbb {N}\) and \(0 \!\le \!z \!\le \!\frac{1}{N}\). N is a parameter which can be set by the service provider. Let:

$$ \overline{x}_{i}^{e}=\left\{ \begin{array}{lcl} \frac{q+1}{N} &{} &{} \text {with probability} \, N\cdot z\\ \frac{q}{N} &{} &{} \text {otherwise} \end{array} \right. $$

For each user i, we add \(N\!\cdot \!\overline{x}_{i}^{e}\) copies of \(a_{i}^{e}\) into \(\mathcal {I}\). The second step is dividing the allocations in \(\mathcal {I}\) into independent groups, according to the following rules: (i) No two allocations in the same group belong to the same job. (ii) The allocations in the same group do not conflict in time and together respect the capacity constraint.
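The two steps can be illustrated with the Python sketch below. The randomized rounding follows the rule given for \(\overline{x}_{i}^{e}\); the grouping step uses a simple first-fit heuristic, which is an assumption of this sketch and not necessarily the coloring procedure used in the mechanism. The encoding of an allocation as a pair (job, usage) is likewise hypothetical.

```python
import math
import random


def round_to_grid(x, N, rng):
    """Return N * x_bar, the number of copies of a_i^e added to the color set.

    Writing x = q/N + z, the result is q + 1 with probability N*z and q
    otherwise, so its expectation equals N * x.
    """
    scaled = x * N
    q = math.floor(scaled)
    return q + 1 if rng.random() < scaled - q else q


def first_fit_groups(allocations, capacity):
    """Split allocations into groups obeying rules (i) and (ii).

    allocations: list of (job, usage), where usage maps (j, t) to the
                 units of instance j used at slot t
    capacity[j]: purchased units of instance j
    Assumes every single allocation fits within the capacity on its own.
    """
    groups = []
    for job, usage in allocations:
        for g in groups:
            fits = all(
                g["load"].get(key, 0) + units <= capacity[key[0]]
                for key, units in usage.items()
            )
            if job not in g["jobs"] and fits:
                break
        else:
            # No existing group can take this allocation: open a new one.
            g = {"jobs": set(), "load": {}, "members": []}
            groups.append(g)
        g["jobs"].add(job)
        for key, units in usage.items():
            g["load"][key] = g["load"].get(key, 0) + units
        g["members"].append((job, usage))
    return [g["members"] for g in groups]
```

Each resulting group corresponds to one feasible integer scheduling scheme; one of them can then be drawn at random, for example with `rng.choice(groups)`.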

Theorem 1

The proposed randomized mechanism can achieve an approximately optimal expected social welfare in polynomial time.

The theoretical lower bound of the expected social welfare is \(\frac{ OPT^{*} }{\alpha }-\!\sum _{j} price_{j} (\frac{1}{\alpha }+\!\frac{\alpha -1}{\alpha } C_{max} )\), where \(\alpha \!=\!(1\!+\!\frac{ C_{max} }{ C_{min} \!-k})(1\!+\!\frac{nT}{N})\). Here \( C_{max} \), \( C_{min} \) and k represent the maximum, minimum units of instances and the maximum threshold of jobs respectively, i.e., \( C_{max} \,=\,\max _{j} m_{j}, \; C_{min} \,=\,\min _{j} m_{j},\) and \(k\,=\,\max _{i} k_{i}\).
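For a quick numerical feel of this bound, the expression can be evaluated directly, as in the sketch below. The function names are illustrative, and the check that \( C_{min} \) exceeds k is an assumption added here so that \(\alpha \) stays positive and finite.

```python
def approximation_ratio(c_max, c_min, k, n, T, N):
    """alpha = (1 + C_max / (C_min - k)) * (1 + n*T / N)."""
    if c_min <= k:
        # Assumption of this sketch: the bound is only evaluated for C_min > k.
        raise ValueError("C_min must exceed k")
    return (1 + c_max / (c_min - k)) * (1 + n * T / N)


def welfare_lower_bound(opt, prices, c_max, c_min, k, n, T, N):
    """OPT*/alpha - sum_j price_j * (1/alpha + (alpha - 1)/alpha * C_max)."""
    a = approximation_ratio(c_max, c_min, k, n, T, N)
    return opt / a - sum(prices) * (1 / a + (a - 1) / a * c_max)
```

For example, with \( C_{max} =100\), \( C_{min} =60\), \(k=10\), \(n=100\), \(T=6\) and \(N=5000\), the two factors are 3 and 1.12, so \(\alpha =3.36\). A larger N moves the second factor closer to 1.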

Theorem 2

The proposed randomized mechanism is truthful in expectation.

For each user, reporting its true valuations always maximizes its expected utility, regardless of the bids reported by other users.

Theorem 3

The proposed randomized mechanism is individually rational.

The utility of each user is non-negative, which means the payment a user should pay will not exceed its valuation. This property guarantees the voluntary participation of users.

5 Performance Evaluation

5.1 Simulation Setup

We evaluate the performance of our randomized mechanism through both synthetic and real data, by comparing with the optimal fractional result calculated by the fractional VCG mechanism [5, 6, 10]. The optimal fractional result is infeasible in practice and thus it only works as a benchmark for comparison.

We assume that there are \(m\,=\,5\) types of instances purchased by the service provider from IaaS clouds. We assume that the time interval of auctions is set to 1 h and the time axis of one auction round is divided into \(T\,=\,6\) time slots of 10 min each. The prices are set according to Amazon EC2 on-demand instances. The bid of a user equals the job’s runtime multiplied by the corresponding instance price, and then multiplied by a value randomly picked from the range [0.8, 1.2] to reflect the preferences of users. The bids of a user are monotonically non-decreasing with completion time e. The parameter N used in the decomposition algorithm is set to 5000. Besides, considering the randomized nature of the mechanism, for each experiment we randomly select 100 integer scheduling schemes rather than only one and compute the average result.
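The bid generation and averaging described in this setup might be sketched as follows. The names `base_bid`, `price_per_slot` and `average_welfare` are hypothetical; how a bid varies with the completion slot e is not modeled, and drawing the 100 schemes uniformly with replacement is an assumption of this sketch.

```python
import random

SLOTS_PER_ROUND = 6  # T = 6
SLOT_MINUTES = 10  # 6 slots of 10 min make up one 1 h auction round


def base_bid(runtime_slots, price_per_slot, rng):
    """Runtime times instance price times a preference factor in [0.8, 1.2]."""
    return runtime_slots * price_per_slot * rng.uniform(0.8, 1.2)


def average_welfare(welfare_of_scheme, schemes, rng, draws=100):
    """Average the welfare over `draws` randomly selected integer schemes.

    Assumption: schemes are drawn uniformly with replacement.
    """
    picks = [rng.choice(schemes) for _ in range(draws)]
    return sum(welfare_of_scheme(s) for s in picks) / draws
```

Under these assumptions, a job with a runtime of 20 slots on an instance priced at 0.5 per slot yields a bid between 8 and 12.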

5.2 EXP1: Influence of Different Runtime Distributions

We investigate the generated social welfare under different runtime distributions. Normal (20, 5) means the runtime is drawn from a normal distribution with mean 20 and standard deviation 5. Uniform (15, 25) means the runtime is drawn from a uniform distribution bounded by 15 and 25. Constant 20 means the runtime is always 20. The number of users n varies from 60 to 150 in steps of 10. The threshold \(k_{i}\) is drawn from [5, 30].
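A minimal sketch of how such inputs could be generated is given below. The function names are illustrative; clamping the normal sample at 1 and drawing \(k_{i}\) as a uniform integer are assumptions of this sketch, since the text does not specify either detail.

```python
import random


def sample_runtime(kind, rng):
    """Draw one runtime from the three distributions compared in EXP1."""
    if kind == "normal":
        # Clamping at 1 keeps the runtime positive (assumption of this sketch).
        return max(1.0, rng.gauss(20, 5))
    if kind == "uniform":
        return rng.uniform(15, 25)
    if kind == "constant":
        return 20.0
    raise ValueError(f"unknown distribution: {kind}")


def user_counts():
    """Number of users n per experiment: 60, 70, ..., 150."""
    return list(range(60, 151, 10))


def sample_threshold(rng):
    """Threshold k_i drawn from [5, 30], assumed here to be a uniform integer."""
    return rng.randint(5, 30)
```

Passing a seeded `random.Random` instance as `rng` makes the generated workloads reproducible across runs.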

Fig. 2. Social welfare of our randomized mechanism compared with the optimal fractional result and theoretical lower bound, with three runtime distributions. (Color figure online)

In Fig. 2, the optimal fractional result (green curve) is the largest. The actual social welfare (blue points) is subject to randomness and is always better than the theoretical lower bound (red curve) calculated by Theorem 1. We can see that the three plots corresponding to the three runtime distributions are very similar, which means the performance of our mechanism does not change significantly under different runtime distributions. This suggests that our mechanism is applicable to a wide range of workloads.

5.3 EXP2: Performance of Varying the Number of Users

In this simulation, we study the performance of our mechanism as the number of users varies, using Google cluster-usage data [3]. We vary the number of users n from 10 to 100 in steps of 10.

Fig. 3. Social welfare and total payment of our randomized mechanism compared with the optimal fractional result, when the number of users is varying.

From Fig. 3 we can see that as the number of users increases, the optimal fractional and actual social welfare as well as the total payment increase almost linearly. This is because when simultaneously considering the instance purchasing and job scheduling problems, our mechanism will purchase more instances if satisfying more users’ demands increases the social welfare. This is different from the situation where service providers operate data centers themselves or purchase fixed resources. This simulation shows that under real-world data our randomized mechanism can approximately maximize the social welfare.

6 Conclusions

In this paper, we propose a randomized auction mechanism which approximately maximizes the social welfare when purchasing IaaS instances and scheduling parallel jobs in service clouds. This mechanism is truthful in expectation, computationally efficient and individually rational, and it guarantees the non-preemption of tasks. The theoretical analysis and extensive simulations validate the efficiency of our mechanism.