A cost-driven online auto-scaling algorithm for web applications in cloud environments

https://doi.org/10.1016/j.knosys.2022.108523

Abstract

Today, many web application service providers rely on clouds to deploy the applications that serve their users. The request arrivals faced by web applications are generally dynamic and uncertain. When a service provider deploys web applications in clouds, it needs to flexibly rent cloud VM (Virtual Machine) instances according to dynamic request arrivals in order to save costs. However, renting an instance too early may incur unnecessary rental fees if future request arrivals turn out to be low, while renting an instance too late may incur penalty fees for SLA (service-level agreement) violations if future request arrivals remain high; an arbitrary instance scaling decision can therefore incur avoidable costs. Making optimal instance scaling decisions requires future request arrival rate curves, which are generally very hard to predict precisely. To solve this problem, in this paper we propose a cost-driven online auto-scaling algorithm that makes optimized instance rental decisions without requiring future knowledge. We show theoretically that the proposed algorithm achieves a guaranteed competitive ratio of less than 2. Finally, we verify the effectiveness of our online auto-scaling algorithm via extensive experiments using workload data that simulates real end users.

Introduction

In general, the number of end users that most web applications face is dynamic and uncertain. Among these web applications, one type experiences workload fluctuations on the scale of hours, for example: (1) examination registration systems, whose number of end users suddenly increases by dozens or hundreds of times when registration opens; (2) questionnaire filling systems, which face a sudden surge of end users shortly after the questionnaire link is sent to its audience; and (3) online examination systems, which have a relatively fixed time range during which almost all end users access the system. Web applications like these often experience workload increases lasting hours, after which, within a few hours or days, workloads usually return to a low and steady state.

To cope with the above workload uncertainties, resources need to be provisioned to web applications flexibly, and cloud computing can provide resource services elastically. In practice, to save costs and increase flexibility, more and more enterprises choose to host their web applications in the cloud. SaaS (Software-as-a-Service) and IaaS (Infrastructure-as-a-Service) are both service models of cloud computing: SaaS delivers applications over the Internet as services, and IaaS delivers computation infrastructure in the form of VM (Virtual Machine) instances as services [1]. Since IaaS clouds can deliver VM instances flexibly and inexpensively in a pay-as-you-go manner, SaaS providers often rely on IaaS instances to deploy and run their web applications in order to save costs.

In a web application service system under cloud environments, as shown in Fig. 1, there are mainly three roles: IaaS providers, application users, and SaaS providers. When an application user needs a web application, it submits a request for the application to a SaaS provider. The SaaS provider delivers the desired application and signs an SLA (service-level agreement) with the application user. To maintain the delivered web application and fulfill the SLA, the SaaS provider needs to make an instance scaling plan for renting instances from IaaS providers, which is important for optimizing costs. As shown in Fig. 1, the costs faced by a SaaS provider consist of two parts: rental fees, paid to IaaS providers for renting instances, and penalty fees, paid to application users for SLA violations [2]. Because workloads are dynamic and uncertain, deciding whether and when to add new instances as workloads increase is critical for saving costs and fulfilling the SLA. Specifically, adding a new instance too early may incur unnecessary rental fees if future request arrivals are low enough that the existing instances could have processed them, while adding a new instance too late may incur heavy penalty fees for SLA violations if request arrivals continue or even increase. Thus, it may not be cost-effective to add new instances arbitrarily. Making the optimal decisions of whether and when to add a new instance requires future request arrival curves, yet such future arrivals are generally very hard to predict precisely. Therefore, considering the dynamic and uncertain nature of web application workloads, a SaaS provider needs an efficient instance scaling algorithm that can dynamically add instances at appropriate times, without a priori knowledge of future request arrivals, in order to save costs.
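
To make this trade-off concrete, the following minimal sketch (our illustration, not code from the paper) computes the total cost a SaaS provider would face as the sum of rental fees and penalty fees over discrete hours; the hourly price, per-instance capacity, and per-request penalty are hypothetical placeholders.

```python
# Minimal cost-model sketch (illustration only; all prices and capacities are assumed).
HOURLY_PRICE = 0.10        # rental fee per instance-hour (assumed)
CAPACITY = 100             # requests one instance can serve per hour (assumed)
PENALTY_PER_REQUEST = 0.01 # SLA penalty per request that cannot be served (assumed)

def total_cost(arrivals_per_hour, instances_per_hour):
    """Total cost = rental fees paid to the IaaS provider + penalty fees for SLA violations."""
    rental = sum(instances_per_hour) * HOURLY_PRICE
    penalty = sum(
        max(0, arrivals - n * CAPACITY) * PENALTY_PER_REQUEST
        for arrivals, n in zip(arrivals_per_hour, instances_per_hour)
    )
    return rental + penalty

arrivals = [90, 150, 150, 80]              # a short hourly burst
print(total_cost(arrivals, [1, 2, 2, 1]))  # scale up exactly when needed (cheapest)
print(total_cost(arrivals, [2, 2, 2, 2]))  # one hour too early: pays for idle instance-hours
print(total_cost(arrivals, [1, 1, 2, 1]))  # one hour too late: pays an even larger SLA penalty
```

Under the assumed numbers, scaling up exactly when the load requires it is cheapest, scaling up an hour early pays for unneeded instance-hours, and scaling up an hour late pays an even larger SLA penalty, which is precisely the dilemma described above.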

Currently, mainstream IaaS providers offer two main billing models, on-demand and reserved, where the reserved model offers a price discount but requires a long reservation period compared with the on-demand one [3], [4]. For example, in the case of Aliyun [5], the reservation period of a reserved instance is at least one week. For hourly workload fluctuations, renting a reserved instance often leaves resources idle for long periods. Moreover, making good reserved-instance rental decisions often requires forecasting workloads far into the future [6], [7], which also poses a great challenge. In this situation, on-demand instances, which can be released at any time and are billed only for the machine-hours actually used, have a price advantage. Thus, in this paper we use on-demand instances to deal with web application workloads that fluctuate hourly.
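
As a back-of-the-envelope illustration of this choice (with hypothetical prices, not actual IaaS rates), a reserved instance billed for a whole week only pays off when the instance is busy for a large fraction of that week, whereas short hourly bursts favor on-demand billing:

```python
# Hypothetical prices for illustration only (not actual IaaS rates).
ON_DEMAND_PER_HOUR = 0.10        # on-demand: pay only for hours actually used
RESERVED_WEEKLY = 0.06 * 24 * 7  # reserved: discounted hourly rate, billed for the whole week

def cheaper_model(busy_hours_per_week):
    on_demand = busy_hours_per_week * ON_DEMAND_PER_HOUR
    return "reserved" if RESERVED_WEEKLY < on_demand else "on-demand"

print(cheaper_model(20))   # short hourly bursts  -> "on-demand"
print(cheaper_model(120))  # near-continuous load -> "reserved"
```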

From the perspective of SaaS providers, using different instance scaling algorithms for a web application to deal with dynamic and uncertain workloads results in different costs and application performance. In recent years, there have been many attempts at deploying web applications economically in cloud environments. Experience-based resource auto-scaling models used in practice can often make quick decisions about whether and when to scale resources, but they are often too simple and inflexible to be cost-effective [8]. There are also cost-effective models based on workload prediction. For example, Imai et al. [9] reduce instance rental costs by predicting future request arrival rates to allocate appropriate resources. Mao et al. [10] select the most cost-effective instances by assuming that request processing times are known. Adam et al. [11] model workload arrivals as stochastic processes and solve a minimum-overallocation problem to optimize costs. These approaches can optimize the costs of deploying web applications in cloud environments. However, they are often limited in practice because they require a priori knowledge, such as future request arrivals or request processing times, which is often difficult to obtain.

To address the above challenges, in this paper we propose a cost-effective online auto-scaling algorithm to help SaaS providers make real-time decisions about whether and when to rent new IaaS instances. Our algorithm makes scaling decisions in an online manner without requiring any knowledge of future requests, which distinguishes it from much existing work. We prove theoretically that it guarantees a bounded competitive ratio. Through extensive experiments using workload data that simulates real end users, we verify the effectiveness of our online auto-scaling algorithm and demonstrate that it can help SaaS providers save significantly on costs compared with directly renting new instances.
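
The decision rule itself is detailed in Section 4. As a rough intuition for how an online algorithm can bound its cost without future knowledge, the sketch below follows the classic ski-rental-style break-even idea: keep paying penalties on the current instances and rent an extra instance only once the penalty accumulated during the overload reaches the price of an instance-hour. This is a generic illustration under assumed prices, not the authors' algorithm.

```python
# Ski-rental-style online scaling sketch (generic illustration, not the paper's algorithm).
HOURLY_PRICE = 0.10        # rental fee per instance-hour (assumed)
CAPACITY = 100             # requests one instance can serve per hour (assumed)
PENALTY_PER_REQUEST = 0.01 # SLA penalty per unserved request (assumed)

def online_scaling(arrivals_per_hour):
    """Decide hour by hour how many instances to rent, seeing only past arrivals."""
    instances = 1
    accumulated_penalty = 0.0
    plan = []
    for arrivals in arrivals_per_hour:
        overload = max(0, arrivals - instances * CAPACITY)
        accumulated_penalty += overload * PENALTY_PER_REQUEST
        # Break-even rule: add an instance only once the penalties paid during
        # the current overload would have covered renting it for an hour.
        if accumulated_penalty >= HOURLY_PRICE:
            instances += 1
            accumulated_penalty = 0.0
        plan.append(instances)
    return plan

# The burst starting at hour 2 triggers one extra instance; scaling down is omitted for brevity.
print(online_scaling([90, 150, 150, 150, 80]))  # -> [1, 2, 2, 2, 2]
```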

The rest of the paper is organized as follows. Section 2 discusses the related work, and Section 3 formulates the problem to be solved. In Section 4, we propose an online auto-scaling algorithm and analyze its competitive ratio. In Section 5, we verify the effectiveness of our online auto-scaling algorithm by extensive experiments. Finally, in Section 6, we state the conclusions and future work.

Section snippets

Related work

More and more SaaS providers have been migrating their applications to cloud data centers, attracted by the emerging cloud computing paradigm. One of the appealing features of IaaS clouds is elasticity: it allows SaaS providers to acquire or release computing resources on demand, enabling them to automatically scale the resources provisioned to their applications under dynamic workloads without human intervention [12]. One of the most important concerns of SaaS providers is the cost of provisioning resources for their applications while fulfilling SLAs.

System model

In this section, we first review pricing details of on-demand instances, and then describe the concept of penalty. Finally, we formulate the instance auto-scaling problem.
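
The preview does not reproduce the formulation itself, but based on the cost structure described in the Introduction it presumably takes a form along the following lines, where n(t) is the number of on-demand instances rented in hour t, p the hourly rental price, λ(t) the request arrival rate, μ the per-instance service capacity, and q the penalty per SLA-violating request (the notation is ours, not the paper's):

```latex
\min_{n(t)} \; \sum_{t=1}^{T} \Big[ \underbrace{p \, n(t)}_{\text{rental fees}}
  \;+\; \underbrace{q \, \max\!\big(0,\; \lambda(t) - \mu \, n(t)\big)}_{\text{penalty fees}} \Big]
```

In the online setting, n(t) must be chosen without seeing λ(t') for any t' > t.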

An online auto-scaling algorithm

In this section, we first present the optimal offline auto-scaling algorithm OPT as a benchmark. Then we present our online auto-scaling algorithm, and show theoretically that its competitive ratio is less than 2.
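
The snippet above only states the bound; a standard intuition for guarantees of this kind is the classic ski-rental argument (our paraphrase for orientation, not the paper's proof). Suppose an overload persists and renting an extra instance costs c. An online rule that waits until its accumulated penalty reaches c and only then rents exceeds OPT's cost by at most c (the penalties paid before renting), while OPT itself must pay at least c over the same interval, either by renting earlier or by absorbing at least that much penalty. Hence

```latex
\frac{\mathrm{ALG}}{\mathrm{OPT}} \;\le\; \frac{\mathrm{OPT} + c}{\mathrm{OPT}} \;=\; 1 + \frac{c}{\mathrm{OPT}} \;\le\; 2,
```

and a more careful accounting is what allows a ratio strictly below 2, as stated in the abstract.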

Experimental evaluations

In this section, we evaluate the performance of our online auto-scaling algorithm via experimental simulations driven by workload data which can simulate real end users.

Conclusions and future work

Many web applications face hourly or daily workload fluctuations, and to optimize costs it is worth investigating whether and when to add on-demand cloud instances, which are billed per hour. In this paper, we propose an online auto-scaling algorithm to help SaaS providers in cloud environments make real-time decisions to dynamically add new instances, without requiring a priori knowledge of future request arrival rates. We show theoretically that our online auto-scaling algorithm achieves a competitive ratio of less than 2, and we verify its effectiveness through extensive experiments using workload data that simulates real end users.

CRediT authorship contribution statement

Wen Si: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing – original draft, Writing – review & editing. Li Pan: Conceptualization, Methodology, Resources, Writing – review & editing, Visualization, Supervision. Shijun Liu: Writing – review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to acknowledge the support provided by the National Key R&D Program of China under Grant 2017YFA0700601, the Key Research and Development Program of Shandong Province, China (2020CXGC010102), the Shandong Provincial Natural Science Foundation, China (ZR2020LZH011), and the Young Scholars Program of Shandong University, China.

References (55)

  • Imai, S., et al., Uncertainty-aware elastic virtual machine scheduling for stream processing systems.
  • Mao, M., et al., Auto-scaling to minimize cost and meet application deadlines in cloud workflows.
  • Adam, O.Y., et al., Stochastic resource provisioning for containerized multi-tier web services in clouds, IEEE Trans. Parallel Distrib. Syst. (2017).
  • Qu, C., et al., Auto-scaling web applications in clouds: A taxonomy and survey, ACM Comput. Surv. (2018).
  • Nguyen, H., et al., AGILE: Elastic distributed resource scaling for infrastructure-as-a-service.
  • Shen, Z., et al., CloudScale: Elastic resource scaling for multi-tenant cloud systems.
  • Han, H., et al., Cashing in on the cache in the cloud, IEEE Trans. Parallel Distrib. Syst. (2012).
  • Han, R., et al., Lightweight resource scaling for cloud applications.
  • Amazon EC2 (2021).
  • RightScale (2021).
  • EnStratus (2021).
  • Scalr (2021).
  • Sharma, U., et al., A cost-aware elasticity provisioning system for the cloud.
  • Gandhi, A., et al., AutoScale: Dynamic, robust capacity management for multi-tier data centers, ACM Trans. Comput. Syst. (2012).
  • Fernandez, H., et al., Autoscaling web applications in heterogeneous cloud infrastructures.
  • Chen, G., et al., Energy-aware server provisioning and load dispatching for connection-intensive internet services.
  • Guenter, B., et al., Managing cost, performance, and reliability tradeoffs for energy-aware server provisioning.
