Enabling cost-aware and adaptive elasticity of multi-tier cloud applications

https://doi.org/10.1016/j.future.2012.05.018

Abstract

Elasticity (on-demand scaling) of applications is one of the most important features of cloud computing. This elasticity is the ability to adaptively scale resources up and down in order to meet varying application demands. To date, most existing scaling techniques can maintain applications’ Quality of Service (QoS) but do not adequately address issues relating to minimizing the costs of using the service. In this paper, we propose an elastic scaling approach that makes use of cost-aware criteria to detect and analyse the bottlenecks within multi-tier cloud-based applications. We present an adaptive scaling algorithm that reduces the costs incurred by users of cloud infrastructure services, allowing them to scale their applications only at bottleneck tiers, and present the design of an intelligent platform that automates the scaling process. Our approach is generic for a wide class of multi-tier applications, and we demonstrate its effectiveness against other approaches by studying the behaviour of an example e-commerce application using a standard workload benchmark.

Highlights

  • Elasticity enables cloud applications to scale up and down adaptively to meet run-time requirements.

  • We propose an approach for achieving cost-effective elasticity.

  • Cost-aware criteria are introduced.

  • The approach adapts to changing workloads by scaling up or down only the bottleneck components of multi-tier applications.

Introduction

Cloud computing has received wide attention over the past few years. New services offered by cloud IaaS (Infrastructure-as-a-Service) providers, such as Amazon Web Services (WS) [1], GoGrid [2] and IBM [3], are generating a huge demand from application owners. The pay-as-you-go model used by such providers is appealing to most application owners. It removes the costs of buying, installing and maintaining a dedicated infrastructure for running an application. Moreover, most IaaS providers allow application owners to scale the resources used up and down based on the computational demands of their applications, thus letting them pay only for the amount of resources they use. This model is appealing for deploying applications that provide services for third parties, e.g. traditional e-commerce sites, financial services applications, online healthcare applications, gaming applications, media servers and bioinformatics applications. If the workload of a service increases (e.g. more end users start submitting requests at the same time), the application owner can ideally scale up the resources used to maintain the Quality of Service (QoS) of their service. When the workload eases, they can then scale down the resources used. Within this context, elasticity (on-demand scaling), also known as redeploying or dynamic provisioning, of applications has become one of the most important features of a cloud computing platform. This elasticity enables real-time acquisition/release of computing resources to scale the applications themselves up and down in order to meet their run-time requirements, while letting application owners pay only for the resources used.

Our motivation in this paper is to investigate new methods that assist the owners of applications deployed on IaaS clouds in managing the costs of their own applications while still maintaining the Quality of Service (QoS) they provide to their end users. Addressing this issue effectively requires taking a closer look at the structure of the most common applications deployed on IaaS clouds to provide services to other parties. Such applications are typically implemented as multi-tier applications running on distributed software platforms. Taking the example of an e-commerce website, there are at least three tiers: a frontend web server for handling HTTP requests; a middle-tier application server for implementing business logic; and a backend database for data storage and processing. Each of the tiers can be implemented using one or more servers. Depending on the types of incoming requests, servers at each tier can be stressed by heavy workloads, or can become idle due to light workloads. When scaling an application up and down, it is thus crucial to discover the real bottlenecks, which may arise at any, or all, of the servers.
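The idea of locating bottlenecked and idle tiers in a three-tier application can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the Tier class, the utilisation figures and the 0.8/0.2 thresholds are hypothetical and are not taken from the paper, which detects bottlenecks using the cost-aware criteria introduced later.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One tier of a multi-tier application (hypothetical structure)."""
    name: str
    servers: int          # number of servers currently in the tier
    utilisation: float    # mean utilisation across the tier, 0.0 to 1.0


def find_bottlenecks(tiers, high=0.8):
    """Return names of tiers whose mean utilisation exceeds `high`.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [t.name for t in tiers if t.utilisation > high]


def find_idle(tiers, low=0.2):
    """Return names of tiers that look over-provisioned.

    A tier with a single server is never reported, because removing
    its only server would take the tier offline.
    """
    return [t.name for t in tiers if t.utilisation < low and t.servers > 1]


# Example: an e-commerce deployment where only the application tier is stressed.
app = [
    Tier("web", servers=3, utilisation=0.15),
    Tier("application", servers=2, utilisation=0.91),
    Tier("database", servers=1, utilisation=0.55),
]

bottlenecks = find_bottlenecks(app)   # candidates for scaling up
idle = find_idle(app)                 # candidates for scaling down
print("scale up:", bottlenecks)
print("scale down:", idle)
```

In this example only the application tier would be scaled up, while the web tier is a candidate for scaling down; scaling every tier uniformly would pay for servers that do not relieve the bottleneck.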

Although some of the existing scaling techniques [1], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16] address the question of how to maintain an application's Quality of Service (QoS), they rarely consider an equally important aspect of cloud computing: the cost of using the resources themselves. Applications deployed in a cloud environment require both good performance and cost-efficient resource usage.

In this paper, we propose a scaling approach that is both cost-aware and workload-adaptive, allowing application owners to perform more efficient cloud elasticity management. The paper features four key elements:

  • Cost-aware criteria: a flexible analytical model is developed to capture the behaviour of multi-tier applications. Cost-aware criteria are introduced to measure the effect of cost of resources on every unit of response time.

  • Workload-adaptive scaling: using the above criteria, a Cost-Aware Scaling (CAS) algorithm is designed to handle changing workloads of multi-tier applications by adaptively scaling up and down bottlenecked tiers within applications.

  • Automation of application scaling: a standard and extensible specification is introduced to describe the properties of the servers, including their VM configuration, IaaS user settings, linking relationships and other constraints. Based on this specification, the best cost-aware scaling approach required for an application can be automatically computed and executed.

  • Implementation and experimental evaluation: an intelligent platform based on the CAS algorithm is implemented to automate the scaling process of cloud applications on the IC-Cloud infrastructure  [17]. The proposed cost-aware approach is tested using an industry standard benchmark  [18] and the test results show: (1) the CAS algorithm responds to changing workloads effectively by scaling applications up and down appropriately to meet their QoS requirements; (2) deployment costs are reduced compared to other scaling techniques.
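To make the notion of a cost-aware criterion concrete, the following Python sketch ranks tiers by the cost paid per unit of response-time reduction when one server is added. It is only a sketch under stated assumptions: the M/M/1-style response-time estimate, the function names, the hourly prices and the service times are all hypothetical stand-ins, and they are not the analytical model or the criteria defined in the paper.

```python
import math


def tier_response_time(arrival_rate, service_time, servers):
    """Estimate a tier's mean response time in seconds.

    Assumption: load is split evenly and each server behaves like an
    M/M/1 queue. This is an illustrative stand-in for the paper's model.
    """
    load = arrival_rate * service_time / servers
    if load >= 1.0:
        return math.inf  # the tier is saturated
    return service_time / (1.0 - load)


def cost_per_unit_gain(tier, arrival_rate):
    """Hourly cost of one extra server divided by the response time it saves."""
    before = tier_response_time(arrival_rate, tier["service_time"], tier["servers"])
    after = tier_response_time(arrival_rate, tier["service_time"], tier["servers"] + 1)
    if math.isinf(after):
        return math.inf   # one more server would not relieve saturation
    if math.isinf(before):
        return 0.0        # removes saturation: always worth considering first
    gain = before - after
    return tier["hourly_cost"] / gain if gain > 0 else math.inf


# Hypothetical three-tier deployment receiving 40 requests per second.
tiers = {
    "web":         {"service_time": 0.01, "servers": 2, "hourly_cost": 0.10},
    "application": {"service_time": 0.04, "servers": 2, "hourly_cost": 0.20},
    "database":    {"service_time": 0.02, "servers": 1, "hourly_cost": 0.40},
}
arrival_rate = 40.0

scores = {name: cost_per_unit_gain(t, arrival_rate) for name, t in tiers.items()}
best = min(scores, key=scores.get)  # cheapest response-time improvement
print("scale up tier:", best)
```

Under these made-up figures the application tier offers the cheapest improvement, so a cost-aware scaler would add a server there rather than to the more expensive database tier or the lightly loaded web tier.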

The remainder of this paper is organised as follows: Section 2 illustrates the need for this new approach to elasticity by describing some current examples and challenges; Section 3 discusses related work; Section 4 provides a more detailed overview of the properties of typical multi-tier applications and describes the architecture of the Imperial Smart Scaling engine (iSSe) implemented to support our approach; Section 5 explains the proposed CAS algorithm and its details; Section 6 reports the experimental evaluation of the algorithm’s effectiveness; Sections 7 and 8 present a discussion of the approach and summarise directions for future work.

Motivation

This section illustrates two challenges that need to be addressed in order to achieve elastic scaling in a large class of multi-tier applications deployed on IaaS clouds. Without loss of generality, we use a simple example based on an e-commerce website to capture the typical behaviour of such applications. Also for simplicity, we focus only on applications that are deployed on the resources of a single IaaS cloud provider. As discussed in the introduction, the workload for such applications

Traditional scaling techniques before clouds

Scaling of applications was studied extensively before clouds. Early work considers single-tier applications and focuses on transforming performance targets into underlying computing resources such as CPU and memory [4], [5], [6]. Further investigations divide an application into multiple tiers [7], [8], [9], [10]. They then break down the end-to-end response time by tier and conduct worst-case capacity estimation to ensure that applications can meet the peak workload. Overall, the

A system to support elastic scaling

In this section, we first present an overview of the properties of multi-tier applications (Section  4.1). A platform, called iSSe, is then introduced to support the elastic scaling of these applications (Section  4.2). Finally, we explain how iSSe achieves the automation of application scaling (Section  4.3).

The CAS algorithm

In this section, we provide an overview of the CAS algorithm (Section  5.1) and introduce two cost-aware criteria to guide scaling up and down of applications (Section  5.2). Using these criteria, we then present two capacity estimation algorithms: the Cost-Aware-Capacity-Estimation (CACE)-For-Scaling-Up (Section  5.3) and CACE-For-Scaling-Down (Section  5.4). We also define the performance metrics to evaluate the behaviour of the scaled applications (Section  5.5).

Experimental evaluation

In this section, we first introduce the experimental set-up (Section 6.1), followed by the results of the experimental evaluation. The evaluation is designed to illustrate the effectiveness of our CAS algorithm in adapting to changing workloads by scaling applications up and down (Section 6.2). More importantly, the CAS algorithm’s salient feature of delivering cost-efficient services is demonstrated by comparison with existing techniques (Section 6.3).

Discussion on the CAS algorithm

The CAS algorithm presented in this paper is based on reactive (immediate) scaling of multi-tier applications rather than on predictive mechanisms. Reactive scaling approaches are used by most providers, such as Amazon WS [1] and Rightscale [11], since they are simpler to support and require no prior knowledge of the workload characteristics. The CAS algorithm uses two methods to handle possible errors in capacity estimation. First, it adds/removes only one server to/from the

Conclusions

In this paper, we have argued that on-demand scaling of cloud applications raises new challenges for delivering cost-efficient services. We proposed a cost-sensitive elastic scaling approach which lowers resource allocation costs by detecting the bottlenecks in a class of multi-tier applications and accordingly scales resources up or down only at these points. We also presented the design and implementation of an intelligent platform based on our scaling approach to achieve cost-effective

Acknowledgments

The authors would like to thank all other members of the Discovery Science Research Group, especially Xinyu Liu for her contribution in developing the system. We would also like to thank Dr Roy Clements and Tania Buckthorp for their helpful comments on the paper.

Rui Han is a researcher and Ph.D. student at the Department of Computing, Imperial College London, UK. He received an M.Sc. from Tsinghua University, China. His research interests are cloud computing, cloud resource management and workflow technology. He is experienced in the design and development of cloud deployment platforms and process-aware information systems.

References (48)

  • B. Urgaonkar et al.

    Cataclysm: handling extreme overloads in Internet services

  • B. Urgaonkar et al.

    An analytical model for multi-tier Internet services and its applications

  • X. Liu et al.

    Adaptive control of multi-tiered web applications using queueing predictor

  • B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, Dynamic provisioning of multi-tier Internet applications, in: Second...
  • B. Urgaonkar et al.

    Agile dynamic provisioning of multi-tier Internet applications

    ACM Transactions on Autonomous and Adaptive Systems

    (2008)
  • Rightscale. http://www.rightscale.com/...
  • Unicloud. http://www.univa.com/products/unicloud...
  • A. Nathani et al.

    Policy based resource allocation in IaaS cloud

    Future Generation Computer Systems

    (2011)
  • J. Bi et al.

    Dynamic provisioning modeling for virtualized multi-tier applications in cloud data center

  • Y. Hu et al.

    Resource provisioning for cloud computing

  • L. Guo et al.

    IC Cloud: a design space for composable cloud computing

  • TPC Transaction Processing Performance Council. http://www.tpc.org/...
  • R. Buyya et al.

    Modeling and simulation of scalable cloud computing environments and the cloudsim toolkit: challenges and opportunities

  • L. Rodero-Merino et al.

    Using clouds to scale grid resources: an economic model

    Future Generation Computer Systems

    (2011)

    Moustafa M. Ghanem is a research fellow at the Department of Computing, Imperial College London, UK. He holds a Ph.D. and an M.Sc. in high performance computing from Imperial College London. His current research interests are in large-scale informatics applications, including large-scale data and text mining applications and infrastructures, Grid and Cloud computing and workflow systems for e-Science applications. He has published more than 80 papers in these areas. Moustafa Ghanem was previously Research Director at InforSense Ltd and VP for Research at Nile University, Egypt.

    Li Guo is a post-doc researcher and research associate in computing science at the Department of Computing, Imperial College London, UK. He received a Ph.D. degree in artificial intelligence from the University of Edinburgh, UK. He has been working in the area of grid computing and cloud computing since 2006. For the past 5 years, he has been involved in major grid and cloud related EU and UK projects. He is the chief architect of the Imperial College Cloud platform. His research interests include large scale distributed systems, intelligent applications, and cloud computing.

    Yike Guo is a professor in computing science at the Department of Computing, Imperial College London, UK. He graduated in computer science from Tsinghua University, PRC and received a Ph.D. degree in computational logic and declarative programming at Imperial College London. He has been working in the area of data intensive analytical computing since 1995, when he was the technical director of Imperial College Parallel Computing Centre. During the past 10 years, he has been leading the data mining group of the department to carry out many research projects, including some major UK e-science projects such as a discovery net on grid based data analysis for scientific discovery, MESSAGE on wireless mobile sensor network for environment monitoring, Biological Atlas of Insulin Resistance (BAIR) on system biology for diabetes study. He has focused on applying data mining technology to scientific data analysis in the fields of life science and healthcare, environment science and security. His research interests include large-scale scientific data analysis, data mining algorithms and applications, parallel algorithms, and cloud computing.

    Michelle Osmond is a post-doc researcher and research associate in computing science at the Department of Computing, Imperial College London, UK. She has a Ph.D. and M.Sc. in Computing Science from Imperial College London and a B.Sc. in Physics. Her research interests include cloud computing, data mining and bioinformatics.
