Enabling cost-aware and adaptive elasticity of multi-tier cloud applications
Highlights
► Elasticity enables cloud applications to scale up and down adaptively to meet run-time requirements.
► We propose an approach for achieving cost-effective elasticity.
► Cost-aware criteria are introduced.
► Changing workloads are handled by scaling up or down only the bottleneck components of multi-tier applications.
Introduction
Cloud computing has received wide attention over the past few years. New services offered by cloud IaaS (Infrastructure-as-a-Service) providers, such as Amazon Web Services (WS) [1], GoGrid [2] and IBM [3], are generating huge demand from application owners. The pay-as-you-go model used by such providers is appealing to most application owners: it removes the costs of buying, installing and maintaining a dedicated infrastructure for running an application. Moreover, most IaaS providers allow application owners to scale the resources they use up and down based on the computational demands of their applications, thus letting them pay only for the resources they actually use. This model is appealing for deploying applications that provide services to third parties, e.g. traditional e-commerce sites, financial services applications, online healthcare applications, gaming applications, media servers and bioinformatics applications. If the workload of a service increases (e.g. more end users start submitting requests at the same time), the application owner can ideally scale up the resources used to maintain the Quality of Service (QoS) of the service; when the workload eases, they can scale the resources back down. Within this context, elasticity (on-demand scaling), also known as redeployment or dynamic provisioning, has become one of the most important features of a cloud computing platform. Elasticity enables the real-time acquisition and release of computing resources to scale applications up and down in order to meet their run-time requirements, while letting application owners pay only for the resources used.
Our motivation in this paper is to develop new methods that assist the owners of applications deployed on IaaS clouds in managing the costs of their applications while still maintaining the Quality of Service (QoS) they provide to their end users. Addressing this issue effectively requires a closer look at the structure of the services and applications most commonly deployed on IaaS clouds to serve other parties. Such applications are typically implemented as multi-tier applications running on distributed software platforms. Taking the example of an e-commerce website, there are at least three tiers: a frontend web server for handling HTTP requests; a middle-tier application server implementing the business logic; and a backend database for data storage and processing. Each tier can be implemented using one or more servers. Depending on the types of incoming requests, the servers at each tier can be stressed by heavy workloads or sit idle under light ones. When scaling an application up or down, it is thus crucial to identify the real bottlenecks, which may occur at any, or all, of the servers.
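To make the bottleneck idea above concrete, the following is a minimal illustrative sketch (our own, not the paper's implementation): it assumes per-tier utilisation measurements are available and flags the most saturated tier. The tier names and the 0.8 saturation threshold are assumptions for the example.

```python
# Illustrative sketch only: identify the bottleneck tier of a multi-tier
# application from hypothetical per-tier utilisation samples. The tier
# names and the 0.8 saturation threshold are assumptions, not values
# taken from the paper.

def find_bottleneck(utilisation, threshold=0.8):
    """Return the most utilised tier if it exceeds the saturation
    threshold, otherwise None (no bottleneck)."""
    tier, load = max(utilisation.items(), key=lambda kv: kv[1])
    return tier if load >= threshold else None

samples = {"web": 0.55, "app": 0.92, "db": 0.61}
print(find_bottleneck(samples))  # prints "app": the app tier is saturated
```

In a real monitor the utilisation figures would come from the IaaS provider's metrics rather than a static dictionary, and a production detector would smooth over several intervals before acting.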
Although some existing scaling techniques [1], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16] address the question of how to maintain an application's Quality of Service (QoS), they rarely consider an equally important aspect of cloud computing: the cost of using the resources themselves. Applications deployed in a cloud environment require both good performance and cost-efficient resource usage.
In this paper, we propose a scaling approach that is both cost-aware and workload-adaptive, allowing application owners to perform more efficient cloud elasticity management. The paper features four key elements:
- Cost-aware criteria: a flexible analytical model is developed to capture the behaviour of multi-tier applications. Cost-aware criteria are introduced to measure the cost of resources incurred per unit of response time.
- Workload-adaptive scaling: using these criteria, a Cost-Aware Scaling (CAS) algorithm is designed to handle the changing workloads of multi-tier applications by adaptively scaling the bottlenecked tiers up and down.
- Automation of application scaling: a standard, extensible specification is introduced to describe the properties of the servers, including their VM configuration, IaaS user settings, linking relationships and other constraints. Based on this specification, the best cost-aware scaling action for an application can be computed and executed automatically.
- Implementation and experimental evaluation: an intelligent platform based on the CAS algorithm is implemented to automate the scaling of cloud applications on the IC-Cloud infrastructure [17]. The proposed cost-aware approach is tested using an industry-standard benchmark [18], and the results show that: (1) the CAS algorithm responds to changing workloads effectively, scaling applications up and down appropriately to meet their QoS requirements; and (2) deployment costs are reduced compared with other scaling techniques.
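One way to read the cost-aware criteria above is as a cost-per-unit-of-response-time measure. The sketch below is our own illustration of that reading, not the CAS algorithm itself: it ranks hypothetical candidate scale-up actions by the hourly cost each adds per millisecond of estimated response-time reduction. All tier names, prices and reduction estimates are invented for the example.

```python
# Hypothetical illustration of a cost-aware scaling choice (not the
# paper's CAS algorithm): among the tiers that could be scaled up, pick
# the one whose extra server buys the largest response-time reduction
# per unit of cost. All candidate data below is invented.

def cost_per_ms_saved(hourly_cost, est_rt_reduction_ms):
    """Cost of resources per unit (millisecond) of response time saved."""
    return hourly_cost / est_rt_reduction_ms

candidates = [
    # (tier, hourly cost of one extra server, estimated RT reduction in ms)
    ("web", 0.12, 15.0),
    ("app", 0.24, 80.0),
    ("db",  0.48, 30.0),
]

best = min(candidates, key=lambda c: cost_per_ms_saved(c[1], c[2]))
print(best[0])  # prints "app": 0.24/80 = 0.003 $/ms, the cheapest option
```

The same ratio can be inverted for scale-down decisions: release the server whose removal gives up the least response time per unit of cost saved.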
The remainder of this paper is organised as follows: Section 2 illustrates the need for this new approach to elasticity by describing current examples and challenges; Section 3 discusses related work; Section 4 provides a more detailed overview of the properties of typical multi-tier applications and describes the architecture of the Imperial Smart Scaling engine (iSSe) implemented to support our approach; Section 5 explains the proposed CAS algorithm in detail; Section 6 reports the experimental evaluation of the algorithm's effectiveness; Section 7 discusses the approach further; and Section 8 concludes and summarises directions for future work.
Section snippets
Motivation
This section illustrates two challenges that need to be addressed in order to achieve elastic scaling in a large class of multi-tier applications deployed on IaaS clouds. Without loss of generality, we use a simple example based on an e-commerce website to capture the typical behaviour of such applications. Also for simplicity, we focus only on applications deployed on the resources of a single IaaS cloud provider. As discussed in the introduction, the workload for such applications
Traditional scaling techniques before clouds
Scaling of applications was studied extensively before the advent of clouds. Early work considers single-tier applications and focuses on translating performance targets into underlying computing resources such as CPU and memory [4], [5], [6]. Later investigations divide an application into multiple tiers [7], [8], [9], [10]. They then break down the end-to-end response time by tier and conduct worst-case capacity estimation to ensure applications meet their peak workloads. Overall, the
A system to support elastic scaling
In this section, we first present an overview of the properties of multi-tier applications (Section 4.1). A platform, called iSSe, is then introduced to support the elastic scaling of these applications (Section 4.2). Finally, we explain how iSSe achieves the automation of application scaling (Section 4.3).
The CAS algorithm
In this section, we provide an overview of the CAS algorithm (Section 5.1) and introduce two cost-aware criteria to guide scaling up and down of applications (Section 5.2). Using these criteria, we then present two capacity estimation algorithms: the Cost-Aware-Capacity-Estimation (CACE)-For-Scaling-Up (Section 5.3) and CACE-For-Scaling-Down (Section 5.4). We also define the performance metrics to evaluate the behaviour of the scaled applications (Section 5.5).
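The CACE algorithms themselves are specified in Section 5; purely to illustrate what a per-tier capacity estimate can look like, the sketch below computes the fewest servers a tier needs to keep per-server utilisation under a target, given an arrival rate and a per-server service rate. This simple utilisation-bound model and all the numbers are our assumptions, not necessarily the model used in the paper.

```python
import math

# Illustrative per-tier capacity estimate (not the paper's CACE
# algorithm): find the minimum server count c such that
# arrival_rate / (c * service_rate) <= target_util. The utilisation
# target and rates below are assumed values for the example.

def servers_needed(arrival_rate, service_rate, target_util=0.7):
    """Minimum number of servers keeping per-server utilisation
    at or below target_util."""
    return math.ceil(arrival_rate / (service_rate * target_util))

# 100 req/s arriving, each server handles 30 req/s:
# 100 / (30 * 0.7) = 4.76..., which rounds up to 5 servers.
print(servers_needed(100.0, 30.0))
```

A cost-aware estimator would additionally weigh this count against the price of each added server, as in the criteria of Section 5.2.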
Experimental evaluation
In this section, we first introduce the experimental set-up (Section 6.1), followed by the results of the experimental evaluation. The evaluation is designed to illustrate the effectiveness of our CAS algorithm in adapting to changing workloads by scaling applications up and down effectively (Section 6.2). More importantly, the CAS algorithm's salient feature of delivering cost-efficient services is demonstrated by comparison with existing techniques (Section 6.3).
Discussion on the CAS algorithm
The CAS algorithm presented in this paper is based on reactive (immediate) scaling of multi-tier applications rather than using predictive mechanisms. The reactive scaling approaches are used by most providers, such as Amazon WS [1] and Rightscale [11], since they are simpler to support and require no prior knowledge of the workload characteristics. The CAS algorithm uses two methods to handle the possible errors in capacity estimation. First, it adds/removes only one server to/from the
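The reactive, one-server-at-a-time behaviour described above can be sketched as a simple control loop. This is our own minimal illustration under assumed thresholds and metrics, not the CAS implementation: adding or removing at most one server per interval means a mis-estimate in one interval can be corrected in the next.

```python
# Minimal reactive scaling step (illustrative only): each monitoring
# interval, add or remove at most one server at a tier based on its
# measured response time. The 200 ms target and 0.5 slack factor are
# assumptions for the example, not values from the paper.

def reactive_step(server_count, response_time_ms, target_ms=200.0,
                  slack=0.5, min_servers=1):
    """Return the new server count after one reactive decision."""
    if response_time_ms > target_ms:
        return server_count + 1          # QoS violated: add one server
    if response_time_ms < target_ms * slack and server_count > min_servers:
        return server_count - 1          # ample headroom: remove one server
    return server_count                  # within bounds: do nothing

print(reactive_step(3, 250.0))  # prints 4: target exceeded, add one
print(reactive_step(3, 80.0))   # prints 2: plenty of headroom, remove one
```

In practice the loop would run once per monitoring interval against live measurements, and the capacity-estimation step would choose *which* tier the single server is added to or removed from.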
Conclusions
In this paper, we have argued that on-demand scaling of cloud applications raises new challenges for delivering cost-efficient services. We proposed a cost-sensitive elastic scaling approach which lowers resource allocation costs by detecting the bottlenecks in a class of multi-tier applications and accordingly scales resources up or down only at these points. We also presented the design and implementation of an intelligent platform based on our scaling approach to achieve cost-effective
Acknowledgments
The authors would like to thank all other members of the Discovery Science Research Group, especially Xinyu Liu for her contribution in developing the system. We would also like to thank Dr Roy Clements and Tania Buckthorp for their helpful comments on the paper.
Rui Han is a researcher and Ph.D. student at the Department of Computing, Imperial College London, UK. He received M.Sc. from Tsinghua University, China. His research interests are cloud computing, cloud resource management and workflow technology. He is experienced in the design and development of cloud deployment platforms and process-aware information systems.
References (48)
- et al., Managing dynamic enterprise and urgent workloads on clouds using layered queuing and historical performance models, Simulation Modelling Practice and Theory (2011)
- et al., Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems (2009)
- et al., Adaptive resource provisioning for read intensive multi-tier applications in the cloud, Future Generation Computer Systems (2011)
- et al., Online scheduling of workflow applications in grid environments, Future Generation Computer Systems (2011)
- et al., Cloud brokering mechanisms for optimized placement of virtual machines across multiple providers, Future Generation Computer Systems (2012)
- Amazon Web Services (Amazon WS). http://aws.amazon.com/...
- Microsoft Azure. http://www.microsoft.com/azure/...
- IBM developer cloud. http://www.ibm.com/cloud/developer...
- et al., Dynamic resource allocation for shared data centers using online measurements
- et al., Model-based resource provisioning in a web service utility
- Cataclysm: handling extreme overloads in Internet services
- An analytical model for multi-tier Internet services and its applications
- Adaptive control of multi-tiered web applications using queueing predictor
- Agile dynamic provisioning of multi-tier Internet applications, ACM Transactions on Autonomous and Adaptive Systems
- Policy based resource allocation in IaaS cloud, Future Generation Computer Systems
- Dynamic provisioning modeling for virtualized multi-tier applications in cloud data center
- Resource provisioning for cloud computing
- IC Cloud: a design space for composable cloud computing
- Modeling and simulation of scalable cloud computing environments and the cloudsim toolkit: challenges and opportunities
- Using clouds to scale grid resources: an economic model, Future Generation Computer Systems
Cited by (129)
- Towards Dynamic Request Updating with Elastic Scheduling for Multi-Tenant Cloud-Based Data Center Network. 2024, IEEE Transactions on Network Science and Engineering
- Cloud Elasticity: VM vs container: A Survey. 2024, Research Square
- Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobs. 2023, Journal of Cloud Computing
- CILP: Co-Simulation-Based Imitation Learner for Dynamic Resource Provisioning in Cloud Computing Environments. 2023, IEEE Transactions on Network and Service Management
- Efficient Computation of Optimal Thresholds in Cloud Auto-scaling Systems. 2023, ACM Transactions on Modeling and Performance Evaluation of Computing Systems
Moustafa M. Ghanem is a research fellow at the Department of Computing, Imperial College London, UK. He holds a Ph.D. and an M.Sc. in high performance computing from Imperial College London. His current research interests are in large-scale informatics applications, including large-scale data and text mining applications and infrastructures, Grid and Cloud computing, and workflow systems for e-Science applications. He has published more than 80 papers in these areas. Moustafa Ghanem was previously Research Director at InforSense Ltd and VP for Research at Nile University, Egypt.
Li Guo is a post-doc researcher and research associate in computing science at the Department of Computing, Imperial College London, UK. He received Ph.D. degree in artificial intelligence at the University of Edinburgh, UK. He has been working in the area of grid computing and cloud computing since 2006. For the past 5 years, he has been involved in major grid and cloud related EU and UK projects. He is the chief architect of the Imperial College Cloud platform. His research interests include large scale distributed systems, intelligent applications, and cloud computing.
Yike Guo is a professor in computing science at the Department of Computing, Imperial College London, UK. He graduated in computer science from Tsinghua University, PRC, and received a Ph.D. degree in computational logic and declarative programming at Imperial College London. He has been working in the area of data-intensive analytical computing since 1995, when he was the technical director of the Imperial College Parallel Computing Centre. During the past 10 years, he has led the department's data mining group in many research projects, including major UK e-science projects such as Discovery Net on grid-based data analysis for scientific discovery, MESSAGE on wireless mobile sensor networks for environment monitoring, and the Biological Atlas of Insulin Resistance (BAIR) on systems biology for diabetes study. He has focused on applying data mining technology to scientific data analysis in the fields of life science and healthcare, environmental science and security. His research interests include large-scale scientific data analysis, data mining algorithms and applications, parallel algorithms, and cloud computing.
Michelle Osmond is a post-doc researcher and research associate in computing science at the Department of Computing, Imperial College London, UK. She has a Ph.D. and M.Sc. in Computing Science from Imperial College London and a B.Sc. in Physics. Her research interests include cloud computing, data mining and bioinformatics.