Event-driven approach for predictive and proactive management of SLA violations in the Cloud of Things

doi:10.1016/j.future.2018.02.025

Future Generation Computer Systems

Volume 84, July 2018, Pages 78-97

https://doi.org/10.1016/j.future.2018.02.025

Highlights

• Utilizes an event-driven approach to model the future state of SLOs and ascertain its impact on the compliance of the SLAs to proactively manage their violations.
• Utilizes event calculus to model each event into its fluents to determine the future state of an SLO and the compliance of the SLA with the defined agreement.
• Utilizes Bayesian Networks to model the future state of uncertain events.
• Validates the proposed methodology on a real-world cloud dataset to demonstrate the applicability of our proposed model.

Abstract

In a dynamic environment such as the cloud-of-things, one of the most critical factors for successful service delivery is the quality of service (QoS) under defined constraints. Even though guarantees in the form of service level agreements (SLAs) are provided to users, many services exhibit dynamic QoS variations. These QoS variations, as well as changes in the behavior and state of the service, are caused by internal events (such as varying loads) and external events (such as location and weather), and they result in frequent SLA violations. Most existing violation prediction approaches use historic data to predict future QoS values; they do not consider dynamic changes or the events that cause these changes in QoS attributes. In this paper, we propose an event-driven proactive approach for predicting SLA violations by combining logic-based reasoning and probabilistic inferencing. The results show that our proposed approach is efficient and proactively identifies SLA violations under uncertain QoS observations.

Introduction

Cloud computing and the Internet of Things (IoT) are two emerging paradigms that have enabled businesses to access on-demand computing resources and be connected to various computing devices spread across different locations, respectively. The advantages of merging cloud computing and IoT have given rise to the notion of the cloud-of-things (CoT) [1]. One of the main challenges for successful service delivery in the CoT environment is the quality of service (QoS) under defined constraints, such as response time range, minimum service availability or a certain throughput. These service constraints are usually stated by the provider in the service offer and are mutually agreed upon by the provider and consumer in the service level agreement (SLA). However, due to the characteristics of the CoT and the services themselves, many services exhibit dynamic QoS variations that result in frequent changes in their behaviors, leading to SLA violations [2], [3]. For example, some QoS attributes such as response time, service availability, and operational cost can change dynamically due to various internal (such as varying loads) and external (such as location and weather) events [4]. This leads to problems at two stages as follows:

Stage 1: Most of the QoS representation approaches that represent a service’s QoS specifics are limited to showing static QoS attributes only [5], [6]. In other words, they are unable to dynamically represent to users a change in the service quality parameters on which the SLAs are formed. Hence, service users do not have the correct platform to make service selection decisions. To further understand the importance of dynamic QoS variations, consider that there is more than one service with similar functionality. The critical factor for choosing the best service then comes down to the QoS attributes, which are not well defined. So the services that are chosen using only these static QoS constraints might not be the best ones [7].

Stage 2: To model the dynamic change in a service’s QoS parameters after it is provisioned, apart from monitoring, existing approaches use the service’s historic QoS values and predict future ones [8], [9]. However, in a dynamic service amalgamated environment such as the CoT, a resulting service may be a combination of sub-services. The existing approaches do not proactively consider the dynamic changes and/or the events that cause such changes in the resulting service’s QoS attributes to manage service violations.

To emphasize the importance of each stage and their drawbacks, Fig. 1 shows the service management life cycle that has pre-SLA formation (referred to as pre-interaction in the remainder of the paper) and post-SLA formation (post-interaction) time phases. The pre-interaction time phase includes the tasks of service representation, discovery, negotiation and selection stages while the post-interaction time phase includes service provisioning, monitoring and violation detection. Both time phases have general monitoring and adaptation phases to ensure better service management and QoS delivery to the customers. In the context of service management and as shown in Table 1, it is important to address the drawbacks of stage 1 for the accurate representation of a service in the phases of service representation, service discovery, service selection and negotiation; whereas it is important to address the drawbacks of stage 2 for the better management of a service during the phases of service provisioning and violation detection.

Our focus in this paper is more towards stage 2, as the limitations in the current literature demand the development of an event-driven approach that can identify events which will impact a service’s QoS and model their effect on QoS attributes to manage SLA violations. Event-driven computing [10] is about defining an automatic reaction to events. An event can be defined as a happening within a system or domain. A raw (simple) event is produced directly from the provider whereas derived events are produced by some processing agent that applies logic on a set of input (simple) events and produces a set of output derived (complex) events [11]. Event-driven computing can be used in two ways to respond to events: first, responding to events that have occurred, i.e. a reactive response and second, responding to events that may occur, i.e. a proactive response.
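The distinction between raw and derived events can be illustrated with a minimal Python sketch. It is not taken from the paper: the event names ("load", "overload"), the threshold and the window size are hypothetical, and the processing agent is reduced to a single rule that emits a derived event once a run of consecutive raw readings exceeds the threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    name: str     # hypothetical event type, e.g. "load"
    time: int     # time point at which the event happened
    value: float  # observed measurement carried by the event


def derive_overload_events(raw: List[Event], threshold: float, window: int) -> List[Event]:
    """Emit one derived 'overload' event when `window` consecutive raw
    'load' events exceed `threshold` (an illustrative rule only)."""
    derived = []
    streak = 0
    for ev in raw:
        if ev.name == "load" and ev.value > threshold:
            streak += 1
        else:
            streak = 0
        if streak == window:
            # The derived event is stamped with the time of the raw event
            # that completed the pattern.
            derived.append(Event("overload", ev.time, ev.value))
    return derived


# Raw (simple) events as they might arrive from a provider.
raw_events = [Event("load", t, v) for t, v in enumerate([0.4, 0.9, 0.95, 0.97, 0.5])]
derived_events = derive_overload_events(raw_events, threshold=0.8, window=3)
```

A reactive system would act on the derived event once it has been emitted, whereas a proactive one would try to anticipate it, for instance after the second high reading.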

In this paper, we propose a proactive event-driven approach for predicting and managing SLA violations. Proactive event-driven computing for managing SLA violations evolves from reactive event-driven computing and involves dealing with uncertainty at two levels. The first level of uncertainty is determining the events that will lead to a variation in the defined QoS, and the second level is determining or predicting what will happen as a result of these events occurring. In other words, proactive event-driven computing predicts the occurrence of events and the uncertainty of the situation (or effect) after the occurrence of an event; by avoiding SLA violations, it reduces negative consequences and exploits future prospects that would otherwise be missed. Based on this, our proposed approach first triggers the defined rules to ascertain the event that causes QoS variation and then predicts a possible situation, such as an SLA violation due to the occurrence of the event, using logical reasoning and probabilistic inferencing. The rest of the paper is organized as follows. In the next section, we discuss the related work in the area of SLA violation. The proposed framework is presented in Section 3. In Section 4, we formalize the WS-Agreement [12] with respect to the proposed framework. The syntax and semantics of the proposed system are presented in Section 5. In Section 6, we describe the reasoning and decision making part of our proposed framework, followed by the experimental validation of the proposed system in Section 7. Finally, Section 8 concludes the paper.

Section snippets

Related work

The existing approaches to handle QoS variations in order to predict SLA violations can be divided into two categories, namely QoS monitoring and QoS prediction. QoS monitoring is widely used for violation detection in SLAs and recommends the actions which are necessary to remove the violation [13], [14]. Farrell et al. [15] modeled the normative state of contracts using the event calculus for the automated monitoring of SLAs. Raimondi et al. [16] used timed automata to model and verify the

Proposed framework for proactive event-driven based SLA violation management

The three components of the proactive event-driven based SLA violation management framework and the information flow between them are shown in Fig. 2. Two prerequisites, namely the WS-Agreement SLA Document and the QoS Repository, are needed before processing in the modules can begin. The WS-Agreement SLA Document captures the SLA between the service user and the service provider, which is being managed to avoid SLA violations. The QoS Repository contains the QoS information related to the SLOs being

Formalizing the guarantee terms of the SLA from WS-agreement SLA document

In this section, we explain the preliminaries and syntax of how an SLA that is represented in the form of a WS-Agreement SLA document [12] is translated to ascertain the guarantee terms (GTs) in it and their constraints. An SLA specified in a WS-Agreement represents the GTs, which may be in a hierarchical or nested structure. For example, as shown in Fig. 4, GT₂ and GT₃ are nested under GT₁, as are GT₅ and GT₆ under GT₄, but GT₇ is not nested or dependent on other GTs.
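The nesting described for Fig. 4 can be represented as a small tree. The following Python sketch is an illustration under assumptions rather than the paper's formalization: the class name GuaranteeTerm and the leaves helper are hypothetical, and constraints are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GuaranteeTerm:
    name: str
    # Nested guarantee terms; an empty list marks an individual (leaf) GT.
    children: List["GuaranteeTerm"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the names of the individual GTs below this term."""
        if not self.children:
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.leaves())
        return names


# Hierarchy from the example: GT2 and GT3 under GT1, GT5 and GT6 under GT4,
# and GT7 standing alone.
sla = [
    GuaranteeTerm("GT1", [GuaranteeTerm("GT2"), GuaranteeTerm("GT3")]),
    GuaranteeTerm("GT4", [GuaranteeTerm("GT5"), GuaranteeTerm("GT6")]),
    GuaranteeTerm("GT7"),
]
leaf_terms = [name for gt in sla for name in gt.leaves()]
```

Keeping the hierarchy explicit makes it possible to evaluate the individual GTs first and then propagate their states upward to the composite terms.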

Event calculus approach to model the future state of GTs

A basic event calculus defines that the events happen at a time point and fluents hold for time intervals. A fluent is any string of characters representing some property or variable that changes its value due to the occurrence of an event. In other words, fluents are initiated or terminated by the occurrence of events. In the context of our work, the event calculus approach on the occurrence of event $e$ ascertains the GTs that will be impacted and the fluents that will hold for them over a time
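The idea that events happen at time points while fluents hold over intervals can be sketched in Python as follows. This follows one common formulation of the simple event calculus (a fluent holds at t if some earlier event initiated it and no event between the two time points terminated it); the event and fluent names are hypothetical and are not the paper's axioms or its SWI-Prolog encoding.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical domain description: which events start or end which fluents.
INITIATES: Dict[str, Set[str]] = {"load_spike": {"response_time_degraded"}}
TERMINATES: Dict[str, Set[str]] = {"scale_out": {"response_time_degraded"}}

# Narrative: (event, time point) pairs recording when events happen.
happens: List[Tuple[str, int]] = [("load_spike", 2), ("scale_out", 6)]


def clipped(t1: int, fluent: str, t2: int) -> bool:
    """True if some event strictly between t1 and t2 terminates the fluent."""
    return any(
        t1 < t < t2 and fluent in TERMINATES.get(event, set())
        for event, t in happens
    )


def holds_at(fluent: str, t: int) -> bool:
    """True if an event before t initiated the fluent and it was not clipped."""
    return any(
        t1 < t
        and fluent in INITIATES.get(event, set())
        and not clipped(t1, fluent, t)
        for event, t1 in happens
    )
```

With this narrative, the fluent does not hold at time 1 (nothing has initiated it yet), holds at time 4, and no longer holds at time 7 because the terminating event at time 6 clips it.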

Proactive determination of an SLA’s state at a future period of time

The syntax in the previous sections represents how a GT’s state transitions over a period of time due to the occurrence of event $e$ and the holding of its corresponding fluents. To ascertain if such changes in the states of GTs will result in an SLA violation, the final state of the SLA needs to be determined as either satisfied, violated or uncertain. Two types of reasoning, namely logical and probabilistic, are used for this purpose as explained in this section. However, before such reasoning
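A minimal Python sketch of how the two kinds of reasoning might fit together is given below. Both parts are assumptions made for illustration: the combination rule (any violated GT makes the SLA violated, all satisfied makes it satisfied, anything else is uncertain) is a plausible logical rule rather than the paper's definition, and the probabilistic step is a single application of Bayes' rule with made-up numbers rather than a full Bayesian Network.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    SATISFIED = "satisfied"
    VIOLATED = "violated"
    UNCERTAIN = "uncertain"


def combine(states: Iterable[State]) -> State:
    """Assumed logical rule for deriving an SLA state from its GT states."""
    states = list(states)
    if State.VIOLATED in states:
        return State.VIOLATED
    if all(s is State.SATISFIED for s in states):
        return State.SATISFIED
    return State.UNCERTAIN


def posterior_violation(prior: float,
                        p_event_given_violation: float,
                        p_event_given_ok: float) -> float:
    """Bayes' rule: probability of a violation given that an event was observed."""
    evidence = (p_event_given_violation * prior
                + p_event_given_ok * (1.0 - prior))
    return p_event_given_violation * prior / evidence


# Hypothetical numbers: 20% prior violation rate; the event is seen in 90% of
# violating periods and in 30% of compliant ones.
sla_state = combine([State.SATISFIED, State.UNCERTAIN])
p_violation = posterior_violation(0.2, 0.9, 0.3)
```

In this sketch the logical step leaves the SLA in the uncertain state, and the probabilistic step then quantifies how likely a violation is, here 0.18 / 0.42, roughly 0.43.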

Implementation and experimental setup

This section describes the implementation and working of the proposed system for the event-driven proactive violation detection of SLAs. We conducted experiments on a computer equipped with an Intel i7-4790 3.60 GHz CPU and 16 GB of RAM. The development of the prototype system was carried out in SWI-Prolog version 7.2.3. The system allows the user to configure an SLA by defining its individual and composite guarantee terms. An implementation of the CGTs with different cases is shown in Fig. 10.

Conclusion and future work

In this paper, we focused on the problem of dynamic QoS variation in the post-interaction phase of the service life cycle. From this perspective, we proposed a novel proactive event-driven approach for predicting SLA violation under uncertain QoS. We modeled a WS-Agreement SLA in the event calculus and extended it to represent uncertain situations. We also identified events and defined rules for the events that cause QoS variation and then, using logical reasoning and probabilistic inferencing,

Falak Nawaz is currently a Ph.D. candidate at the School of Business, University of New South Wales (UNSW), Canberra, Australia. His research focuses on Service Level Agreement (SLA), Service Management, and SLA Violation Prediction. His research interests also include Service Computing, Cloud Computing, and Internet of Things.

References (35)

Fanjiang Y.-Y. et al., Search based approach to forecasting QoS attributes of web services using genetic programming, Inf. Softw. Technol. (2016)
Li W. et al., Resource virtualization and service selection in cloud logistics, J. Netw. Comput. Appl. (2013)
Xu Y. et al., Context-aware QoS prediction for web service recommendation and selection, Expert Syst. Appl. (2016)
J. Soldatos, M. Serrano, M. Hauswirth, Convergence of utility computing with the Internet-of-things, in: Proc. - 6th...
S. Chun, S. Seo, B. Oh, K.H. Lee, Semantic description, discovery and integration for the Internet of Things, in: Proc....
B. Cavallo, M. Di Penta, G. Canfora, An Empirical Comparison of Methods to support QoS-aware Service Selection, 2010,...
F. Nawaz, K. Qadir, H.F. Ahmad, SEMREG-Pro: A semantic based registry for proactive web service discovery using...
Rehman Z.U. et al., Parallel cloud service selection and ranking based on QoS history, Int. J. Parallel Program. (2014)
Hussain O.K. et al., A user-based early warning service management framework in cloud computing, Comput. J. (2015)
Leitner P. et al., Data-driven and automated prediction of service level agreement violations in service compositions, Distrib. Parallel Databases (2013)

Y. Engel, O. Etzion, Towards proactive event-driven computing, in: Proceedings of the 5th ACM International Conference...

Y. Engel, O. Etzion, Z. Feldman, A basic model for proactive event-driven computing, in: Proc. 6th ACM Int. Conf....

T.N.A. Andrieux, K. Czajkowski, A. Dan, K. Keahey, H. Ludwig, M.X.J. Pruyne, J. Rofrano, S. Tuecke, Web services...

M.H. Hasan, J. Jaafar, M.F. Hassan, A review on monitoring vague Quality of Service (QoS) compliance for web services,...

A. Michlmayr, F. Rosenberg, P. Leitner, S. Dustdar, Comprehensive QoS monitoring of web services and event-based SLA...

A.D.H. Farrell, M.J. Sergot, M. Salle, C. Bartolini, D. Trastour, A. Christodoulou, Using the Event Calculus for the...

F. Raimondi, J. Skene, W. Emmerich, Efficient online monitoring of web-service SLAs, in: Proc. 16th ACM SIGSOFT Int....


Naeem Khalid Janjua is a Lecturer at the School of Science, Edith Cowan University, Perth. He is an Associate Editor for International Journal of Computer System Science and Engineering (IJCSSE) and International Journal of Intelligent systems (IJEIS). He has published an authored book, a book chapter, and various articles in international journals and refereed conference proceedings. His areas of active research are defeasible reasoning, argumentation, ontologies, data modeling, cloud computing, machine learning and data mining. He works actively in the domain of business intelligence and Web-based intelligent decision support systems.

Omar Khadeer Hussain is a senior lecturer at the University of New South Wales, Canberra. His research interests are in business intelligence, cloud computing and logistics informatics. In these areas, his research work focusses on utilizing decision making techniques for facilitating smart achievement of business outcomes. His research work has been published in various top international journals such as Information Systems, The Computer Journal, Knowledge Based Systems, Future Generation of Computer Systems etc. He has won awards and funding from competitive bodies such as the Australian Research Council for his research.

Farookh Khadeer Hussain is a Senior Lecturer in School of Software, University of Technology Sydney. He is an Associate Member of the Advanced Analytics Institute and a Core Member of the Centre for Artificial Intelligence. His key research interests are in trust-based computing, cloud of things, blockchains and machine learning. He has published widely in these areas in top journals such as FGCS, The Computer Journal, JCSS, IEEE Transactions on Industrial Informatics, IEEE Transactions on Industrial Electronics etc.

Elizabeth Chang is Professor and Canberra Fellow at the UNSW at the Australian Defence Force Academy (ADFA). She has 30 years of work experience in both Academia and Industry. She has been a full Professor in IT, Software Engineering and Logistics Informatics for 14 years. She held senior positions in commercial corporations for 10 years, typically working on commercial grade large software development. Her key research strength is in large complex software development methodologies, requirement engineering, structured and unstructured database design and implementation, trust, security, risk and privacy. In the 2012 edition of MIS Quarterly vol. 36, issue 4, Special Issues on Business Research, Professor Chang was listed fifth in the world for researchers in Business Intelligence.

Morteza Saberi is a Research Fellow at UNSW Canberra and has an outstanding research record and significant capabilities in the areas of Business Intelligence, Data Mining and applied machine learning. He has published more than 140 papers in reputable academic journals and conference proceedings. His Google Scholar citations and h-index are 1400 and 18 respectively. He was a Lecturer at the Department of Industrial Engineering at University of Tafresh. He is also the recipient of the 2006–2012 Best Researcher of Young researcher Club, Islamic Azad University (Tafresh Branch). He is also the recipient of the National Eminent Researcher Award among Young researcher Club, Islamic Azad University members.
