Performance prediction model for cloud service selection from smart data

https://doi.org/10.1016/j.future.2018.03.015

Highlights

  • Utilization of Smart data for automating the cloud service selection process

  • A naïve Bayes classifier approach is proposed as the prediction model for cloud service selection

  • Various experiments were carried out on a one-year realistic dataset to validate the efficiency of the proposed prediction model

Abstract

Cloud computing is a computing model that has grown significantly worldwide in recent years. Cloud providers offer services to consumers at different levels of performance, cost, and configuration. Many enterprises and organizations are planning to move their services to a cloud platform. The most challenging issue for them is choosing the services that best meet their requirements. In this paper, we tackle this challenge by automating the selection process based on the actual workload pattern from Smart data and the resource demand acquired from existing service history data. An automatic performance prediction model based on naïve Bayes classifiers is proposed to predict the performance metrics of cloud nodes with respect to different options for configuring their resources. We examined the naïve Bayes classifier along with kernel density estimation to solve the zero-variance problem in the feature distribution and to enhance prediction accuracy. We also evaluated our model using a detailed one-year dataset from a realistic environment with thousands of records and hundreds of machines. A simulation was performed in MATLAB, and the results showed that the proposed naïve Bayes model can provide accurate and efficient predictions.

Introduction

Cloud computing is emerging as a new and popular paradigm in computing and storage resource allocation technology over the Internet. This paradigm of computing offers significant benefits to businesses, government agencies, and the general population by relieving them of low-level tasks related to setting up IT infrastructure as well as allowing organizations to begin small and increase resources only when there is a high demand for service, thus enabling more time for innovation and the creation of business value [1]. Cloud computing is a model for offering a pool of services like software as a service, infrastructure as a service, and platform as a service, all of which are provided on demand.

Migrating applications and/or data to a cloud service has proven to be a challenging process. Certain obstacles may prevent organizations from using the full set of features that cloud computing offers. These challenges stem predominantly from the fact that existing applications have particular needs and configurations that must be fulfilled by cloud service providers [2], as each provider offers services with different levels of performance, cost, and other characteristics. Therefore, it is essential, but also challenging, to estimate and predict how many resources are necessary to optimize cloud-computing job performance.

A review of the existing research shows that most studies have focused on vendor selection without taking into account organizations’ systems and performance. Some researchers have studied how to select the best cloud provider by identifying selection criteria, while others have focused on cost and security. In contrast, the current research is the first attempt to study users’ requirements in a scenario where an existing service is about to be deployed on a cloud platform or migrated from one cloud platform to another. Because system performance fluctuates with workload parameters, a thorough understanding of the relationship between workload and performance is necessary, and it leads to more accurate prediction of potential service performance on a new cloud platform.

In this paper, we address these challenges by examining and predicting the resources required on the cloud. We analyze users’ requirements by investigating past service information and performance to establish knowledge of the required resources and to predict performance before migration to the cloud. We propose an automatic performance prediction model based on a naïve Bayes classifier to predict the performance metrics of cloud nodes with respect to different options for configuring node resources (workload attributes). Naïve Bayes classifiers are simple probabilistic classifiers that are widely used for classification; they apply Bayes’ theorem with strong (naïve) independence assumptions [[3], [4]]. We benefit from existing system data by obtaining a dataset from a real environment and conducting experiments with a naïve Bayes classifier to predict the actual system performance. We also conducted another experiment to predict performance on the cloud side using CPU benchmark data. In addition, in our models we change the fit to a kernel density estimation function (KDEF) with different widths to solve the within-class variance problem.
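
For orientation, the standard naïve Bayes decision rule and a kernel density estimate of the per-feature likelihood can be written as follows. This is the textbook form, not a formula taken from the paper. The symbols (class c, features x_j, width h, kernel K) are notation assumed here, and the Gaussian kernel is an assumption, since this excerpt does not state which kernel or widths the authors used.

```latex
% Naive Bayes decision rule for a feature vector x = (x_1, ..., x_d)
\hat{y} = \arg\max_{c} \; P(c) \prod_{j=1}^{d} p(x_j \mid c)

% Gaussian fit: undefined when the within-class variance \sigma_{jc}^2 is zero
p(x_j \mid c) = \frac{1}{\sqrt{2\pi\sigma_{jc}^{2}}}
                \exp\!\left(-\frac{(x_j - \mu_{jc})^{2}}{2\sigma_{jc}^{2}}\right)

% Kernel density estimate over the n_c training points of class c, width h > 0
\hat{p}(x_j \mid c) = \frac{1}{n_c\, h} \sum_{i \,:\, y_i = c}
                      K\!\left(\frac{x_j - x_{ij}}{h}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}
```

Because the width h is chosen by the modeler rather than estimated from the class variance, the kernel estimate stays well defined even when a feature takes a single value within a class.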

The main contributions of this paper are briefly summarized as follows:

  • Adding requirement analysis to cloud vendor selection based on the existing paradigm of service.

  • Adopting a Bayes classifier and KDEF to achieve better accuracy and granularity of performance prediction on a new cloud platform and workload pattern.

  • Presenting experimental results with a real dataset from the Saudi Ministry of Finance and verifying the effectiveness of the proposed solution.

The rest of this paper is organized as follows: Section 2 describes the existing research on prediction methods using Bayes classifiers in cloud computing. Section 3 introduces the Bayes classifier and describes the proposed model. Section 4 presents the experiments and results for the proposed model. Section 5 concludes the paper and discusses future work.

Section snippets

Related work

Many prediction methods in cloud computing use Bayes classifiers. Such methods address cloud tasks such as service level agreement (SLA) prediction, load prediction, resource classification, and classification of cloud data.

In [5], the authors proposed a method to predict SLA violations in the cloud using a Bayesian classifier, taking the QoS of the services in use as input to the classifier. They used historical SLA datasets. In [6], the authors proposed a method for predicting the

Prediction models based on Bayes classifier

The problem addressed in this paper was to study and predict the performance metrics of cloud nodes based on workload attributes. The problem would be solved by taking a large dataset of labeled workload attributes from cloud nodes and building a naïve Bayes classifier from those labeled data. The naïve Bayes classifier would then be able to predict the label of an unlabeled example based on the information learned from the labeled examples.
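
The train-then-predict workflow described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the paper reports a MATLAB simulation, whereas this sketch uses Python with NumPy. The class name `KDENaiveBayes`, the Gaussian kernel, the width value, the feature names, and the toy records are all assumptions made for the example. Note that the memory feature is constant within the "ok" class, which is the zero-variance case that a plain Gaussian fit cannot handle.

```python
import numpy as np


class KDENaiveBayes:
    """Naive Bayes with a Gaussian kernel density estimate per feature and class.

    Hypothetical helper written for illustration only.
    """

    def __init__(self, width=1.0):
        if width <= 0:
            raise ValueError("width must be positive")
        self.width = width

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Keep the training points of each class; the KDE is evaluated on them.
        self.samples_ = [X[y == c] for c in self.classes_]
        self.log_priors_ = np.log([len(s) / len(X) for s in self.samples_])
        return self

    def _log_likelihood(self, X, samples):
        # diffs has shape (n_queries, n_train_in_class, n_features).
        diffs = (X[:, None, :] - samples[None, :, :]) / self.width
        log_kernel = -0.5 * diffs**2 - np.log(self.width * np.sqrt(2 * np.pi))
        # Average the kernels over training points (log-sum-exp for stability).
        m = log_kernel.max(axis=1, keepdims=True)
        log_density = m[:, 0, :] + np.log(np.exp(log_kernel - m).mean(axis=1))
        # Naive independence assumption: sum the per-feature log densities.
        return log_density.sum(axis=1)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.column_stack([
            log_prior + self._log_likelihood(X, samples)
            for log_prior, samples in zip(self.log_priors_, self.samples_)
        ])
        return self.classes_[scores.argmax(axis=1)]


# Toy labeled workload records (made up): [cpu_load_percent, memory_gb].
X_train = [[20.0, 8.0], [25.0, 8.0], [30.0, 8.0],
           [80.0, 16.0], [85.0, 32.0], [90.0, 16.0]]
y_train = ["ok", "ok", "ok", "overloaded", "overloaded", "overloaded"]

model = KDENaiveBayes(width=5.0).fit(X_train, y_train)
predictions = model.predict([[22.0, 8.0], [88.0, 16.0]])
print(predictions.tolist())
```

The width plays the role of the kernel bandwidth: smaller values follow the training points more closely, while larger values smooth the estimated density. In practice it would be tuned on held-out data.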

A model of predicting the performance of cloud node resources has

Dataset collection

We collected a large workload dataset from the KSA Ministry of Finance [36] that contains 28,147 instances from 13 cloud nodes. The data were recorded in continuous time slots from March 1, 2016, to February 20, 2017. Collecting data over this extended period gave the dataset more diversity, which allows a fairer test of the classifier and a more accurate evaluation of the work. In the model, nodes 1 and 5 are HP RP 4440, nodes 2–4 and 6 are HP RP 7420, and nodes 7–13 are HP DL 380 G5. The

Conclusion

In this paper we try to tackle the challenge of choosing the best cloud services to fulfill an organization’s requirements by automating the selection process based on actual workload pattern and resource demand acquired from existing service history data. The proposed methodology is modeled based on a naïve Bayes classifier to predict the performance metrics of cloud nodes with respect to different options for configuration of their resources. We examined naïve Bayes classifier along with

Acknowledgment

This paper was fully financially supported by King Saud University through the Vice Deanship of Research Chairs: Chair of Pervasive and Mobile Computing.

Abdullah Mohammed Al-Faifi is a Ph.D. candidate in the College of Computer and Information Sciences at King Saud University in Riyadh, Saudi Arabia. He is also a research staff member of the Research Chair of Pervasive and Mobile Computing at the university. He has spent about six years in the Ph.D. program and has published more than six journal papers. His research interests include cloud computing and success factors.

References (37)

  • Nethaji V. et al.

    An automatic approach to detect software anomalies in cloud computing using pragmatic Bayes approach

    Int. J. Mod. Educ. Comput. Sci.

    (2014)
  • Krunal Patel et al.

    Classification of cloud data using Bayesian classification

    Int. J. Sci. Res.

    (2013)
  • Vala Jay et al.

    Service provider selection of IAAS using Naive Bayes approach

    Int. J. Eng. Res. Technol. (IJERT)

    (2012)
  • Kamdar Miss Apexa B. et al.

    A survey: classification of huge cloud datasets with efficient map-reduce policy

    Int. J. Eng. Trends Technol. (IJETT)

    (2014)
  • Aung Swe Swe et al.

    Naïve Bayes classifier based traffic prediction system on cloud infrastructure

  • Reyhane Askari Hemmat, Abdelhakim Hafid, SLA Violation Prediction In Cloud Computing: A Machine Learning Perspective,...
  • Tamani Nouredine et al.

    Vehicular cloud service provider selection: A flexible approach

  • Ezenwoke Azubuike et al.

    Towards a visualization framework for service selection in cloud e-marketplaces

    Biao Song received his Ph.D. degree in Computer Engineering from Kyung Hee University, South Korea, in 2012. He is currently an Assistant Professor in the College of Computer and Information Science at King Saud University, Kingdom of Saudi Arabia. His current research interests are cloud computing, remote display technologies, and dynamic VM resource allocation.

    Mohammad Mehedi Hassan is currently an Associate Professor in the Information Systems Department of the College of Computer and Information Sciences (CCIS), King Saud University (KSU), Riyadh, Kingdom of Saudi Arabia. He received his Ph.D. degree in Computer Engineering from Kyung Hee University, South Korea, in February 2011. He received the Best Paper Award from the CloudComp conference in China in 2014. He also received the Excellence in Research Award from CCIS, KSU in 2015 and 2016. He has published more than 100 research papers in journals and conferences of international repute. He has served as chair and Technical Program Committee member in numerous international conferences and workshops, such as IEEE HPCC, ACM BodyNets, IEEE ICME, IEEE ScalCom, ACM Multimedia, ICA3PP, IEEE ICC, TPMC, and IDCS. He has also served as guest editor of several international ISI-indexed journals. His research interests are cloud federation, multimedia cloud, sensor-cloud, Internet of things, Big data, mobile cloud, cloud security, IPTV, sensor networks, 5G networks, social networks, publish/subscribe systems, and recommender systems. He is a member of IEEE.

    Atif Alamri is an Associate Professor in the Software Engineering Department at the College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia. He is the director of the Research Chair of Pervasive and Mobile Computing (CPMC). He received his Ph.D. degree in Computer Science from the University of Ottawa, ON, Canada, in 2010. His research interests include multimedia-assisted health systems, social networks, Big data, ambient intelligence, and cloud computing.

    Abdu Gumaei obtained his bachelor's degree in Computer Science from the Computer Science Department at AL-Mustansiriya University in Baghdad, Iraq, and his master's degree in Computer Science from the Computer Science Department at King Saud University, Riyadh, Saudi Arabia. He is currently a Ph.D. candidate in Computer Science at King Saud University. His main areas of interest are software engineering, image processing, computer vision, and machine learning. He has worked as a lecturer and taught many courses, such as programming languages, in the Computer Science Department at Taiz University. He has conducted several research studies in the field of image processing. He obtained a patent from the United States Patent and Trademark Office (USPTO) in 2013.
