Performance prediction model for cloud service selection from smart data
Introduction
Cloud computing is emerging as a new and popular paradigm for allocating computing and storage resources over the Internet. This paradigm offers significant benefits to businesses, government agencies, and the general population: it relieves them of low-level tasks related to setting up IT infrastructure, and it allows organizations to start small and increase resources only when demand for a service is high, leaving more time for innovation and the creation of business value [1]. Cloud computing is a model for offering a pool of services, such as software as a service, infrastructure as a service, and platform as a service, all of which are provided on demand.
Migrating applications and/or data to a cloud service has proven to be a challenging process. Certain obstacles may prevent organizations from taking full advantage of the features that cloud computing offers. These challenges stem predominantly from the fact that existing applications have particular needs and configurations that must be fulfilled by cloud service providers [2], while each provider offers services with different levels of performance, cost, and other characteristics. It is therefore essential, but also challenging, to estimate and predict how many resources are necessary to optimize cloud-computing job performance.
A review of the existing research shows that most studies have focused on vendor selection without taking into account organizations’ systems and performance. Some researchers have studied how to select the best cloud provider by identifying selection criteria, while others have focused on cost and security. In contrast, the current research is the first attempt to study users’ requirements in the scenario where an existing service is about to be deployed on a cloud platform or migrated from one cloud platform to another. Because system performance fluctuates with workload parameters, a thorough understanding of the relationship between workload and performance is necessary, leading to more accurate prediction of potential service performance on a new cloud platform.
In this paper, we address these challenges by examining and predicting the resources required on the cloud. We analyze the user’s requirements by investigating past service information and performance to establish knowledge of the required resources and to predict performance before migration to the cloud. To this end, we propose an automatic performance prediction model based on a naïve Bayes classifier that predicts the performance metrics of cloud nodes with respect to different configuration options for node resources (workload attributes). A naïve Bayes classifier is a simple probabilistic classifier, widely used for classification, that applies Bayes’ theorem with strong (naïve) independence assumptions [3], [4]. We benefit from existing system data by obtaining a dataset from a real environment and conducting experiments that use a naïve Bayes classifier to predict the actual system performance. We also conducted a further experiment to predict performance on the cloud side using CPU benchmark data. In addition, our models change the fit to a kernel density estimation function (KDEF) with different widths to address the within-class variance problem.
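For reference, the decision rule behind a naïve Bayes classifier with kernel density estimates can be written as below. This is the standard textbook formulation rather than a formula taken from this paper, and the notation is assumed here: c is a performance class, x_1, ..., x_n are the workload attributes of the instance to classify, N_c is the number of training instances in class c, x_{ij}^{(c)} is attribute j of the i-th training instance of class c, K is the kernel function, and h is the kernel width.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{n} p(x_j \mid c),
\qquad
p(x_j \mid c) \approx \frac{1}{N_c\, h} \sum_{i=1}^{N_c}
K\!\left( \frac{x_j - x_{ij}^{(c)}}{h} \right)
```

The product over attributes is the independence assumption, and the width h controls how smooth each estimated class-conditional density is.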
The main contributions of this paper are briefly summarized as follows:
Adding requirement analysis to cloud vendor selection, based on the existing paradigm of service.
Adopting a Bayes classifier and KDEF to achieve better accuracy and granularity of performance prediction for a new cloud platform and workload pattern.
Presenting experimental results on a real dataset from the Saudi Ministry of Finance that verify the effectiveness of the proposed solution.
The rest of this paper is organized as follows: Section 2 reviews existing research on prediction methods that use Bayes classifiers in cloud computing. Section 3 introduces the Bayes classifier and describes the proposed model. Section 4 presents the experiments and results for the proposed model. Section 5 concludes the paper and outlines future work.
Related work
Many prediction methods based on Bayes classifiers have been proposed for cloud computing. These methods target tasks such as service level agreement (SLA) violation prediction, load prediction, resource classification, and classification of cloud data.
The authors of [5] proposed a method to predict SLA violations in the cloud with a Bayesian classifier that takes the QoS of the services in use as input, trained on historical SLA datasets. The authors of [6] proposed a method for predicting the
Prediction models based on Bayes classifier
The problem addressed in this paper is to study and predict the performance metrics of cloud nodes based on workload attributes. We solve it by taking a large dataset of labeled workload attributes from cloud nodes and building a naïve Bayes classifier from those labeled data. The classifier is then able to predict the label of an unlabeled example based on the information learned from the labeled examples.
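As an illustration only, a naïve Bayes classifier whose class-conditional densities are estimated by kernel density estimation (the KDEF variant mentioned in the Introduction) can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the class name KdeNaiveBayes, the helper gaussian_kde, the choice of a Gaussian kernel, the single bandwidth shared by all attributes, and the toy (CPU %, memory %) rows with labels "low" and "high" are all assumptions made for the example.

```python
import math
from collections import defaultdict


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of 1-D samples (Gaussian kernel)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


class KdeNaiveBayes:
    """Naive Bayes with one kernel density estimate per (class, attribute)."""

    def __init__(self, bandwidth=1.0):
        # Kernel width; assumed to be shared by all attributes in this sketch.
        self.bandwidth = bandwidth

    def fit(self, rows, labels):
        by_class = defaultdict(list)
        for row, label in zip(rows, labels):
            by_class[label].append(row)
        total = len(rows)
        # Class priors estimated from relative class frequencies.
        self.log_priors = {
            c: math.log(len(rs) / total) for c, rs in by_class.items()
        }
        # Independence assumption: each attribute gets its own 1-D density.
        self.densities = {
            c: [
                gaussian_kde([r[j] for r in rs], self.bandwidth)
                for j in range(len(rs[0]))
            ]
            for c, rs in by_class.items()
        }
        return self

    def predict(self, row):
        def log_posterior(c):
            # A tiny floor avoids log(0) for points far from every sample.
            return self.log_priors[c] + sum(
                math.log(max(d(x), 1e-300))
                for d, x in zip(self.densities[c], row)
            )

        return max(self.log_priors, key=log_posterior)


# Hypothetical workload rows: (CPU utilization %, memory utilization %).
train_rows = [(10, 20), (15, 25), (12, 22), (80, 70), (85, 75), (90, 72)]
train_labels = ["low", "low", "low", "high", "high", "high"]

model = KdeNaiveBayes(bandwidth=5.0).fit(train_rows, train_labels)
prediction_low = model.predict((14, 24))
prediction_high = model.predict((82, 74))
```

Working in log space keeps the product of many small densities from underflowing. A smaller bandwidth follows the training data more closely, while a larger one smooths over within-class variation.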
A model of predicting the performance of cloud node resources has
Dataset collection
We collected a large workload dataset from the KSA Ministry of Finance [36] that contains 28,147 instances from 13 cloud nodes. The data were recorded in continuous time slots from March 1, 2016, to February 20, 2017. Collecting data across these different periods provided more diversity, allowing a fairer test of the classifier and a more accurate evaluation of the work. In the model, nodes 1 and 5 are HP RP 4440, nodes 2–4 and 6 are HP RP 7420, and nodes 7–13 are HP DL 380 G5. The
Conclusion
In this paper, we tackle the challenge of choosing the best cloud services to fulfill an organization’s requirements by automating the selection process based on the actual workload pattern and resource demand acquired from existing service history data. The proposed methodology is based on a naïve Bayes classifier that predicts the performance metrics of cloud nodes with respect to different configuration options for their resources. We examined naïve Bayes classifier along with
Acknowledgment
This paper was fully financially supported by King Saud University through the Vice Deanship of Research Chairs: Chair of Pervasive and Mobile Computing.
References (37)
- et al., A method to dynamic stochastic multicriteria decision making with log-normally distributed random variables, Sci. World J. (2013)
- et al., Google hostload prediction based on Bayesian model with optimized feature combination, J. Parallel Distrib. Comput. (2014)
- et al., Bayesian classifiers based on kernel density estimation: Flexible classifiers, Internat. J. Approx. Reason. (2009)
- et al., A survey on multi-criteria decision making methods for evaluating cloud computing services, J. Internet Technol. (2015)
- Naïve (Bayes) at forty: The independence assumption in information retrieval, Mach. Learn. (1998)
- et al., On the optimality of the simple Bayesian classifier under zero–one loss, Mach. Learn. (1997)
- Bing Tang, Mingdong Tang, Bayesian model-based prediction of service level agreement violations for cloud services, in:...
- et al., Host load prediction in a Google compute cloud with a Bayesian model
- et al., A cloud service resource classification strategy based on feature similarity, J. Netw. (2014)
- Obed Jules, Abdelhakim Hafid, Serhani, Mohamed Adel, Bayesian network, and probabilistic ontology driven trust model...
- An automatic approach to detect software anomalies in cloud computing using pragmatic Bayes approach, Int. J. Mod. Educ. Comput. Sci.
- Classification of cloud data using Bayesian classification, Int. J. Sci. Res.
- Service provider selection of IAAS using Naive Bayes approach, Int. J. Eng. Res. Technol. (IJERT)
- A survey: classification of huge cloud datasets with efficient map-reduce policy, Int. J. Eng. Trends Technol. (IJETT)
- Naïve Bayes classifier based traffic prediction system on cloud infrastructure
- Vehicular cloud service provider selection: A flexible approach
- Towards a visualization framework for service selection in cloud e-marketplaces
Abdullah Mohammed Al-Faifi is a Ph.D. candidate in the College of Computer and Information Sciences at King Saud University, Riyadh, Saudi Arabia. He is also a research staff member of the Research Chair of Pervasive and Mobile Computing at the university. He has completed about six years in the Ph.D. program and has published more than six journal papers. His research interests include cloud computing and success factors.
Biao Song received his Ph.D. degree in Computer Engineering from Kyung Hee University, South Korea, in 2012. He is currently an Assistant Professor in the College of Computer and Information Sciences at King Saud University, Kingdom of Saudi Arabia. His current research interests are cloud computing, remote display technologies, and dynamic VM resource allocation.
Mohammad Mehedi Hassan is currently an Associate Professor in the Information Systems Department of the College of Computer and Information Sciences (CCIS), King Saud University (KSU), Riyadh, Kingdom of Saudi Arabia. He received his Ph.D. degree in Computer Engineering from Kyung Hee University, South Korea, in February 2011. He received the Best Paper Award from the CloudComp conference in China in 2014. He also received the Excellence in Research Award from CCIS, KSU, in 2015 and 2016. He has published over 100 research papers in journals and conferences of international repute. He has served as chair and Technical Program Committee member in numerous international conferences and workshops, such as IEEE HPCC, ACM BodyNets, IEEE ICME, IEEE ScalCom, ACM Multimedia, ICA3PP, IEEE ICC, TPMC, and IDCS. He has also served as guest editor of several international ISI-indexed journals. His research interests are cloud federation, multimedia cloud, sensor-cloud, Internet of things, Big data, mobile cloud, cloud security, IPTV, sensor networks, 5G networks, social networks, publish/subscribe systems, and recommender systems. He is a member of IEEE.
Atif Alamri is an Associate Professor in the Software Engineering Department at the College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia. He is the director of the Research Chair of Pervasive and Mobile Computing (CPMC). He received his Ph.D. degree in Computer Science from the University of Ottawa, ON, Canada, in 2010. His research interests include multimedia-assisted health systems, social networks, Big data, ambient intelligence, and cloud computing.
Abdu Gumaei obtained his bachelor's degree in Computer Science from the Computer Science Department at AL-Mustansiriya University in Baghdad, Iraq, and his master's degree in Computer Science from the Computer Science Department at King Saud University, Riyadh, Saudi Arabia. He is currently a Ph.D. candidate in Computer Science at King Saud University. His main areas of interest are software engineering, image processing, computer vision, and machine learning. He has worked as a lecturer and taught many courses, such as programming languages, at the Computer Science Department of Taiz University. He has conducted several research studies in the field of image processing. He obtained a patent from the United States Patent and Trademark Office (USPTO) in 2013.