Abstract
Resource provisioning in cloud servers depends on the future resource utilization of different jobs. Because resource utilization trends vary dynamically, effective provisioning requires predicting future resource utilization. The problem is further complicated because performance metrics related to one resource may also depend on the utilization of other resources. In this paper, several multivariate frameworks are proposed to improve the prediction of future resource metrics in the cloud. Different techniques for identifying the set of resource metrics relevant to predicting a desired resource metric are analyzed. The proposed multivariate feature selection and prediction frameworks are validated for CPU utilization prediction on the Google cluster trace. A joint analysis of both the prediction performance of each multivariate framework and its stability is used to select the most suitable feature selection framework. The results of this joint analysis indicate that features selected using the Granger causality technique perform best for multivariate resource usage prediction.
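To make the Granger causality idea concrete, the following is a minimal sketch and not the authors' implementation. A candidate metric is said to Granger-cause the target metric if adding its past values to an autoregression on the target's own past significantly reduces the residual error. The sketch scores each candidate with the standard F-statistic for that restricted-versus-unrestricted comparison and keeps the top-ranked candidates. The lag order `p`, the cutoff `top_k`, the function names, and the synthetic `cpu`/`mem`/`noise` series are all assumptions for illustration; the paper's actual lag selection, significance thresholds, and preprocessing are not reproduced here.

```python
import numpy as np


def lagged_design(series_list, p):
    """Stack lags 1..p of each series as columns, aligned with targets t = p..T-1."""
    T = len(series_list[0])
    cols = []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        for lag in range(1, p + 1):
            cols.append(s[p - lag:T - lag])  # row j holds s[(p + j) - lag]
    return np.column_stack(cols)


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)


def granger_f(target, candidate, p=2):
    """F-statistic testing whether lags of `candidate` improve prediction of `target`."""
    target = np.asarray(target, dtype=float)
    y = target[p:]
    n = len(y)
    rss_r = rss(lagged_design([target], p), y)             # target's own lags only
    rss_u = rss(lagged_design([target, candidate], p), y)  # plus candidate's lags
    # p extra coefficients; unrestricted model has 2p + 1 parameters
    return ((rss_r - rss_u) / p) / (rss_u / (n - 2 * p - 1))


def select_features(target, candidates, p=2, top_k=2):
    """Rank candidate metrics by Granger F-statistic and keep the top_k names (hypothetical helper)."""
    scores = {name: granger_f(target, series, p) for name, series in candidates.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Hypothetical synthetic data: cpu is driven by lagged mem, while noise is unrelated.
rng = np.random.default_rng(0)
T = 500
mem = rng.normal(size=T)
noise = rng.normal(size=T)
cpu = np.zeros(T)
for t in range(1, T):
    cpu[t] = 0.5 * cpu[t - 1] + 0.8 * mem[t - 1] + 0.1 * rng.normal()

selected, scores = select_features(cpu, {"mem": mem, "noise": noise}, p=2, top_k=1)
print(selected, {k: round(v, 1) for k, v in scores.items()})
```

Ranking by the raw F-statistic keeps the sketch dependent only on NumPy; a fuller version would convert each F value to a p-value under an F(p, n - 2p - 1) distribution and would first check that the series are stationary, since the Granger test assumes stationarity.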
About this article
Cite this article
Gupta, S., Dileep, A.D. & Gonsalves, T.A. A joint feature selection framework for multivariate resource usage prediction in cloud servers using stability and prediction performance. J Supercomput 74, 6033–6068 (2018). https://doi.org/10.1007/s11227-018-2510-7
DOI: https://doi.org/10.1007/s11227-018-2510-7