Abstract
Resource provisioning for cloud computing requires adaptive and accurate prediction of cloud workloads. However, existing workload prediction methods struggle with time-varying cloud workloads that exhibit diverse trends and patterns, and inaccurate prediction often results in wasted resources and violations of Service-Level Agreements (SLAs). We propose Adaptive Pattern Mining (APM), a bagging-like ensemble framework for cloud workload prediction. Within this framework, we first design a two-step method that uses various models to simultaneously capture the “low frequency” and “high frequency” characteristics of highly variable workloads. For a given workload, we further develop an error-based weight aggregation method that integrates the prediction results from multiple pattern-specific models into a final prediction of the future workload. We evaluate APM at various prediction lengths on two real-world workload traces of different types, from Google and Alibaba cloud data centers. Extensive experimental results show that APM achieves an improvement of more than 19.62% over several classic and state-of-the-art workload prediction methods for highly variable real-world cloud workloads.
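To make the idea of error-based weight aggregation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each pattern-specific model has a validation error (for example, mean squared error) and that weights are taken proportional to the inverse of that error and normalized to sum to one; the abstract does not state the exact weighting rule, so this rule and the names `inverse_error_weights` and `aggregate` are illustrative assumptions.

```python
from typing import List, Sequence


def inverse_error_weights(errors: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Return one weight per model, proportional to 1/error and summing to 1.

    Assumption: a lower validation error should give a larger weight.
    eps guards against division by zero for a model with zero error.
    """
    inverse = [1.0 / (e + eps) for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def aggregate(predictions: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Combine per-model forecasts into one forecast by a weighted sum.

    predictions[m][t] is the forecast of model m for future step t.
    """
    horizon = len(predictions[0])
    return [
        sum(w * model_pred[t] for w, model_pred in zip(weights, predictions))
        for t in range(horizon)
    ]


# Hypothetical example: two models with validation errors 1.0 and 3.0,
# each forecasting a two-step horizon.
weights = inverse_error_weights([1.0, 3.0])
combined = aggregate([[1.0, 2.0], [5.0, 6.0]], weights)
```

In this sketch the model with the smaller error dominates the combined forecast, which is the general behavior an error-based weighting scheme is meant to provide; other schemes (for example, softmax over negative errors) would fit the same interface.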














Data Availability
The cluster data that support the findings of this study are available at https://github.com/google/cluster-data and https://github.com/alibaba/clusterdata.
Notes
https://github.com/xdbdilab/APM
https://github.com/alibaba/clusterdata.
https://github.com/google/cluster-data.
Acknowledgements
This work is supported by the National Natural Science Foundation of China [Grant No. 62172316]; the Ministry of Education Humanities and Social Science Project of China [Grant No. 17YJA790047]; the Soft Science Research Plans of Shaanxi Province [Grant No. 2020KRZ018]; the Research Project on Major Theoretical and Practical Problems of Philosophy and Social Sciences in Shaanxi Province [Grant No. 20JZ-25]; the Key R&D Program of Shaanxi [Grant No. 2019ZDLGY13-03-02]; the Natural Science Foundation of Shaanxi Province, China [Grant No. 2019JM-368]; and the Key R&D Program of Hebei [Grant No. 20310102D].
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Bao, L., Yang, J., Zhang, Z. et al. On accurate prediction of cloud workloads with adaptive pattern mining. J Supercomput 79, 160–187 (2023). https://doi.org/10.1007/s11227-022-04647-5
DOI: https://doi.org/10.1007/s11227-022-04647-5