On accurate prediction of cloud workloads with adaptive pattern mining

The Journal of Supercomputing

Abstract

Resource provisioning for cloud computing requires adaptive and accurate prediction of cloud workloads. However, existing workload prediction studies struggle with time-varying cloud workloads that exhibit diverse trends and patterns, and inaccurate prediction often results in wasted resources and violations of Service-Level Agreements (SLAs). We propose a bagging-like ensemble framework for cloud workload prediction with Adaptive Pattern Mining (APM). Within this framework, we first design a two-step method with various models to simultaneously capture the “low frequency” and “high frequency” characteristics of highly variable workloads. For a given workload, we further develop an error-based weight aggregation method that integrates the prediction results from multiple pattern-specific models into a final prediction of the future workload. We conduct experiments with various prediction lengths on two real-world workload traces of different types, from Google and Alibaba cloud data centers, to demonstrate the efficacy of APM. Extensive experimental results show that APM achieves an improvement of more than 19.62% over several classic and state-of-the-art workload prediction methods for highly variable real-world cloud workloads.
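
The abstract names two ideas that a small example may make more concrete: separating a workload into low-frequency and high-frequency parts, and combining several model outputs with weights derived from their errors. The sketch below is only an illustration under stated assumptions, not the APM implementation. The trailing moving average used for the split, the inverse-error weighting rule, and all names (`split_low_high`, `error_based_weights`, `aggregate`, `model_a`, `model_b`) are hypothetical choices made here; the abstract does not specify them.

```python
from typing import Dict, List, Tuple


def split_low_high(series: List[float], window: int = 3) -> Tuple[List[float], List[float]]:
    """Split a series into a smooth part and a residual part.

    Assumption: a trailing moving average stands in for the "low frequency"
    component, and the residual stands in for the "high frequency" component.
    """
    low = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        low.append(sum(chunk) / len(chunk))
    high = [x - smooth for x, smooth in zip(series, low)]
    return low, high


def error_based_weights(errors: Dict[str, float], eps: float = 1e-9) -> Dict[str, float]:
    """Map per-model validation errors to weights that sum to 1.

    Assumption: weights are proportional to the inverse of each error,
    so a model with a lower error receives a higher weight.
    """
    inverse = {name: 1.0 / (err + eps) for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def aggregate(predictions: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Combine per-model forecasts into one forecast by weighted average."""
    horizon = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[t] for name, preds in predictions.items())
        for t in range(horizon)
    ]


if __name__ == "__main__":
    # Hypothetical validation errors and two-step-ahead forecasts.
    weights = error_based_weights({"model_a": 0.1, "model_b": 0.3})
    forecast = aggregate(
        {"model_a": [10.0, 20.0], "model_b": [14.0, 24.0]},
        weights,
    )
    print(weights)
    print(forecast)
```

With inverse-error weighting, a model whose validation error is one third of another's receives three times the weight, so the combined forecast sits closer to the more accurate model. Other mappings from error to weight (for example a softmax over negative errors) would fit the same description equally well.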

Data Availability

The cluster data that support the findings of this study are available in “https://github.com/google/cluster-data” and “https://github.com/alibaba/clusterdata”.

Notes

  1. https://github.com/xdbdilab/APM

  2. https://github.com/alibaba/clusterdata.

  3. https://github.com/google/cluster-data.

Acknowledgements

This work is supported by the National Natural Science Foundation of China [Grant No. 62172316]; the Ministry of Education Humanities and Social Science Project of China [Grant No. 17YJA790047]; the Soft Science Research Plans of Shaanxi Province [Grant No. 2020KRZ018]; the Research Project on Major Theoretical and Practical Problems of Philosophy and Social Sciences in Shaanxi Province [Grant No. 20JZ-25]; the Key R&D Program of Shaanxi [Grant No. 2019ZDLGY13-03-02]; the Natural Science Foundation of Shaanxi Province, China [Grant No. 2019JM-368]; and the Key R&D Program of Hebei [Grant No. 20310102D].

Corresponding author

Correspondence to Jin Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Bao, L., Yang, J., Zhang, Z. et al. On accurate prediction of cloud workloads with adaptive pattern mining. J Supercomput 79, 160–187 (2023). https://doi.org/10.1007/s11227-022-04647-5

  • DOI: https://doi.org/10.1007/s11227-022-04647-5
