
Long range dependence in cloud servers: a statistical analysis based on Google workload trace

Computing

Abstract

Analysis and characterization of cloud workloads provide crucial information for designing optimal resource management policies. In this work, we analyse the long range dependence of cloud resource workloads. Long range dependence is a phenomenon widely studied in Ethernet and Internet traffic, but few works analyse it in cloud workloads. We verify the presence of long range dependence in cloud workloads using autocorrelation analysis and the rescaled range analysis method. Beyond experimental evidence, a study of long range dependence is incomplete without a sound theoretical justification of its origins in cloud workloads. To this end, we analytically study the aggregate workload in the datacenter using metrics such as the arrival and service distributions of jobs and their resource usage. To ground the explanation in real data, we analyse workloads from the standard Google cluster trace dataset. The analysed metrics display heavy-tailed behaviour, and using a mathematical formulation we prove that the aggregate workload exhibits long range dependence.
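The two empirical checks named in the abstract, autocorrelation analysis and rescaled range (R/S) analysis, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the doubling block-size schedule, and the synthetic input series are all hypothetical, and a real study would read a usage series from the trace instead.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def rescaled_range(block):
    """R/S of one block: range of cumulative mean-adjusted sums over the std dev."""
    n = len(block)
    mean = sum(block) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for x in block:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in block) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_rs(series, min_block=8):
    """Estimate the Hurst exponent as the slope of log(R/S) versus log(block size)."""
    n = len(series)
    log_sizes, log_rs = [], []
    size = min_block  # assumption: block sizes double from min_block up to n // 2
    while size <= n // 2:
        values = []
        for start in range(0, n - size + 1, size):
            rs = rescaled_range(series[start:start + size])
            if rs is not None:
                values.append(rs)
        if values:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(values) / len(values)))
        size *= 2
    # Ordinary least-squares slope of log_rs against log_sizes.
    k = len(log_sizes)
    mx, my = sum(log_sizes) / k, sum(log_rs) / k
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_sizes, log_rs))
    sxx = sum((x - mx) ** 2 for x in log_sizes)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic, memoryless series
    print("H (white noise):", hurst_rs(noise))
    print("ACF at lag 1:", autocorrelation(noise, 1))
```

A Hurst exponent estimate noticeably above 0.5, together with an autocorrelation function that decays slowly across lags, is the usual empirical signature of long range dependence. The R/S estimator is known to be biased for short series, so the value should be read as indicative rather than exact.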



Author information

Corresponding author

Correspondence to Shaifu Gupta.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Gupta, S., Dileep, A.D. Long range dependence in cloud servers: a statistical analysis based on Google workload trace. Computing 102, 1031–1049 (2020). https://doi.org/10.1007/s00607-019-00779-4


  • DOI: https://doi.org/10.1007/s00607-019-00779-4
