Cloud Resource Usage—Heavy Tailed Distributions Invalidating Traditional Capacity Planning Models

Journal of Grid Computing

Abstract

For years, capacity planning professionals have known or suspected that many characteristics of computer usage are not normally distributed. At the same time, much traditional workload modeling and forecasting relies on mathematical techniques that assume some form of normality in the underlying distributions. When the actual distribution differs from the assumed one, the resulting capacity models are of lower quality, with possibly erroneous forecasts and confidence intervals much wider than expected. This paper analyzes the distribution of daily resource usage on three storage clusters over 478 days. For each day we consider the distribution of resource usage across customer accounts for five resources: storage used, storage transactions executed, internal network transfer, egress transfer, and inter-data-center transfer. This gives 7170 sample distributions in total (3 clusters × 5 resources × 478 days). All distributions were highly imbalanced, and most sample distributions have tails heavier than those of log-normal, exponential, or normal distributions. These findings spell significant problems for most models that assume normality. Mathematically, the classical Central Limit Theorem does not apply to power-law distributions whose variance is infinite, so the ‘averaging’ effect cannot be counted on to help when modeling with the traditional approach. Operationally, the very high volatility found means that ‘capacity buffers’ need to be large, leading to wasted capacity; other, administrative means need to be applied to reduce that waste. Overall, the distributions of resource usage in cloud storage are so far from normal, even after the usual transformations, that the traditional approach to forecasting and capacity planning needs to be reconsidered. The distributions of log-returns of the time series describing resource usage are much more heavy-tailed than the corresponding distributions for stock indexes. Since no financial professional would use linear regression for stock market analysis and forecasting, it stands to reason that capacity planning should also move toward tools that account for heavy-tailed distributions.
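
As a rough illustration of the two quantities the abstract refers to, the sketch below computes log-returns of a usage time series and estimates a tail index from the largest observations. It is not the paper's method: the Hill estimator is swapped in here as one common tail-index estimator, and the function names, the synthetic Pareto sample, and the choice of k are all assumptions made for the example.

```python
import math
import random


def log_returns(series):
    """Return ln(x[t] / x[t-1]) for a series of strictly positive values."""
    return [math.log(curr / prev) for prev, curr in zip(series, series[1:])]


def hill_tail_index(sample, k):
    """Hill estimate of the tail index alpha, using the k largest observations.

    alpha is the exponent of the survival function, P(X > x) ~ x**(-alpha).
    A smaller alpha means a heavier tail; alpha <= 2 implies infinite variance.
    """
    ordered = sorted(sample, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    # Hypothetical data: synthetic per-account usage drawn from a Pareto
    # distribution with alpha = 1.5 (finite mean, infinite variance).
    rng = random.Random(42)
    usage = [rng.paretovariate(1.5) for _ in range(20000)]
    print("estimated tail index:", hill_tail_index(usage, k=2000))

    # Hypothetical daily totals with one sudden jump.
    daily_total = [100.0, 110.0, 99.0, 400.0]
    print("log-returns:", log_returns(daily_total))
```

The estimate depends on k, the number of upper order statistics treated as the tail. In practice one would inspect the estimate across a range of k values rather than trust a single choice.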

Author information

Correspondence to Charles Loboz.

Additional information

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

About this article

Cite this article

Loboz, C. Cloud Resource Usage—Heavy Tailed Distributions Invalidating Traditional Capacity Planning Models. J Grid Computing 10, 85–108 (2012). https://doi.org/10.1007/s10723-012-9211-x

  • DOI: https://doi.org/10.1007/s10723-012-9211-x
