Profiling services for resource optimization and capacity planning in distributed systems

Cluster Computing

Abstract

The capacity needs of online services are mainly determined by the volume of user loads. For large-scale distributed systems running such services, it is difficult to match the capacities of the various system components. This paper proposes a novel and systematic approach to profiling services for resource optimization and capacity planning. We collect resource-consumption measurements from various components across a distributed system and search for constant relationships between these measurements. If such relationships always hold under various workloads over time, we consider them invariants of the underlying system. After extracting many invariants from the system, given any volume of user loads, we can follow these invariant relationships sequentially to estimate the capacity needs of individual components. By comparing the current resource configurations against the estimated capacity needs, we can discover the weakest points that may degrade system performance. Operators can consult these analytical results to optimize resource assignments and remove potential performance bottlenecks. We propose several algorithms to support capacity analysis and guide operators' capacity planning tasks. The algorithms are evaluated on real systems, and experimental results demonstrate the effectiveness of our approach.
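The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state which model family is used, so an ordinary least-squares line stands in for the invariant model here. The metric names (`load`, `web_cpu`, `db_io`), the sample values, the window size, the fitness threshold, and the capacity figures are all hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares fit y ~ a*x + b (stand-in for the paper's invariant model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0, my
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx

def fitness(a, b, x, y):
    """1 - SS_res/SS_tot of the model on (x, y), clipped below at 0."""
    my = mean(y)
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if ss_tot == 0:
        return 1.0 if ss_res < 1e-12 else 0.0
    return max(0.0, 1.0 - ss_res / ss_tot)

def find_invariant(x, y, window=4, threshold=0.9):
    """Fit once on all data; accept only if the fit holds in every time window."""
    a, b = fit_line(x, y)
    for s in range(0, len(x) - window + 1, window):
        if fitness(a, b, x[s:s + window], y[s:s + window]) < threshold:
            return None  # relationship breaks in this window: not an invariant
    return a, b

def estimate_needs(load, invariants):
    """Follow invariants in order; each is (source, target, a, b)."""
    est = {"load": load}
    for src, dst, a, b in invariants:
        if src in est:
            est[dst] = a * est[src] + b
    return est

def weakest_point(estimates, capacity):
    """Component whose estimated need is largest relative to its capacity."""
    ratios = {k: estimates[k] / capacity[k] for k in capacity if k in estimates}
    name = max(ratios, key=ratios.get)
    return name, ratios[name]

# Hypothetical measurements collected over time.
load = [100, 200, 300, 400, 500, 600, 700, 800]
web_cpu = [0.05 * v + 2 for v in load]   # tracks load
db_io = [3 * c + 10 for c in web_cpu]    # tracks web_cpu
noise = [5, 1, 9, 2, 8, 3, 7, 4]         # unrelated to load

inv_web = find_invariant(load, web_cpu)
inv_db = find_invariant(web_cpu, db_io)
inv_noise = find_invariant(load, noise)  # rejected

invariants = [("load", "web_cpu", *inv_web), ("web_cpu", "db_io", *inv_db)]
needs = estimate_needs(1000, invariants)
bottleneck, ratio = weakest_point(needs, {"web_cpu": 80, "db_io": 150})
print(bottleneck, round(ratio, 2))
```

A ratio above 1 for the reported component means its estimated need at the target load exceeds its configured capacity, which is the kind of weak point the abstract says operators should address first.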

Author information

Corresponding author

Correspondence to Guofei Jiang.

About this article

Cite this article

Jiang, G., Chen, H. & Yoshihira, K. Profiling services for resource optimization and capacity planning in distributed systems. Cluster Comput 11, 313–329 (2008). https://doi.org/10.1007/s10586-008-0063-x

  • DOI: https://doi.org/10.1007/s10586-008-0063-x
