DOI: 10.1145/3038912.3052649

Performance Monitoring and Root Cause Analysis for Cloud-hosted Web Applications

Published: 03 April 2017

ABSTRACT

In this paper, we describe Roots, a system for automatically identifying the "root cause" of performance anomalies in web applications deployed in Platform-as-a-Service (PaaS) clouds. Roots does not require application-level instrumentation. Instead, it tracks events within the PaaS cloud that are triggered by application requests using a combination of metadata injection and platform-level instrumentation.

We describe the extensible architecture of Roots, a prototype implementation of the system, and a statistical methodology for performance anomaly detection and diagnosis. We evaluate the efficacy of Roots using a set of PaaS-hosted web applications, and detail the performance overhead and scalability of the implementation.

