Leveraging Many Simple Statistical Models to Adaptively Monitor Software Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4742)

Abstract

Self-managing systems require continuous monitoring to ensure correct operation. Detailed monitoring is often too costly to use in production. An alternative is adaptive monitoring, whereby monitoring is kept to a minimal level while the system behaves as expected, and the monitoring level is increased if a problem is suspected. To enable such an approach, we must model the system, both at a minimal level to ensure correct operation, and at a detailed level, to diagnose faulty components. To avoid the complexity of developing an explicit model based on the system structure, we employ simple statistical techniques to identify relationships in the monitored data. These relationships are used to characterize normal operation and identify problematic areas.

We develop and evaluate a prototype for the adaptive monitoring of J2EE applications. We experiment with 29 fault scenarios of three general types and show that we detect the presence of faults in 80% of cases; all but one of the undetected cases are attributable to a single fault type. We shortlist the faulty component in 65% of the cases where anomalies are observed.
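
The idea of characterizing normal operation through relationships in monitored data can be illustrated with a minimal sketch. It is not the authors' prototype: the abstract only says "simple statistical techniques", so the choice of pairwise least-squares regression, the metric names, the `min_r2` cutoff, and the `slack` factor are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:  # a constant metric carries no relationship
        return my, 0.0, 0.0
    b = sxy / sxx
    return my - b * mx, b, (sxy * sxy) / (sxx * syy)

def learn_models(history, min_r2=0.9, slack=3.0):
    """Keep one simple model per strongly related metric pair.

    history maps a (hypothetical) metric name to samples gathered while
    the system was known to behave normally.
    """
    models = {}
    for x_name, y_name in combinations(sorted(history), 2):
        xs, ys = history[x_name], history[y_name]
        a, b, r2 = fit_line(xs, ys)
        if r2 >= min_r2:
            worst = max(abs(y - (a + b * x)) for x, y in zip(xs, ys))
            models[(x_name, y_name)] = (a, b, slack * worst + 1e-9)
    return models

def violated(models, sample):
    """Return the metric pairs whose learned relationship no longer holds."""
    return [pair for pair, (a, b, tol) in models.items()
            if abs(sample[pair[1]] - (a + b * sample[pair[0]])) > tol]

def monitoring_level(models, sample):
    """Stay at the minimal level unless some relationship is broken."""
    return "detailed" if violated(models, sample) else "minimal"

# Hypothetical training data: db_calls tracks requests, heap_mb does not.
history = {
    "requests": [10, 20, 30, 40, 50],
    "db_calls": [21, 39, 61, 80, 99],
    "heap_mb": [300, 120, 410, 250, 90],
}
models = learn_models(history)
print(monitoring_level(models, {"requests": 60, "db_calls": 120, "heap_mb": 200}))
print(monitoring_level(models, {"requests": 60, "db_calls": 300, "heap_mb": 200}))
```

In this sketch, only the requests/db_calls pair is retained as a model. A sample that breaks it raises the monitoring level, and the metrics involved in the violated pairs give a starting point for shortlisting suspect components.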

Supported in part by an IBM Centre of Advanced Studies (CAS), Toronto PhD fellowship.

Editor information

Ivan Stojmenovic, Ruppa K. Thulasiram, Laurence T. Yang, Weijia Jia, Minyi Guo, Rodrigo Fernandes de Mello

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Munawar, M.A., Ward, P.A.S. (2007). Leveraging Many Simple Statistical Models to Adaptively Monitor Software Systems. In: Stojmenovic, I., Thulasiram, R.K., Yang, L.T., Jia, W., Guo, M., de Mello, R.F. (eds) Parallel and Distributed Processing and Applications. ISPA 2007. Lecture Notes in Computer Science, vol 4742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74742-0_42

  • DOI: https://doi.org/10.1007/978-3-540-74742-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74741-3

  • Online ISBN: 978-3-540-74742-0

  • eBook Packages: Computer Science, Computer Science (R0)
