Abstract
Self-managing systems require continuous monitoring to ensure correct operation, but detailed monitoring is often too costly to use in production. An alternative is adaptive monitoring, whereby monitoring is kept to a minimal level while the system behaves as expected and is increased only when a problem is suspected. To enable such an approach, we must model the system at two levels: at a minimal level, to confirm correct operation, and at a detailed level, to diagnose faulty components. To avoid the complexity of developing an explicit model based on the system structure, we employ simple statistical techniques to identify relationships in the monitored data. These relationships are used to characterize normal operation and to identify problematic areas.
We develop and evaluate a prototype for the adaptive monitoring of J2EE applications. We experiment with 29 different fault scenarios of three general types and show that we are able to detect the presence of faults in 80% of cases; all but one of the undetected cases are attributable to a single fault type. We are able to shortlist the faulty component in 65% of the cases where anomalies are observed.
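The idea of characterizing normal operation through simple relationships in the monitored data can be illustrated with a minimal sketch. It is not the prototype evaluated in the paper. It assumes pairwise linear regression between metrics, a hypothetical fit-quality cutoff (`min_r2`), a hypothetical tolerance rule (`slack` times the largest baseline residual), and invented metric names and values. The function `adaptive_step` and its `read_metrics` callback are likewise illustrative names only.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def fit_line(x, y):
    """Least-squares fit y ~ a + b*x. Returns (a, b, r2), or None if x or y is constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return None
    b = sxy / sxx
    return my - b * mx, b, (sxy * sxy) / (sxx * syy)


def learn_models(baseline, min_r2=0.9, slack=3.0):
    """Keep one model per metric pair whose baseline data is strongly linear.

    baseline maps a metric name to samples collected during normal operation.
    min_r2 and slack are assumed tuning knobs, not values from the paper.
    """
    models = {}
    for x_name, y_name in combinations(sorted(baseline), 2):
        x, y = baseline[x_name], baseline[y_name]
        fit = fit_line(x, y)
        if fit is None or fit[2] < min_r2:
            continue
        a, b, _ = fit
        worst = max(abs(yi - (a + b * xi)) for xi, yi in zip(x, y))
        models[(x_name, y_name)] = (a, b, slack * worst + 1e-9)
    return models


def violations(models, sample):
    """Return the metric pairs whose learned relationship no longer holds."""
    return [
        (x_name, y_name)
        for (x_name, y_name), (a, b, tol) in models.items()
        if abs(sample[y_name] - (a + b * sample[x_name])) > tol
    ]


def adaptive_step(minimal, detailed, read_metrics):
    """One monitoring cycle; read_metrics(names) is a hypothetical collector callback."""
    cheap = {m for pair in minimal for m in pair}
    if not violations(minimal, read_metrics(cheap)):
        return []  # behaviour matches the minimal models: stay at the minimal level
    everything = {m for pair in detailed for m in pair}
    broken = violations(detailed, read_metrics(everything))
    # Shortlist metrics by how many violated relationships they take part in.
    counts = Counter(m for pair in broken for m in pair)
    return [m for m, _ in counts.most_common()]


# Invented baseline data: db_calls tracks requests closely, heap_mb does not.
baseline = {
    "requests": [10, 20, 30, 40, 50, 60],
    "db_calls": [21, 39, 61, 80, 99, 121],
    "heap_mb": [300, 120, 310, 150, 290, 140],
}
models = learn_models(baseline)
```

In this sketch a sample is only collected at the detailed level after a minimal-level relationship is violated, which mirrors the adaptive scheme described above. Mapping shortlisted metrics back to software components is left out and would depend on how the monitored system names its metrics.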
Supported in part by an IBM Centre of Advanced Studies (CAS), Toronto PhD fellowship.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Munawar, M.A., Ward, P.A.S. (2007). Leveraging Many Simple Statistical Models to Adaptively Monitor Software Systems. In: Stojmenovic, I., Thulasiram, R.K., Yang, L.T., Jia, W., Guo, M., de Mello, R.F. (eds) Parallel and Distributed Processing and Applications. ISPA 2007. Lecture Notes in Computer Science, vol 4742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74742-0_42
DOI: https://doi.org/10.1007/978-3-540-74742-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74741-3
Online ISBN: 978-3-540-74742-0
eBook Packages: Computer Science (R0)