DOI: 10.1145/1370018.1370032
research-article

Monitoring multi-tier clustered systems with invariant metric relationships

Published: 12 May 2008

ABSTRACT

To ensure high availability, self-managing systems require self-monitoring and a system model against which to analyze monitoring data. Characterizing relationships between system metrics has been shown to model simple multi-tier transaction systems effectively, enabling failure detection and fault diagnosis. In this paper we show how to extend this invariant metric-relationships approach to clustered multi-tier systems. We show through analysis and experimentation that naive application of the approach increases cost dramatically while reducing diagnosis accuracy. We demonstrate that randomization at the load balancer during the invariant-identification phase will improve diagnosis accuracy, though it neither completely eliminates the problem nor reduces the cost; indeed, it may increase the cost, as this approach will require a long learning phase to remove all accidental correlations. Finally, we argue that knowing the system structure is necessary to effectively apply invariants to the clustered environment.
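As an illustration of the "accidental correlations" mentioned above, the following minimal sketch simulates two replicas behind a load balancer and measures how strongly their per-interval request counts are linearly related. Everything in it is an assumption made for illustration and is not taken from the paper: the two-replica setup, the `simulate` and `r_squared` helpers, the round-robin baseline, the load range, and the 0.9 acceptance threshold. It shows one plausible way a deterministic balancing policy can make metrics on sibling replicas look like an invariant pair, and how random dispatch weakens that relationship without removing it.

```python
import random

R2_THRESHOLD = 0.9  # hypothetical cutoff for accepting a pair as an "invariant"


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear relationship
    return (sxy * sxy) / (sxx * syy)


def simulate(policy, intervals=200, seed=0):
    """Return per-interval request counts for replicas A and B.

    policy is "round_robin" (requests alternate between replicas) or
    "random" (each request is sent to A or B with equal probability).
    """
    rng = random.Random(seed)
    a_counts, b_counts = [], []
    for _ in range(intervals):
        total = rng.randint(50, 150)  # assumed offered load in this interval
        if policy == "round_robin":
            a = (total + 1) // 2  # A receives every other request
        else:
            a = sum(1 for _ in range(total) if rng.random() < 0.5)
        a_counts.append(a)
        b_counts.append(total - a)
    return a_counts, b_counts


r2_round_robin = r_squared(*simulate("round_robin"))
r2_random = r_squared(*simulate("random"))

# Under round-robin the two replicas track each other almost perfectly, so a
# pairwise model would accept the pair even though neither replica drives the
# other. Random dispatch lowers the fit, but shared load still correlates them.
print("round robin accepted:", r2_round_robin >= R2_THRESHOLD)
print("random accepted:", r2_random >= R2_THRESHOLD)
```

In this toy setting the replica-to-replica relationship exists only because both replicas see the same offered load, which is the kind of structural knowledge the abstract argues is needed to apply invariants in a clustered environment.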

Published in

SEAMS '08: Proceedings of the 2008 international workshop on Software engineering for adaptive and self-managing systems
May 2008
144 pages
ISBN: 9781605580371
DOI: 10.1145/1370018

Copyright © 2008 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SEAMS '08 paper acceptance rate: 17 of 31 submissions, 55%
Overall acceptance rate: 17 of 31 submissions, 55%
