DOI: 10.1145/1244002.1244027
Article

Performance problem localization in self-healing, service-oriented systems using Bayesian networks

Published: 11 March 2007

Abstract

In distributed, service-oriented environments, performance problem localization is required to provide self-healing capabilities and deliver the desired quality of service (QoS). This paper presents an automated approach to identifying system elements causing performance problems. Applying probabilistic inference to collected response time and elapsed time data, the approach 1) infers elapsed time for services where data is missing, 2) estimates the response time degradation caused by different services using the duration, abnormality and response time correlation of their elapsed times, and 3) identifies the services that are the most important causes of slow response time and yield the most benefit if recovered. The approach has been used to localize a performance problem on the test bed of a real-world service-oriented Grid. Evaluation using simulations shows that the approach consistently achieves better accuracy than traditional techniques in various service-oriented settings.
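
The three steps in the abstract can be illustrated with a rough sketch. This is not the paper's Bayesian network model: it substitutes mean imputation for probabilistic inference of missing elapsed times, and it scores each service by mean elapsed time multiplied by the (non-negative) Pearson correlation with end-to-end response time. The function names, the scoring formula, and the sample data are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def fill_missing(samples):
    """Replace None with the mean of the observed values.

    Assumption: a crude stand-in for inferring missing elapsed times;
    the paper uses probabilistic inference instead.
    """
    observed = [x for x in samples if x is not None]
    m = mean(observed)
    return [m if x is None else x for x in samples]


def pearson(xs, ys):
    """Population Pearson correlation; 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def rank_services(elapsed, response):
    """Rank services from most to least likely cause of slow responses.

    Hypothetical score: mean elapsed time weighted by how strongly the
    service's elapsed time tracks the end-to-end response time.
    """
    scores = {}
    for name, samples in elapsed.items():
        filled = fill_missing(samples)
        scores[name] = mean(filled) * max(0.0, pearson(filled, response))
    return sorted(scores, key=scores.get, reverse=True)


# Made-up elapsed times (ms) per service; None marks a missing sample.
elapsed = {
    "auth": [10, 11, 10, None, 10],
    "db": [50, 80, 120, 160, 200],
    "render": [30, 29, 31, 30, 30],
}
# Made-up end-to-end response times (ms) for the same five requests.
response = [95, 125, 165, 205, 245]

ranking = rank_services(elapsed, response)
print(ranking)
```

In this toy data the `db` service is both slow and closely tracks the response time, so it ranks first. A Bayesian network would go further than this sketch by modelling dependencies between services and by giving a distribution, rather than a point estimate, for each missing value.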

Published In

SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
March 2007
1688 pages
ISBN:1595934804
DOI:10.1145/1244002
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bayesian networks
  2. end-to-end response time
  3. missing data
  4. problem localization
  5. service-oriented computing

Qualifiers

  • Article

Conference

SAC07

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Cited By

  • (2009) Problem localization using probabilistic dependency analysis for automated system management in ubiquitous computing. Internet Research 19:2 (136-152). DOI: 10.1108/10662240910952319. Online publication date: 3-Apr-2009
  • (2008) Fault Localization for Self-Managing Based on Bayesian Network. The KIPS Transactions: Part B 15B:2 (137-146). DOI: 10.3745/KIPSTB.2008.15-B.2.137. Online publication date: 30-Apr-2008
  • (2008) A self-diagnosis approach for performance problem localization in component-based applications. NOMS 2008 - 2008 IEEE Network Operations and Management Symposium (931-934). DOI: 10.1109/NOMS.2008.4575250. Online publication date: Apr-2008
  • (2007) Scalable problem localization for distributed systems. Proceedings of the 2nd international conference on Scalable information systems (1-8). DOI: 10.5555/1366804.1366902. Online publication date: 6-Jun-2007
  • (2007) Comparing the use of bayesian networks and neural networks in response time modeling for service-oriented systems. Proceedings of the 2007 workshop on Service-oriented computing performance: aspects, issues, and approaches (67-74). DOI: 10.1145/1272457.1272467. Online publication date: 25-Jun-2007
  • (2007) Autonomic Performance Recuperation for Service-oriented Systems. IEEE International Conference on Services Computing (SCC 2007) (544-551). DOI: 10.1109/SCC.2007.30. Online publication date: Jul-2007
  • (2007) Modeling Autonomic Recovery in Web Services with Multi-tier Reboots. IEEE International Conference on Web Services (ICWS 2007) (1222-1223). DOI: 10.1109/ICWS.2007.127. Online publication date: Jul-2007
