Active Diagnosis of High-Level Faults in Distributed Internet Services

  • Conference paper
Challenges for Next Generation Network Operations and Service Management (APNOMS 2008)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 5297)

Abstract

In fault diagnosis for Internet services, detecting and localizing high-level failures is both important and challenging. Diagnosis methods that passively collect information have two drawbacks: 1) they require the target system to report its internal messages; 2) they cannot detect and locate faults before users notice them. This paper proposes an active diagnosis method that tests an Internet service with probes and infers faults from the probe results. The probing method is proactive, adaptive, and low-cost. We evaluate it by applying it to the J2EE application “Pet Store” and comparing it with an existing passive method, Pinpoint, and show that our method outperforms Pinpoint.
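
The general idea of inferring faults from probe results can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify; it is a simple single-fault elimination over a probe-to-component coverage map. The probe names, the component names, and the `infer_faulty` helper are all hypothetical and chosen only to resemble a Pet Store style deployment.

```python
from typing import Dict, Set

# Hypothetical coverage map: each probe (a synthetic request) and the
# components it exercises. These names are assumptions for illustration.
PROBE_COVERAGE: Dict[str, Set[str]] = {
    "browse_catalog": {"web", "catalog_ejb", "db"},
    "view_cart": {"web", "cart_ejb"},
    "checkout": {"web", "cart_ejb", "order_ejb", "db"},
}


def infer_faulty(coverage: Dict[str, Set[str]],
                 results: Dict[str, bool]) -> Set[str]:
    """Return suspect components, assuming at most one component is faulty.

    A suspect must be exercised by every failed probe and by no
    successful probe. `results` maps probe name -> True if it passed.
    """
    failed = [coverage[p] for p, ok in results.items() if not ok]
    if not failed:
        return set()  # every probe passed, so nothing is suspected
    passed = [coverage[p] for p, ok in results.items() if ok]

    # Components common to all failed probes...
    suspects = set.intersection(*failed)
    # ...minus anything a passing probe has shown to be working.
    for components in passed:
        suspects -= components
    return suspects


if __name__ == "__main__":
    observed = {"browse_catalog": True, "view_cart": False, "checkout": False}
    print(infer_faulty(PROBE_COVERAGE, observed))
```

With the observations in the usage example, both failed probes touch `web` and `cart_ejb`, and the passing `browse_catalog` probe clears `web`, so only `cart_ejb` remains as a suspect. An adaptive scheme would go further and choose the next probe to send based on the current suspect set; that step is omitted here.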

This paper is supported by the 973 Project of China (2007CB310703), the 863 Project of China (2008AA01Z201), the National Natural Science Foundation of China (90604020, 90604021), the Program for New Century Excellent Talents in University (NCET-07-0106), the Fok Ying Tong Education Foundation (111069), and the ZTE Fund.

References

  1. Chen, M.Y., Kıcıman, E., Fratkin, E., Fox, A., Brewer, E.: Pinpoint: Problem Determination in Large, Dynamic Internet Services. In: Intl. Conf. on Dependable Systems and Networks (DSN), pp. 595–604 (2002)

  2. Kompella, R.R., Yates, J., Greenberg, A., Snoeren, A.C.: Detection and Localization of Network Black Holes. In: IEEE INFOCOM (2007)

  3. Oppenheimer, D., Ganapathi, A., Patterson, D.A.: Why Do Internet Services Fail, and What Can Be Done About It? In: Proceedings of USITS 2003: 4th USENIX Symposium on Internet Technologies and Systems, Seattle, WA, USA, March 26–28 (2003)

  4. Khanna, G., Laguna, I., Arshad, F.A., Bagchi, S.: Distributed Diagnosis of Failures in a Three Tier E-Commerce System. In: 26th IEEE International Symposium on Reliable Distributed Systems

  5. Rish, I., Brodie, M., Ma, S., Odintsova, N., Beygelzimer, A., Grabarnik, G., Hernandez, K.: Adaptive Diagnosis in Distributed Systems. IEEE Transactions on Neural Networks 16(5) (September 2005)

  6. Jamesleon.: http://sourceforge.net/

  7. Brodie, M., Rish, I., Ma, S.: Intelligent Probing: A Cost-Effective Approach to Fault Diagnosis in Computer Networks. IBM Systems Journal 41(3) (2002)

  8. Oppenheimer, D., Patterson, D.A.: Architecture operation and dependability of large-scale Internet services. IEEE Internet Computing (2002)

  9. Yemini, A., Kliger, S.: High Speed and Robust Event Correlation. IEEE Communications Magazine 34(5), 82–90 (1996)

  10. Lee, Iyer, R.: Software Dependability in the Tandem GUARDIAN System. IEEE Transactions on Software Engineering 21(5) (1995)

  11. Brown, I.A., Patterson, D.A.: Embracing Failure: A Case for Recovery-Oriented Computing (ROC). In: 2001 High Performance Transaction Processing Symp., Asilomar, CA (October 2001)

  12. Aguilera, M.K., Mogul, J.C., Wiener, J.L., Reynolds, P., Muthitacharoen, A.: Performance Debugging for Distributed Systems of Black Boxes. In: Proc. of the 19th ACM SOSP, pp. 74–89 (2003)

  13. Candea, G., Kıcıman, E., Kawamoto, S., Fox, A.: Autonomous Recovery in Componentized Internet Applications. Cluster Computing Journal 9(1) (February 2006)

  14. Pet Store J2EE Application, http://java.sun.com/blueprints/code/index.html

  15. Cuppens, F., Miege, A.: Alert Correlation in a Cooperative Intrusion Detection Framework. In: Proceedings of the 2002 IEEE Symp. on Security and Privacy, May 12–15 (2002)

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Long, H., Cheng, L., Zeng, Y., Wu, L. (2008). Active Diagnosis of High-Level Faults in Distributed Internet Services. In: Ma, Y., Choi, D., Ata, S. (eds) Challenges for Next Generation Network Operations and Service Management. APNOMS 2008. Lecture Notes in Computer Science, vol 5297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88623-5_8

  • DOI: https://doi.org/10.1007/978-3-540-88623-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88622-8

  • Online ISBN: 978-3-540-88623-5

  • eBook Packages: Computer Science, Computer Science (R0)
