
Active probing based Internet service fault management in uncertain and noisy environment

Science in China Series F: Information Sciences

Abstract

In Internet service fault management based on active probing, uncertainty and noise affect the quality of fault management. To reduce this impact, this paper analyzes the challenges of Internet service fault management. A bipartite Bayesian network is chosen to model the dependency relationships between faults and probes, a binary symmetric channel is chosen to model noise, and a service fault management approach using active probing is proposed for such an environment. The approach consists of two phases: fault detection and fault diagnosis. In the first phase, we propose a greedy approximation probe selection algorithm (GAPSA), which selects a minimal set of probes while maintaining a high probability of fault detection. In the second phase, we propose a fault diagnosis probe selection algorithm (FDPSA), which selects probes to obtain more system information based on the symptoms observed in the previous phase. To deal with the dynamic fault set caused by the fault recovery mechanism, we propose a hypothesis inference algorithm based on the fault persistent time statistic (FPTS). Simulation results demonstrate the validity and efficiency of our approach.
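
To make the detection phase concrete, the following is a minimal sketch of greedy probe selection over a bipartite fault/probe dependency model with binary symmetric channel noise. It is not the paper's GAPSA, whose details are not given in the abstract. It assumes a single active fault at a time, deterministic fault-to-probe dependencies, and independent noise per probe with crossover probability `eps`. All names (`covers`, `priors`, `target`, `greedy_probe_selection`) and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def detection_prob(fault: str, selected: List[str],
                   covers: Dict[str, Set[str]], eps: float) -> float:
    """Probability that at least one selected probe reports the fault.

    Assumption: each probe covering the fault truly fails, and the binary
    symmetric channel hides that failure independently with probability eps.
    """
    k = sum(1 for probe in selected if fault in covers[probe])
    return 1.0 - eps ** k


def greedy_probe_selection(covers: Dict[str, Set[str]],
                           priors: Dict[str, float],
                           eps: float, target: float) -> List[str]:
    """Greedily add probes until every fault is detected with prob >= target."""

    def score(sel: List[str]) -> float:
        # Prior-weighted detection probability over all faults.
        return sum(priors[f] * detection_prob(f, sel, covers, eps)
                   for f in priors)

    selected: List[str] = []
    remaining = set(covers)
    while remaining and any(
            detection_prob(f, selected, covers, eps) < target for f in priors):
        current = score(selected)
        # Sort for a deterministic choice when two probes score equally.
        best = max(sorted(remaining), key=lambda p: score(selected + [p]))
        if score(selected + [best]) <= current:
            break  # no remaining probe improves detection
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical dependency model: which faults each probe can observe.
covers = {"p1": {"f1", "f2"}, "p2": {"f2", "f3"}, "p3": {"f3"}, "p4": {"f1"}}
priors = {"f1": 0.10, "f2": 0.05, "f3": 0.20}

chosen = greedy_probe_selection(covers, priors, eps=0.1, target=0.85)
print(chosen)
```

A greedy rule of this kind gives no guarantee of a truly minimal probe set; it only trades optimality for a selection cost that grows polynomially with the number of probes and faults.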



Author information

Corresponding author

Correspondence to LingWei Chu.

Additional information

Supported by the National Basic Research Program of China (973 Program) (Grant No. 2003CB314806), the National High-Tech Research & Development Program of China (863 Program) (Grant Nos. 2007AA12Z321 and 2007AA01Z206), and the National Natural Science Foundation of China (Grant Nos. 60603060, 60502037 and 90604019)


About this article

Cite this article

Chu, L., Zou, S., Cheng, S. et al. Active probing based Internet service fault management in uncertain and noisy environment. Sci. China Ser. F-Inf. Sci. 51, 1857–1870 (2008). https://doi.org/10.1007/s11432-008-0143-9


  • DOI: https://doi.org/10.1007/s11432-008-0143-9
