A Generic Model for Fault Isolation in Integrated Management Systems

Journal of Network and Systems Management

Abstract

Distributed systems in enterprises as well as telecommunication environments strongly demand more automated fault management. A single fault in these complex systems might cause a huge number of symptomatic error messages and side effects to occur. The common root faults for these symptoms have to be identified to start fault removal procedures as soon as possible and to decrease system down-time. This paper presents a methodology for fault isolation in integrated management systems. A generic model is described that unifies the view of the management system on the managed environment. It integrates the relevant aspects of network, system, and service management layers in order to perform integrated fault isolation. Our approach is based on a general dependency graph model. It captures the information that is required to determine the root cause of a fault on the one hand, and the set of fault-affected services and customers on the other hand. The layered TMN architecture serves as an example for an integrated management environment throughout this paper.
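The two questions the abstract assigns to the dependency graph (which object is a plausible root cause, and which services and customers are affected) can be illustrated with a minimal sketch. This is not the authors' model or algorithm: the class `DependencyGraph`, its method names, and the object names (`router_1`, `mail_service`, `customer_A`, and so on) are hypothetical, and the root-cause rule used here (intersecting the transitive dependencies of all symptomatic objects) is only one simple assumption about how such a graph might be queried.

```python
from collections import defaultdict


class DependencyGraph:
    """Hypothetical dependency graph: an edge means 'dependent relies on antecedent'."""

    def __init__(self):
        self.depends_on = defaultdict(set)  # object -> objects it relies on
        self.dependents = defaultdict(set)  # object -> objects relying on it

    def add_dependency(self, dependent, antecedent):
        self.depends_on[dependent].add(antecedent)
        self.dependents[antecedent].add(dependent)

    @staticmethod
    def _reachable(start, edges):
        # Iterative depth-first traversal; returns every node reachable from start.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def candidate_root_causes(self, symptoms):
        # Assumption: a candidate root cause is an object that every
        # symptomatic object depends on (or the symptomatic object itself).
        sets = [self._reachable(s, self.depends_on) | {s} for s in symptoms]
        return set.intersection(*sets) if sets else set()

    def affected(self, fault):
        # Everything that directly or transitively depends on the faulty object.
        return self._reachable(fault, self.dependents)


# Hypothetical layered environment: customers -> services -> systems -> network.
g = DependencyGraph()
g.add_dependency("customer_A", "mail_service")
g.add_dependency("customer_B", "web_service")
g.add_dependency("mail_service", "mail_server")
g.add_dependency("web_service", "web_server")
g.add_dependency("mail_server", "router_1")
g.add_dependency("web_server", "router_1")

roots = g.candidate_root_causes(["mail_service", "web_service"])
impact = g.affected("router_1")
```

In this toy environment, symptoms reported for both services share only `router_1` as a common dependency, and walking the edges in the opposite direction from `router_1` yields the affected systems, services, and customers.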



Katker, S., Geihs, K. A Generic Model for Fault Isolation in Integrated Management Systems. Journal of Network and Systems Management 5, 109–130 (1997). https://doi.org/10.1023/A:1018766610444

  • DOI: https://doi.org/10.1023/A:1018766610444
