An Architecture for Inter-Domain Troubleshooting

Journal of Network and Systems Management

Abstract

In this paper, we explore the constraints of a new problem: coordinating network troubleshooting among peer administrative domains and untrusted observers. Our approach permits any entity to report problems, whether it is a Network Operations Center (NOC), an end user, or an application. Our goals are to define the inter-domain coordination problem clearly and to develop an architecture that allows observers to report problems and receive timely feedback, regardless of their locations and identities. By automating this process wherever possible, we also relieve the human bottlenecks at help desks and NOCs. We present a troubleshooting approach for coordinating problem diagnosis and describe Global Distributed Troubleshooting (GDT), a distributed protocol that realizes this approach. We show through simulation that GDT scales well as the number of observers and problems grows.

Author information

Correspondence to Chinya V. Ravishankar.

About this article

Cite this article

Thaler, D.G., Ravishankar, C.V. An Architecture for Inter-Domain Troubleshooting. Journal of Network and Systems Management 12, 155–189 (2004). https://doi.org/10.1023/B:JONS.0000034212.53702.30

  • DOI: https://doi.org/10.1023/B:JONS.0000034212.53702.30