
Reliable Distributed Network Management by Replication

Journal of Network and Systems Management

Abstract

This paper presents a new clustering architecture for SNMP agents that supports semi-active replication of managed objects. A cluster of agents provides fault-tolerant object functionality: the replicated managed objects of a crashed agent in a given cluster can be accessed through a peer cluster. The proposed architecture is structured in three layers. The lower layer corresponds to the managed objects at the network elements. The middle layer contains management entities called clusters that monitor and replicate managed objects. The upper layer allows the definition of management clusters as well as the relationships between clusters. A practical tool was implemented and is presented. The impact of replication on network performance is evaluated, and a probabilistic analysis of replicated object consistency is presented.
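The three layers described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the tool presented in the paper: the names `Agent`, `Cluster`, `define_clusters`, and `poll`, the dictionary-based cluster specification, and the sample values are all hypothetical. It shows only the data flow (monitor, replicate to peers, read a replica after a crash) and does not model the semi-active replication protocol or real SNMP messages.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Lower layer (hypothetical): a network element exposing managed objects."""
    name: str
    objects: dict          # OID string -> current value
    alive: bool = True

    def get(self, oid):
        if not self.alive:
            raise ConnectionError(f"agent {self.name} is down")
        return self.objects[oid]


@dataclass
class Cluster:
    """Middle layer (hypothetical): monitors member agents and keeps replicas."""
    name: str
    members: list
    # repr/compare are disabled because peer clusters reference each other.
    peers: list = field(default_factory=list, repr=False, compare=False)
    replicas: dict = field(default_factory=dict)  # (agent name, OID) -> value

    def poll(self, oids):
        """Read each live member's objects and copy the values to every peer."""
        for agent in self.members:
            for oid in oids:
                try:
                    value = agent.get(oid)
                except ConnectionError:
                    continue  # crashed agent: the last replicated value is kept
                self.replicas[(agent.name, oid)] = value
                for peer in self.peers:
                    peer.replicas[(agent.name, oid)] = value


def define_clusters(spec, agents):
    """Upper layer (hypothetical): build clusters and peer links from a spec."""
    clusters = {
        name: Cluster(name, [agents[a] for a in cfg["members"]])
        for name, cfg in spec.items()
    }
    for name, cfg in spec.items():
        clusters[name].peers = [clusters[p] for p in cfg["peers"]]
    return clusters


# Illustrative data only: two agents, two clusters that are peers of each other.
agents = {
    "r1": Agent("r1", {"tcpCurrEstab.0": 12}),
    "r2": Agent("r2", {"tcpCurrEstab.0": 7}),
}
spec = {
    "A": {"members": ["r1"], "peers": ["B"]},
    "B": {"members": ["r2"], "peers": ["A"]},
}
clusters = define_clusters(spec, agents)

clusters["A"].poll(["tcpCurrEstab.0"])   # r1 is up: value replicated to B
agents["r1"].alive = False                # r1 crashes after the last poll
clusters["A"].poll(["tcpCurrEstab.0"])   # nothing new can be read from r1

# The crashed agent's object is still readable through the peer cluster.
last_known = clusters["B"].replicas[("r1", "tcpCurrEstab.0")]
```

In this sketch the replica held by cluster B is only as fresh as the last successful poll, which is the kind of staleness the paper's consistency analysis is concerned with.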



Author information


Corresponding author

Correspondence to Elias P. Duarte Jr.


About this article

Cite this article

dos Santos, A.L., Duarte, E.P. & Keeni, G.M. Reliable Distributed Network Management by Replication. Journal of Network and Systems Management 12, 191–213 (2004). https://doi.org/10.1023/B:JONS.0000034213.75955.8b


  • DOI: https://doi.org/10.1023/B:JONS.0000034213.75955.8b
