Abstract
This paper presents a new clustering architecture for SNMP agents that supports semi-active replication of managed objects. A cluster of agents provides fault-tolerant object functionality: if an agent in a given cluster crashes, its replicated managed objects can still be accessed through a peer cluster. The proposed architecture is structured in three layers. The lower layer corresponds to the managed objects at the network elements. The middle layer contains management entities, called clusters, that monitor and replicate managed objects. The upper layer allows the definition of management clusters and of the relationships between them. A practical tool was implemented and is presented. The impact of replication on network performance is evaluated, and a probabilistic analysis of replicated object consistency is given.
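The three layers described above can be illustrated with a minimal sketch. This is not the authors' tool or MIB: the class names (`Agent`, `Cluster`), the `poll` and `read_replica` methods, the polling-based copy step, and the use of `tcpCurrEstab` as a sample object are all assumptions chosen for illustration. The sketch only shows the general idea that a peer cluster can answer for objects of a crashed agent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Agent:
    """Lower layer (hypothetical model): an agent exposing managed objects."""
    name: str
    objects: Dict[str, int] = field(default_factory=dict)
    alive: bool = True

    def get(self, oid: str) -> Optional[int]:
        # A crashed agent does not answer queries.
        return self.objects.get(oid) if self.alive else None


@dataclass
class Cluster:
    """Middle layer (hypothetical model): monitors members, keeps replicas."""
    name: str
    members: Dict[str, Agent] = field(default_factory=dict)
    # Upper layer: which other clusters receive this cluster's replicas.
    peers: List["Cluster"] = field(default_factory=list)
    # Replica store keyed by (agent name, object name).
    replicas: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def poll(self) -> None:
        """Copy current values of live members locally and to every peer."""
        for agent in self.members.values():
            if not agent.alive:
                continue  # keep the last replicated values for crashed agents
            for oid, value in agent.objects.items():
                key = (agent.name, oid)
                self.replicas[key] = value
                for peer in self.peers:
                    peer.replicas[key] = value

    def read_replica(self, agent_name: str, oid: str) -> Optional[int]:
        """Return the last replicated value, or None if never replicated."""
        return self.replicas.get((agent_name, oid))


# Upper-layer definition: cluster A replicates to its peer cluster B.
agent1 = Agent("agent1", {"tcpCurrEstab": 7})
cluster_a = Cluster("A", members={"agent1": agent1})
cluster_b = Cluster("B")
cluster_a.peers.append(cluster_b)

cluster_a.poll()       # replicate while agent1 is still up
agent1.alive = False   # agent1 crashes

direct = agent1.get("tcpCurrEstab")                        # no answer
via_peer = cluster_b.read_replica("agent1", "tcpCurrEstab")  # last replica
```

In this sketch the replica read through the peer cluster is only as fresh as the last poll before the crash, which is the kind of gap a consistency analysis of replicated objects has to quantify.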
Cite this article
dos Santos, A.L., Duarte, E.P. & Keeni, G.M. Reliable Distributed Network Management by Replication. Journal of Network and Systems Management 12, 191–213 (2004). https://doi.org/10.1023/B:JONS.0000034213.75955.8b
DOI: https://doi.org/10.1023/B:JONS.0000034213.75955.8b