Abstract
Distributed hierarchical network monitoring models have been proposed to solve the scalability problem of the centralized model. In such a distributed model, a top-level monitoring manager, called the main manager, obtains aggregate management information from mid-level managers, called domain managers, forming a hierarchical structure. However, if some of the monitoring managers crash, network elements cannot be monitored continuously and correctly until those managers are repaired. To address this important but previously unresolved issue, this paper presents a new fault-tolerance protocol for domain managers, named DMFTP, which allows the managers to efficiently exploit their organizational structure. As a result, the protocol can minimize failure detection overhead and the number of live managers affected by each manager node crash. It also tolerates concurrent manager failures and, once the failed managers have been repaired, ensures their immediate and consistent recovery.
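The hierarchy described above, and the monitoring gap that a crashed domain manager leaves behind, can be illustrated with a minimal Python sketch. This is not DMFTP itself, whose details are given in the paper body; the class names, the `alive` flag, and the integer metric values are all assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DomainManager:
    """Mid-level manager that polls the network elements in its domain."""
    name: str
    elements: dict[str, int]  # element name -> hypothetical metric value
    alive: bool = True        # hypothetical crash flag, for illustration

    def aggregate(self) -> int:
        # A crashed manager cannot answer the main manager's request.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return sum(self.elements.values())


@dataclass
class MainManager:
    """Top-level manager that gathers aggregates from domain managers."""
    domains: list[DomainManager] = field(default_factory=list)

    def collect(self) -> tuple[dict[str, int], list[str]]:
        reports: dict[str, int] = {}
        unmonitored: list[str] = []
        for dm in self.domains:
            try:
                reports[dm.name] = dm.aggregate()
            except ConnectionError:
                # Without a fault-tolerance protocol, every element under
                # the crashed manager stays unmonitored until it is repaired.
                unmonitored.extend(dm.elements)
        return reports, unmonitored


if __name__ == "__main__":
    main = MainManager([
        DomainManager("dm-a", {"r1": 3, "r2": 4}),
        DomainManager("dm-b", {"s1": 5}),
    ])
    main.domains[1].alive = False  # simulate a domain manager crash
    print(main.collect())
```

In this toy setup the main manager still receives the aggregate from the surviving domain manager, while the elements under the crashed one go unreported; closing that gap at low cost is the problem the protocol targets.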
This work was supported by an Electronics and Telecommunications Research Institute grant.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ahn, J., Min, S., Choi, Y., Lee, B. (2003). Low-Cost Fault-Tolerance Protocol for Large-Scale Network Monitoring. In: Sloot, P.M.A., Abramson, D., Bogdanov, A.V., Gorbachev, Y.E., Dongarra, J.J., Zomaya, A.Y. (eds) Computational Science — ICCS 2003. ICCS 2003. Lecture Notes in Computer Science, vol 2659. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44863-2_50
DOI: https://doi.org/10.1007/3-540-44863-2_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40196-4
Online ISBN: 978-3-540-44863-1
eBook Packages: Springer Book Archive