GEMS: Gossip-Enabled Monitoring Service for Scalable Heterogeneous Distributed Systems

Subramaniyan, Rajagopal; Raman, Pirabhu; George, Alan D.; Radlinski, Matthew

doi:10.1007/s10586-006-4900-5

Published: January 2006

Cluster Computing, Volume 9, pages 101–120 (2006)

Abstract

Gossip protocols have proven to be an effective means of detecting failures asynchronously in large distributed systems, without the limitations associated with reliable multicasting for group communications. In this paper, we discuss the development and features of the Gossip-Enabled Monitoring Service (GEMS), a highly responsive and scalable service for monitoring health and performance information in heterogeneous distributed systems. GEMS has many novel and essential features, such as detection of network partitions and dynamic insertion of new nodes into the service. GEMS is also easily extensible, incorporating facilities for distributing arbitrary system and application-specific data. We present experiments and analytical projections that demonstrate scalability, fast response times, and low resource utilization, making GEMS a potent solution for resource monitoring in distributed computing.
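A gossip-style failure detector of the kind referred to above can be sketched as follows. This is a minimal illustration, assuming the generic heartbeat scheme described by van Renesse, Minsky and Hayden (1998): each member periodically increments its own heartbeat counter, sends its heartbeat list to one randomly chosen peer, and merges any received list by keeping the larger counter for each member; a member whose counter has not increased for more than `t_fail` rounds is suspected to have failed. It is not the GEMS implementation. The names `GossipNode`, `gossip_round` and `t_fail` are hypothetical, time is counted in synchronous rounds rather than seconds, and partition detection and consensus are omitted.

```python
import random


class GossipNode:
    """Hypothetical member of a gossip-style heartbeat failure detector.

    Sketch of the generic van Renesse et al. (1998) scheme, not GEMS code.
    Time is counted in gossip rounds rather than wall-clock seconds.
    """

    def __init__(self, node_id, members, t_fail):
        self.node_id = node_id
        self.t_fail = t_fail  # rounds without a heartbeat increase before suspicion
        # Static initial membership: every member starts at heartbeat 0, seen at round 0.
        self.heartbeat = {m: 0 for m in members}
        self.last_update = {m: 0 for m in members}

    def tick(self, now):
        """Increment our own heartbeat once per gossip round."""
        self.heartbeat[self.node_id] += 1
        self.last_update[self.node_id] = now

    def merge(self, incoming, now):
        """Keep the larger heartbeat per member; an increase is fresh evidence of liveness.

        A member absent from our list (get() returns -1) is inserted on first sight.
        """
        for member, hb in incoming.items():
            if hb > self.heartbeat.get(member, -1):
                self.heartbeat[member] = hb
                self.last_update[member] = now

    def suspected(self, now):
        """Members whose heartbeat has not increased for more than t_fail rounds."""
        return {m for m, t in self.last_update.items() if now - t > self.t_fail}


def gossip_round(nodes, alive, now, rng):
    """One synchronous round: every live node ticks, then gossips to one random peer.

    `nodes` maps id -> GossipNode; `alive` is the set of ids that have not crashed.
    Messages addressed to a crashed node are simply lost.
    """
    for node_id in alive:
        nodes[node_id].tick(now)
    for node_id in alive:
        target = rng.choice([m for m in nodes if m != node_id])
        if target in alive:
            nodes[target].merge(dict(nodes[node_id].heartbeat), now)


if __name__ == "__main__":
    nodes = {i: GossipNode(i, range(5), t_fail=3) for i in range(5)}
    alive = {0, 1, 2, 3}  # member 4 has crashed and never gossips
    rng = random.Random(1)
    for now in range(1, 11):
        gossip_round(nodes, alive, now, rng)
    print(4 in nodes[0].suspected(10))  # prints True
```

In this run, member 4 never gossips, so its counter stays at 0 in every list and every live member suspects it once more than `t_fail` rounds have passed, while each live member refreshes its own entry every round and never suspects itself. Live peers can still be suspected by mistake if a fresh heartbeat takes longer than `t_fail` rounds to reach an observer, which is why `t_fail` has to be chosen with the dissemination time in mind.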

Author information

Authors and Affiliations

High-performance Computing and Simulation (HCS) Research Laboratory, Department of Electrical and Computer Engineering, University of Florida, P.O. Box 116200, Gainesville, FL 32611-6200
Rajagopal Subramaniyan, Pirabhu Raman, Alan D. George & Matthew Radlinski

Corresponding author

Correspondence to Alan D. George.

About this article

Cite this article

Subramaniyan, R., Raman, P., George, A.D. et al. GEMS: Gossip-Enabled Monitoring Service for Scalable Heterogeneous Distributed Systems. Cluster Comput 9, 101–120 (2006). https://doi.org/10.1007/s10586-006-4900-5

Issue Date: January 2006
DOI: https://doi.org/10.1007/s10586-006-4900-5
