A Scalable Monitoring Solution for Large-Scale Distributed Systems

  • Conference paper
Computer Aided Systems Theory – EUROCAST 2015 (EUROCAST 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9520)

Abstract

Applications running in large-scale distributed systems face many challenges. Constraints imposed on such systems must be thoroughly checked to ensure proper service delivery to the client. This paper proposes a monitoring solution for large-scale distributed systems that relies on abstract state machines. Data gathered from the monitoring components are used to calculate metrics and establish a diagnosis for the system. Emphasis is placed on failure detection and on ensuring non-functional requirements of the system, such as fault tolerance and resilience. The model introduced in this paper will be integrated into a cloud-enabled large-scale distributed system. The novelty of the solution lies in finding the best integration architecture for state-of-the-art algorithms and tools and refining them into an efficient version for large-scale distributed systems.

Notes

  1. A tuple (\(\mathrm{M}_{k}\), \(\mathrm{T}_{k}\), \(\mathrm{I}_{k}\)) refers to (\(\text{Monitoring Component}_{k}\), \(\text{Topology}_{k}\), \(\text{Metrics Set}_{k}\)).

Author information

Corresponding author

Correspondence to Andreea Buga.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Buga, A. (2015). A Scalable Monitoring Solution for Large-Scale Distributed Systems. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2015. EUROCAST 2015. Lecture Notes in Computer Science, vol 9520. Springer, Cham. https://doi.org/10.1007/978-3-319-27340-2_28

  • DOI: https://doi.org/10.1007/978-3-319-27340-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27339-6

  • Online ISBN: 978-3-319-27340-2

  • eBook Packages: Computer Science, Computer Science (R0)
