A Scalable Monitor for Large Systems

  • Conference paper
Cloud Computing and Services Sciences (CLOSER 2014)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 512))

Abstract

Current monitoring solutions are poorly suited to large data centers in several respects: limited scalability, poor representation of global state conditions, inability to guarantee persistent service delivery, and inability to monitor multi-tenant applications. In this paper, we present a novel monitoring architecture that strives to address these problems. It integrates a hierarchical scheme for monitoring the resources in a cluster with a distributed hash table (DHT) for broadcasting system state information among different monitors. The architecture aims for high scalability, effectiveness, and resilience, and aims to make it possible to monitor services that span different clusters or even different data centers of the cloud provider. We evaluate the scalability of the proposed architecture through an experimental analysis and measure the overhead of the DHT-based communication scheme.
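
To make the DHT idea concrete, the following is a minimal sketch of how a hash ring can assign state keys to monitors. It is not the architecture described in the paper: the class `MonitorRing`, the monitor names, and the key format `cluster-N/cpu_load` are all hypothetical, and a plain successor-based hash ring stands in for whatever DHT the authors actually use.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the ring using SHA-1 (160 bits)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class MonitorRing:
    """Toy hash ring (hypothetical): a key is owned by the first monitor
    whose ring position is at or after the key's position, wrapping around."""

    def __init__(self, monitors):
        self._ring = sorted((ring_position(m), m) for m in monitors)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # idx == len(ring) means the key lies past the last monitor: wrap to 0.
        return self._ring[idx % len(self._ring)][1]


# Hypothetical monitor names and state keys, for illustration only.
monitors = ["monitor-a", "monitor-b", "monitor-c"]
ring = MonitorRing(monitors)
keys = [f"cluster-{i}/cpu_load" for i in range(6)]
placement = {k: ring.owner(k) for k in keys}

for key, monitor in placement.items():
    print(f"{key} -> {monitor}")
```

In this sketch, removing a monitor reassigns only the keys that monitor owned, while every other key keeps its owner. That property is one reason hash-based placement is attractive when monitors join or fail.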

Author information

Corresponding author

Correspondence to Mauro Andreolini.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Andreolini, M., Pietri, M., Tosi, S., Lancellotti, R. (2015). A Scalable Monitor for Large Systems. In: Helfert, M., Desprez, F., Ferguson, D., Leymann, F., Méndez Munoz, V. (eds) Cloud Computing and Services Sciences. CLOSER 2014. Communications in Computer and Information Science, vol 512. Springer, Cham. https://doi.org/10.1007/978-3-319-25414-2_7

  • DOI: https://doi.org/10.1007/978-3-319-25414-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25413-5

  • Online ISBN: 978-3-319-25414-2

  • eBook Packages: Computer Science, Computer Science (R0)
