A Scalable Monitor for Large Systems

Andreolini, Mauro; Pietri, Marcello; Tosi, Stefania; Lancellotti, Riccardo

doi:10.1007/978-3-319-25414-2_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 512)

Included in the following conference series:

International Conference on Cloud Computing and Services Science

Abstract

Current monitoring solutions are poorly suited to large data centers for several reasons: lack of scalability, poor representation of global state conditions, inability to guarantee persistence in service delivery, and the impossibility of monitoring multi-tenant applications. In this paper, we present a novel monitoring architecture that strives to address these problems. It integrates a hierarchical scheme that monitors the resources in a cluster with a distributed hash table (DHT) that broadcasts system state information among different monitors. This architecture aims at high scalability, effectiveness, and resilience, as well as the ability to monitor services that span different clusters or even different data centers of the cloud provider. We evaluate the scalability of the proposed architecture through an experimental analysis and measure the overhead of the DHT-based communication scheme.
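
As an illustration of the two ingredients named in the abstract, the following minimal sketch pairs a per-cluster aggregation step with a DHT-style key placement among monitors. It is not the architecture evaluated in the paper: the class `MonitorRing`, the function `cluster_summary`, the monitor names, and the key `cluster-1/cpu` are all hypothetical, and the successor-style placement on a SHA-1 identifier ring is an assumption borrowed from consistent hashing. The sketch covers only key placement and lookup, not the dissemination of state among monitors.

```python
import hashlib
from bisect import bisect_right


def ring_id(name: str) -> int:
    """Map a name onto a 160-bit identifier ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class MonitorRing:
    """Hypothetical DHT-like ring: each key is owned by one monitor."""

    def __init__(self, monitor_names):
        # Sort monitors by their position on the identifier ring.
        self._ring = sorted((ring_id(n), n) for n in monitor_names)
        self._ids = [i for i, _ in self._ring]
        # One local key-value store per monitor (in-memory stand-in).
        self._store = {n: {} for n in monitor_names}

    def responsible(self, key: str) -> str:
        """Return the monitor whose ID follows the key's ID (with wrap-around)."""
        idx = bisect_right(self._ids, ring_id(key)) % len(self._ring)
        return self._ring[idx][1]

    def put(self, key: str, value) -> str:
        """Store a value at the responsible monitor and return its name."""
        owner = self.responsible(key)
        self._store[owner][key] = value
        return owner

    def get(self, key: str):
        """Look a key up at the monitor responsible for it."""
        return self._store[self.responsible(key)].get(key)


def cluster_summary(samples: dict) -> dict:
    """Hierarchical step: reduce per-node samples to one cluster-level record."""
    values = list(samples.values())
    return {"nodes": len(values), "mean": sum(values) / len(values), "max": max(values)}


# Assumed example data: CPU utilization of two nodes in one cluster.
ring = MonitorRing(["monitor-a", "monitor-b", "monitor-c"])
summary = cluster_summary({"node-1": 0.25, "node-2": 0.75})
owner = ring.put("cluster-1/cpu", summary)
```

Any monitor can then call `ring.get("cluster-1/cpu")` to retrieve the cluster-level record without knowing in advance which monitor stores it, which is the property a DHT contributes in this kind of design.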

Author information

Authors and Affiliations

Department of Physics, Computer Science and Mathematics, University of Modena and Reggio Emilia, Via Campi 213/a, 41125, Modena, Italy
Mauro Andreolini
Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Via Vignolese 905/b, 41125, Modena, Italy
Marcello Pietri, Stefania Tosi & Riccardo Lancellotti

Corresponding author

Correspondence to Mauro Andreolini.

Editor information

Editors and Affiliations

School of Computing, Dublin City University, Dublin 9, Ireland
Markus Helfert
LIP / Inria Ecole normale supérieure de Lyon, Lyon, France
Frédéric Desprez
Dell, Round Rock, USA
Donald Ferguson
University of Stuttgart, Stuttgart, Germany
Frank Leymann
Universitat Autònoma de Barcelona, Bellaterra, Spain
Victor Méndez Munoz

About this paper

Cite this paper

Andreolini, M., Pietri, M., Tosi, S., Lancellotti, R. (2015). A Scalable Monitor for Large Systems. In: Helfert, M., Desprez, F., Ferguson, D., Leymann, F., Méndez Munoz, V. (eds) Cloud Computing and Services Sciences. CLOSER 2014. Communications in Computer and Information Science, vol 512. Springer, Cham. https://doi.org/10.1007/978-3-319-25414-2_7

DOI: https://doi.org/10.1007/978-3-319-25414-2_7
Published: 30 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25413-5
Online ISBN: 978-3-319-25414-2
eBook Packages: Computer Science, Computer Science (R0)