Abstract
Current monitoring solutions are poorly suited to large data centers in several respects: limited scalability, poor representation of global state conditions, inability to guarantee continuity of service delivery, and no support for monitoring multi-tenant applications. In this paper, we present a novel monitoring architecture that strives to address these problems. It combines a hierarchical scheme for monitoring the resources in a cluster with a distributed hash table (DHT) that broadcasts system state information among the different monitors. The architecture aims for high scalability, effectiveness, and resilience, and makes it possible to monitor services that span different clusters or even different data centers of the cloud provider. We evaluate the scalability of the proposed architecture through an experimental analysis and measure the overhead of the DHT-based communication scheme.
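To make the DHT idea concrete, the following is a minimal sketch of how state keys could be assigned to monitors on a hash ring. It is not the architecture's actual implementation: the class `MonitorRing`, the helper `ring_id`, the monitor names, and the key format are all hypothetical, and the sketch covers only key placement, not the broadcast of state among monitors.

```python
import hashlib
from bisect import bisect_left


def ring_id(name: str) -> int:
    """Map a name to a 32-bit position on the identifier ring (assumed scheme)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")


class MonitorRing:
    """Hypothetical ring of monitors; each key belongs to its clockwise successor."""

    def __init__(self, monitors):
        # Sort monitors by ring position so lookups can use binary search.
        self._ring = sorted((ring_id(m), m) for m in monitors)
        self._ids = [i for i, _ in self._ring]

    def owner(self, key: str) -> str:
        # First monitor at or after the key's position, wrapping around the ring.
        idx = bisect_left(self._ids, ring_id(key)) % len(self._ring)
        return self._ring[idx][1]

    def without(self, failed: str) -> "MonitorRing":
        # Build the ring that remains after one monitor leaves or fails.
        return MonitorRing([m for _, m in self._ring if m != failed])


if __name__ == "__main__":
    ring = MonitorRing([f"cluster-monitor-{i}" for i in range(4)])
    for key in ("cluster-0/cpu_load", "cluster-1/mem_used"):
        print(key, "->", ring.owner(key))
```

In this placement scheme, removing a monitor reassigns only the keys that monitor owned, which is one reason a DHT is a plausible fit for the resilience goal stated above.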
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Andreolini, M., Pietri, M., Tosi, S., Lancellotti, R. (2015). A Scalable Monitor for Large Systems. In: Helfert, M., Desprez, F., Ferguson, D., Leymann, F., Méndez Munoz, V. (eds) Cloud Computing and Services Sciences. CLOSER 2014. Communications in Computer and Information Science, vol 512. Springer, Cham. https://doi.org/10.1007/978-3-319-25414-2_7
DOI: https://doi.org/10.1007/978-3-319-25414-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25413-5
Online ISBN: 978-3-319-25414-2
eBook Packages: Computer Science, Computer Science (R0)