SimMon: A Toolkit for Simulating Monitoring Mechanism in Cloud Computing Environments

Zhao, Xinkui; Yin, Jianwei; Lin, Pengxiang; Zhi, Chen; Feng, Shichun; Wu, Hao; Chen, Zuoning

doi:10.1007/978-3-662-48616-0_33


Conference paper
First Online: 25 November 2015


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9435)

Abstract

Monitoring is essential for supervising the state of services and guiding their adaptive management in cloud computing environments. As auxiliary tools, monitoring systems are expected to impose as little extra cost as possible on physical resources (CPU, memory, network, etc.). Because data centers differ in scale and requirements, it is impossible to design a one-size-fits-all monitoring solution. Moreover, for a given data center, it is hard to determine whether a predesigned monitoring mechanism is well suited before the mechanism is deployed in a real production environment. To address these issues, we propose SimMon, a toolkit for simulating monitoring mechanisms in cloud computing environments. SimMon simulates the collection, dissemination, storage, and requisition of monitoring data. With the help of SimMon, system administrators can compare different monitoring mechanisms and select the best one before it is adopted by a monitoring system in a real-world data center.

This work was partially sponsored by the National Natural Science Foundation of China (No. 61272129), the National High-Tech Research Program of China (No. 2013AA01A213), the New-Century Excellent Talents Program of the Ministry of Education of China (No. NCET-12-0491), the Zhejiang Provincial Natural Science Foundation of China (No. LR13F020002), and the Science and Technology Program of Zhejiang Province (No. 2012C01037-1).


1 Motivation

Fueled by the explosive growth of services in cloud computing environments, traditional predesigned, one-size-fits-all monitoring tools are not efficient enough for cloud monitoring, for three reasons. (1) The number of monitoring targets has become huge, which makes the traditional centralized management structure incapable of efficiently coordinating the dispersed collection agents. An efficient, distributed organization of the monitoring agents is required to ensure the performance of monitoring systems [2]. (2) The volume of data disseminated across data centers is large, which puts considerable extra pressure on the network [4]. To relieve this pressure, sophisticated strategies for handling collected data are necessary, such as dynamic control of the monitoring targets, the monitoring interval, and the data polling strategy [3]. (3) In cloud environments, services are located on infrastructure spread across different regions, which makes the underlying network structure for data dissemination complex. To ensure fast data access and high availability, new protocols that provide better solutions for the storage and requisition of monitoring data are imperative [5].

Considering the above reasons, it is vital to design monitoring mechanisms according to the characteristics and monitoring requirements of a specific data center in cloud computing environments. In this work, we propose SimMon, a toolkit to simulate monitoring mechanisms and evaluate their effectiveness. SimMon is used in two main scenarios: (1) to test whether a monitoring strategy would work well in a certain data center before it is deployed and run in a real production environment; (2) to compare the results of different strategies and decide which strategy is the most appropriate one for a specific data center.

2 Architecture of SimMon

Figure 1 depicts the architecture of SimMon. It is composed of four main components: network, data storage, data dissemination, and strategy control panel.

Modeling Network. The network layer is an important consideration, since the bandwidth and time consumed by data transmission depend heavily on the underlying network structure and the logical topology of the monitoring system. To model the network structure, we simulate the behavior of root switches, aggregation switches, and access switches separately and combine them into a layered structure. Apart from the underlying network structure, the logical monitoring topology concerns the organization of monitoring agents. A monitoring system commonly consists of three kinds of agents: collection agents (also called sensors), federation agents, and root agents. Collection agents are hosted on the same virtual machine as the service target and collect monitoring measurements locally. Federation agents are in charge of data organization and processing for a subset of collection agents. Root agents act as the central nervous system, controlling the global scheduling strategies of the monitoring system. We define these three kinds of agents to support the design of centralized, tree-based, P2P-based, and hybrid topologies. We adopt an event-based mechanism to control packet transmission and to handle packet loss. Latency between nodes is calculated from the underlying network structure and a BRITE-style file that contains delay metrics between each pair of virtual machines.
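As an illustration only, the layered switch structure can be turned into a latency estimate by counting the links on the path between two hosts. The sketch below is not SimMon's API: the class and method names are hypothetical, it assumes a single root switch, and it assumes every link adds the same delay, whereas SimMon reads per-pair delay metrics from a BRITE-style file.

```java
// Hypothetical sketch, not SimMon's API: hop-based latency in a three-tier switch tree.
final class TreeLatencySketch {

    /**
     * Counts the links on the path between two distinct hosts.
     * Access switch ids are assumed to be local to their aggregation switch,
     * and a single root switch is assumed to connect all aggregation switches.
     */
    static int linkCount(int aggA, int accessA, int aggB, int accessB) {
        if (aggA != aggB) {
            return 6; // host, access, aggregation, root, aggregation, access, host
        }
        if (accessA != accessB) {
            return 4; // host, access, aggregation, access, host
        }
        return 2;     // host, access, host
    }

    /** Latency under the simplifying assumption that every link adds the same delay. */
    static double latencyMillis(int aggA, int accessA, int aggB, int accessB,
                                double perLinkDelayMillis) {
        return linkCount(aggA, accessA, aggB, accessB) * perLinkDelayMillis;
    }

    public static void main(String[] args) {
        // Two hosts under different aggregation switches; 0.5 ms per link is a made-up value.
        System.out.println(latencyMillis(0, 0, 1, 0, 0.5));
    }
}
```

A per-pair delay table, as in a BRITE-style file, would replace the uniform per-link delay while keeping the same path logic.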

Modeling Data Storage. In monitoring systems, collected data are usually transferred from collection agents to federation agents and stored in a data repository for future query and analysis. To model the data storage process, we provide an interface for simulating different data repositories, such as MySQL and HBase. Beyond database simulation, the organization of the storage nodes in a distributed database is also important for reducing the total network bandwidth cost and the chance of resource conflicts. A good organization algorithm should consider the volume of data to be transferred and the resource usage of the business-related workload in the data center. Furthermore, in the data query process, it is important to find the shortest route to the requested data. We implement a caching strategy that keeps the data collected in the most recent period in a cache for fast query. To ensure high availability, we design a structure that lets users define different replication strategies: more than one replica of the collected data is stored on replication servers in case of emergency.
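The idea of keeping only the most recent period of data in a cache can be sketched as a sliding time window. This is a minimal sketch under stated assumptions, not the SimMon implementation: the class name, the window length, and the use of one numeric value per timestamp are hypothetical, and samples are assumed to arrive in non-decreasing timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not SimMon's API: a cache that keeps only recent samples.
final class RecentWindowCache {
    private final long windowMillis;
    // Each entry is {timestampMillis, value}; the oldest entry is at the head.
    private final Deque<long[]> samples = new ArrayDeque<>();

    RecentWindowCache(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Stores a sample and evicts everything older than the window. */
    void put(long timestampMillis, long value) {
        samples.addLast(new long[] {timestampMillis, value});
        long cutoff = timestampMillis - windowMillis;
        while (!samples.isEmpty() && samples.peekFirst()[0] < cutoff) {
            samples.removeFirst();
        }
    }

    /** Returns the cached value for this timestamp, or null on a cache miss. */
    Long get(long timestampMillis) {
        for (long[] sample : samples) {
            if (sample[0] == timestampMillis) {
                return sample[1];
            }
        }
        return null; // a real system would fall back to the data repository here
    }

    int size() {
        return samples.size();
    }
}
```

On a miss, a query would go to the simulated repository (or one of its replicas), which is where the network and storage costs of the chosen strategy would be charged.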

Modeling Data Dissemination. In a distributed monitoring system, monitoring data are collected by collection agents and disseminated to federation agents. In the dissemination layer, three main processes may consume extra resources: getting monitoring data, disseminating the data, and receiving the data. For getting monitoring data, we simulate strategies that deploy groups of sensors intelligently and implement algorithms for precise target selection, accurate collection interval selection, and dynamic data preprocessing to reduce the data at the source. For disseminating the data, we reduce the number of dissemination actions through intelligent load balancing and data polling strategies (data collected by the dispersed collection agents can either be pushed to federation nodes, which receive them passively, or be pulled by federation nodes proactively). For receiving the transferred data, we design two protocols: a unicast protocol and a multicast protocol. A unicast protocol can improve the accuracy of delivered data, while a multicast protocol makes data delivery more efficient.
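To make the push and pull alternatives concrete, the following sketch counts the messages each one generates for a single collection agent. It is an illustration under simple assumptions rather than SimMon's cost model: the names are hypothetical, a push is assumed to cost one message, and a pull is assumed to cost a request plus a response.

```java
// Hypothetical sketch, not SimMon's API: message counts for push and pull polling.
final class PollingCostSketch {

    /** Push: the agent sends one message per interval. */
    static long pushMessages(long durationSeconds, long intervalSeconds) {
        return durationSeconds / intervalSeconds;
    }

    /** Pull: each poll is assumed to cost one request and one response. */
    static long pullMessages(long durationSeconds, long intervalSeconds) {
        return 2 * (durationSeconds / intervalSeconds);
    }

    /** Scales a per-agent message count to a group of identical agents. */
    static long totalMessages(long messagesPerAgent, int agentCount) {
        return messagesPerAgent * agentCount;
    }

    public static void main(String[] args) {
        // One minute of monitoring at a 10-second interval (made-up values).
        System.out.println("push: " + pushMessages(60, 10));
        System.out.println("pull: " + pullMessages(60, 10));
    }
}
```

Under these assumptions pull costs twice as many messages at the same interval, so it pays off only when the federation node can poll less often than the agent would push.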

Modeling Strategy Control Panel. Sensors are the source of monitoring data, and they are developed and deployed independently of monitoring systems. Monitoring systems should therefore be capable of discovering independent sensors and adding them to management consoles. There are two main ways to discover newly installed sensors: event-based announcement by the installed sensors and periodic scanning by federation agents or root agents. We implement security and privacy policies by creating subnets for particular sets of sensors. Monitoring systems are also expected to send alarms to system administrators when certain kinds of events occur. To filter out false alarms, we build an interface that lets users redesign the alarm strategy and compare the results.
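The two discovery paths can be sketched as two entry points into one registry. The code below is a hypothetical illustration, not part of SimMon: the class name, the method names, and the use of string identifiers for sensors are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not SimMon's API: a registry fed by announcements and scans.
final class SensorDiscoverySketch {
    private final Set<String> knownSensors = new LinkedHashSet<>();

    /** Event-based path: a sensor announces itself. Returns true if it was new. */
    boolean announce(String sensorId) {
        return knownSensors.add(sensorId);
    }

    /**
     * Periodic path: an agent scans and reports the sensors it can reach.
     * Returns the sensors that were not registered before this scan.
     */
    List<String> scan(Collection<String> reachableSensors) {
        List<String> discovered = new ArrayList<>();
        for (String sensorId : reachableSensors) {
            if (knownSensors.add(sensorId)) {
                discovered.add(sensorId);
            }
        }
        return discovered;
    }

    int knownCount() {
        return knownSensors.size();
    }
}
```

Announcement registers a sensor as soon as it starts, while scanning catches sensors whose announcement was lost, at the cost of the periodic scan traffic.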

3 Implementation

In designing SimMon, we first use classes inherited from CloudSim [1] to build a testbed containing hosts, switches, virtual machines, and workloads; the testbed simulates a data center in a cloud computing environment. On top of the simulated data center, we develop a new toolkit that lets users build different monitoring mechanisms. The toolkit is implemented in Java and contains 15890 lines of code in total. The source code of SimMon is available at http://www.cmsci.net/pxlin/simmon.

4 Demonstration

In the demonstrations, we first use SimMon to simulate a cloud data center with 10000 physical servers (PSs), each hosting 16 virtual machines. The PSs are dispersed across 10 individual small-scale data centers and connected by a tree-based underlying network. Workloads running in the cloud environment are simulated with certain distributions. The simulated cloud data center is the target that we want to monitor; hence, all the monitoring mechanisms are designed for this data center. We use three examples to demonstrate three common usage scenarios of SimMon.
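The scale of this setup follows from simple arithmetic, sketched below. The constants come from the description above; the even split of PSs across the 10 data centers is an assumption, since the text does not state how they are distributed, and the class name is hypothetical.

```java
// Hypothetical sketch: the scale implied by the demonstration setup.
final class DemoScaleSketch {
    static final int PHYSICAL_SERVERS = 10_000;
    static final int VMS_PER_SERVER = 16;
    static final int DATA_CENTERS = 10;

    /** Total virtual machines: 10000 * 16 = 160000. */
    static int totalVms() {
        return PHYSICAL_SERVERS * VMS_PER_SERVER;
    }

    /** Assumes the physical servers are split evenly across data centers. */
    static int serversPerDataCenter() {
        return PHYSICAL_SERVERS / DATA_CENTERS;
    }

    /** Virtual machines per data center under the same even-split assumption. */
    static int vmsPerDataCenter() {
        return serversPerDataCenter() * VMS_PER_SERVER;
    }

    public static void main(String[] args) {
        System.out.println(totalVms() + " VMs in total");
        System.out.println(vmsPerDataCenter() + " VMs per data center");
    }
}
```

If every virtual machine hosted one collection agent, the monitoring system would have to coordinate 160000 agents, which is the scale at which topology and polling choices start to matter.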

Influence of different topologies in monitoring systems. In this demonstration, we build four monitoring systems with different monitoring topologies: star-based, tree-based, P2P-based, and hybrid. In each monitoring system, we first simulate a data polling strategy that pushes collected data to federation agents at a fixed interval. We then sum the total extra cost caused by each monitoring system and compare the influence of the different topologies.
Data dissemination cost comparison by different polling strategies. In this demonstration, we implement three data polling strategies: push at a fixed interval, hybrid push and pull, and intelligent switching between push and pull. Based on SimMon, we compare the extra cost and accuracy of these strategies.
Effective alarm reduction by different alarm strategies. In this demonstration, we test three different strategies for producing alarms: alarm on CPU usage, alarm on memory usage, and alarm on CPU and memory usage. Based on SimMon, we compare the number of effective alarms produced by the three strategies.
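The three alarm strategies in the last demonstration can be sketched as interchangeable predicates over a usage sample. This is an illustration, not SimMon's alarm interface: the names and the 0.9 threshold are hypothetical, and "CPU and memory" is read here as both values exceeding the threshold at once.

```java
// Hypothetical sketch, not SimMon's API: pluggable alarm strategies.
final class AlarmStrategySketch {

    /** Decides whether one usage sample (fractions in [0, 1]) raises an alarm. */
    interface AlarmStrategy {
        boolean shouldAlarm(double cpuUsage, double memoryUsage);
    }

    static final double THRESHOLD = 0.9; // made-up threshold for illustration

    static final AlarmStrategy CPU_ONLY = (cpu, mem) -> cpu > THRESHOLD;
    static final AlarmStrategy MEMORY_ONLY = (cpu, mem) -> mem > THRESHOLD;
    // Assumption: both resources must exceed the threshold in the same sample.
    static final AlarmStrategy CPU_AND_MEMORY =
            (cpu, mem) -> cpu > THRESHOLD && mem > THRESHOLD;

    /** Counts the alarms a strategy raises; each row is {cpuUsage, memoryUsage}. */
    static int countAlarms(AlarmStrategy strategy, double[][] samples) {
        int alarms = 0;
        for (double[] sample : samples) {
            if (strategy.shouldAlarm(sample[0], sample[1])) {
                alarms++;
            }
        }
        return alarms;
    }
}
```

Running each strategy over the same simulated workload trace and comparing the counts gives the kind of side-by-side comparison the demonstration describes.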

References

1. Calheiros, R.N., Ranjan, R., Beloglazov, A., De Rose, C.A., Buyya, R.: CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw. Pract. Experience 41(1), 23–50 (2011)
2. Jain, N., Kit, D., Mahajan, P., Yalagandula, P., Dahlin, M., Zhang, Y.: Star: self-tuning aggregation for scalable monitoring. In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, pp. 962–973. VLDB Endowment (2007)
3. Lu, X., Yin, J., Li, Y., Deng, S., Zhu, M.: An efficient data dissemination approach for cloud monitoring. In: Liu, C., Ludwig, H., Toumani, F., Yu, Q. (eds.) ICSOC 2012. LNCS, vol. 7636, pp. 733–747. Springer, Heidelberg (2012)
4. Meng, S., Iyengar, A.K., Rouvellou, I.M., Liu, L.: Volley: violation likelihood based state monitoring for datacenters. In: Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013, pp. 1–10. IEEE Computer Society, Washington, DC (2013)
5. Wang, C., Schwan, K., Talwar, V., Eisenhauer, G., Hu, L., Wolf, M.: A flexible architecture integrating monitoring and analytics for managing large-scale data centers. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, pp. 141–150. ACM (2011)


Author information

Authors and Affiliations

College of Computer Science, Zhejiang University, Hangzhou, China
Xinkui Zhao, Jianwei Yin, Pengxiang Lin, Chen Zhi, Shichun Feng & Hao Wu
National Parallel Computing Engineering Research Center, Beijing, China
Zuoning Chen


Corresponding author

Correspondence to Xinkui Zhao.

Editor information

Editors and Affiliations

Queensland University of Technology, Brisbane, Queensland, Australia
Alistair Barros
Université Paris Dauphine, Paris, France
Daniela Grigori
M.S. Ramaiah University, Bangalore, India
Nanjangud C. Narendra
University of Wollongong, Wollongong, New South Wales, Australia
Hoa Khanh Dam


About this paper

Cite this paper

Zhao, X. et al. (2015). SimMon: A Toolkit for Simulating Monitoring Mechanism in Cloud Computing Environments. In: Barros, A., Grigori, D., Narendra, N., Dam, H. (eds) Service-Oriented Computing. ICSOC 2015. Lecture Notes in Computer Science, vol 9435. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48616-0_33


DOI: https://doi.org/10.1007/978-3-662-48616-0_33
Published: 25 November 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48615-3
Online ISBN: 978-3-662-48616-0
eBook Packages: Computer Science, Computer Science (R0)
