Exploiting ensemble techniques for automatic virtual machine clustering in cloud systems

Automated Software Engineering

Abstract

Cloud computing has recently emerged as a new paradigm to provide computing services through large data centers where customers may run their applications in a virtualized environment. The advantages of the cloud in terms of flexibility and economy encourage many enterprises to migrate from local data centers to cloud platforms, thus contributing to the success of such infrastructures. However, as the size and complexity of cloud infrastructures grow, scalability issues arise in monitoring and management processes. These issues are exacerbated because available solutions typically consider each virtual machine (VM) as a black box with independent characteristics, which is monitored at a fine granularity for management purposes, thus generating huge amounts of data to handle. We claim that scalability issues can be addressed by leveraging the similarity between VMs in terms of resource usage patterns. In this paper, we propose an automated methodology to cluster similar VMs starting from their resource usage information, assuming no knowledge of the software executed on them. This innovative methodology combines the Bhattacharyya distance and ensemble techniques to provide a stable evaluation of similarity between the probability distributions of multiple VM resource usage metrics, considering both system- and network-related data. We evaluate the methodology through a set of experiments on data coming from an enterprise data center. We show that our proposal achieves high and stable performance in automatic VM clustering, with a significant reduction in the amount of data collected, which lightens the monitoring requirements of a cloud data center.
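As an illustration of the similarity measure named in the abstract, the following minimal sketch computes the Bhattacharyya distance between the usage histograms of two VMs for a single metric. It is not the implementation used in the paper: the function names `histogram` and `bhattacharyya_distance`, the fixed equal-width binning, and the sample CPU values are assumptions made only for this example, and the ensemble step that combines several metrics is not shown.

```python
import math
from typing import List, Sequence


def histogram(samples: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Normalized equal-width histogram of `samples` over [lo, hi].

    Hypothetical helper: the binning rule is an assumption of this sketch.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if bins <= 0 or hi <= lo:
        raise ValueError("need bins > 0 and hi > lo")
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # Clamp the index so x == hi and out-of-range values fall in an edge bin.
        idx = min(bins - 1, max(0, int((x - lo) / width)))
        counts[idx] += 1
    total = len(samples)
    return [c / total for c in counts]


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # Bhattacharyya coefficient: 1 for identical distributions, 0 for disjoint ones.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if bc <= 0.0:
        return math.inf  # no overlapping bins
    # min() guards against rounding slightly above 1; max() avoids returning -0.0.
    return max(0.0, -math.log(min(bc, 1.0)))


# Made-up CPU utilization samples (percent) for three hypothetical VMs.
cpu_vm_a = [10, 12, 11, 13, 12, 11]
cpu_vm_b = [11, 12, 10, 13, 11, 12]
cpu_vm_c = [80, 85, 90, 82, 88, 91]

h_a = histogram(cpu_vm_a, bins=10, lo=0.0, hi=100.0)
h_b = histogram(cpu_vm_b, bins=10, lo=0.0, hi=100.0)
h_c = histogram(cpu_vm_c, bins=10, lo=0.0, hi=100.0)

d_ab = bhattacharyya_distance(h_a, h_b)  # similar usage patterns
d_ac = bhattacharyya_distance(h_a, h_c)  # disjoint usage patterns
```

In this sketch, VMs whose samples fall in the same bins get a distance of zero, while VMs with no overlapping bins get an infinite distance. A clustering step would normally need a bounded similarity derived from these distances, and that choice is left open here.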

Notes

  1. R project home page: http://www.r-project.org/.

  2. Python home page: http://www.python.org/.

  3. Bourne shell home page: http://www.gnu.org/software/bash/.

  4. Cacti home page: http://www.cacti.net.

  5. Munin home page: http://munin-monitoring.org/.

  6. Ganglia Monitoring System home page: http://ganglia.sourceforge.net/.

Author information

Corresponding author

Correspondence to Claudia Canali.

About this article

Cite this article

Canali, C., Lancellotti, R. Exploiting ensemble techniques for automatic virtual machine clustering in cloud systems. Autom Softw Eng 21, 319–344 (2014). https://doi.org/10.1007/s10515-013-0134-y

  • DOI: https://doi.org/10.1007/s10515-013-0134-y
