Abstract
Clustering large data sets collected from hardware and software resources underlies many management and control operations and strategies. However, data gathered by monitoring system resources are highly variable and noisy, so existing clustering solutions applied to them suffer from low accuracy and poor robustness.
We present a new algorithm that extends clustering to data center management: it finds groups of related objects even when their correlation is hidden by high variability.
Our experimental evaluation, performed on both synthetic and real data, shows the accuracy and robustness of the proposed solution and its ability to cluster servers with correlated functionality.
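The abstract does not detail the algorithm, so the following is only an illustrative sketch of the general idea of recovering correlation hidden by noise, not the authors' method. It assumes a simple trailing moving average as the noise filter, Pearson correlation as the similarity measure, and single-linkage grouping above a threshold. All function names, stream names, and parameter values (`window`, `threshold`, the synthetic signals) are hypothetical.

```python
import math
import random


def moving_average(series, window):
    """Smooth a series with a trailing moving average (assumed noise filter)."""
    out = []
    total = 0.0
    for i, value in enumerate(series):
        total += value
        if i >= window:
            total -= series[i - window]
        out.append(total / min(i + 1, window))
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cluster_by_correlation(streams, window=20, threshold=0.6):
    """Group streams whose smoothed series correlate at or above threshold.

    Groups are the connected components of the correlation graph
    (single linkage), computed with a small union-find structure.
    """
    names = list(streams)
    smoothed = {name: moving_average(streams[name], window) for name in names}
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(smoothed[a], smoothed[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


def make_synthetic_streams(seed=0, length=500):
    """Two hypothetical server pairs sharing a base signal plus heavy noise."""
    rng = random.Random(seed)
    sine = [math.sin(2 * math.pi * t / 100) for t in range(length)]
    ramp = [2 * t / length - 1 for t in range(length)]

    def noisy(base):
        return [v + rng.gauss(0, 1.0) for v in base]

    return {
        "web1": noisy(sine),
        "web2": noisy(sine),
        "db1": noisy(ramp),
        "db2": noisy(ramp),
    }


if __name__ == "__main__":
    streams = make_synthetic_streams()
    print("raw:     ", cluster_by_correlation(streams, window=1))
    print("smoothed:", cluster_by_correlation(streams, window=20))
```

In this toy setup the noise variance exceeds the signal variance, so the raw streams correlate too weakly to be grouped, while smoothing first lets the paired streams be recognized. A real deployment would need a filter and threshold chosen for the monitored data.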
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tosi, S., Casolari, S., Colajanni, M. (2013). Supporting Data Center Management through Clustering of System Data Streams. In: Guyot, V. (eds) Advanced Infocomm Technology. ICAIT 2012. Lecture Notes in Computer Science, vol 7593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38227-7_8
DOI: https://doi.org/10.1007/978-3-642-38227-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38226-0
Online ISBN: 978-3-642-38227-7
eBook Packages: Computer Science (R0)