Abstract
Detecting anomalies in the stream of system logs from a high-performance computing (HPC) facility is a challenging task. Although previous research has addressed the identification of nominal and abnormal phases, providing system administrators with a reduced set of the most useful messages for identifying abnormal behaviour remains a challenge. In this paper we describe an extensive study of log classification and anomaly detection using K-means on real, unlabelled HPC data extracted from the Curie supercomputer. The method involves (1) classifying logs by format, which is valuable information for administrators, and then (2) building normal and abnormal classes for anomaly detection. Our methodology performs well at both clustering logs and detecting abnormal ones.
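The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the number of clusters, or how the normal and abnormal classes are built. The sketch assumes three hypothetical format features (token count, digit ratio, punctuation ratio), a deterministic farthest-first initialisation, and an illustrative rule that treats clusters holding a small share of the lines as abnormal. The function names, the `min_share` parameter, and the sample log lines are all made up for the example.

```python
import math
from collections import Counter


def featurize(line):
    """Describe the *format* of a log line with three hypothetical features."""
    n_chars = max(len(line), 1)
    return [
        float(len(line.split())),                                   # token count
        sum(c.isdigit() for c in line) / n_chars,                   # digit ratio
        sum(not c.isalnum() and not c.isspace() for c in line) / n_chars,  # punctuation ratio
    ]


def init_centroids(points, k):
    """Farthest-first initialisation (an assumption, chosen to be deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iters=50):
    """Plain Lloyd iterations: assign points, recompute means, stop when stable."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def flag_rare_formats(lines, k, min_share):
    """Step 1: cluster lines by format. Step 2 (illustrative rule, not from the
    paper): call a cluster abnormal if it holds less than min_share of the lines."""
    labels, _ = kmeans([featurize(line) for line in lines], k)
    sizes = Counter(labels)
    abnormal = {j for j, n in sizes.items() if n < min_share * len(lines)}
    return [line for line, lab in zip(lines, labels) if lab in abnormal]


# Made-up log lines: two frequent formats and one rare one.
SAMPLE_LINES = [
    "sshd: session opened for user alice",
    "sshd: session opened for user bob",
    "sshd: session opened for user carol",
    "sshd: session opened for user dave",
    "kernel: eth0 link up 1000 Mbps full duplex",
    "kernel: eth1 link up 1000 Mbps full duplex",
    "kernel: eth2 link up 1000 Mbps full duplex",
    "kernel: BUG: unable to handle NULL pointer dereference at 0000000000000010 pid 4242 cpu 3",
]

if __name__ == "__main__":
    for line in flag_rare_formats(SAMPLE_LINES, k=3, min_share=0.2):
        print("abnormal:", line)
```

In practice the number of clusters and the feature set would need to be tuned on the real data; the cluster-size rule is only one of several ways the normal and abnormal classes could be derived.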
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dani, M.C., Doreau, H., Alt, S. (2017). K-means Application for Anomaly Detection and Log Classification in HPC. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_23
DOI: https://doi.org/10.1007/978-3-319-60045-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60044-4
Online ISBN: 978-3-319-60045-1
eBook Packages: Computer Science (R0)