K-means Application for Anomaly Detection and Log Classification in HPC

Dani, Mohamed Cherif; Doreau, Henri; Alt, Samantha

doi:10.1007/978-3-319-60045-1_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10351)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Detecting anomalies in the flow of system logs of a high performance computing (HPC) facility is a challenging task. Although previous research has sought to identify nominal and abnormal phases, finding practical ways to provide system administrators with a reduced set of the most useful messages for identifying abnormal behaviour remains a challenge. In this paper we describe an extensive study of log classification and anomaly detection using K-means on real, unlabelled HPC data extracted from the Curie supercomputer. The method involves (1) classifying logs by format, which is valuable information for administrators, and then (2) building normal and abnormal classes for anomaly detection. Our methodology shows good performance for clustering and detecting abnormal logs.
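The abstract does not give the features, the choice of K, or the decision rule used in the chapter, so the following is only a rough sketch of the two-step idea under stated assumptions. It clusters log lines by a few hypothetical "format" features with a plain K-means (Lloyd's algorithm), then treats a line as abnormal when it lies farther than an assumed threshold from every cluster centre. The function names, the features, the threshold value, and the sample lines are all illustrative inventions, not the authors' method or the Curie data.

```python
import math

def featurize(line):
    """Hypothetical 'format' features: token count, digit ratio, punctuation ratio."""
    n = max(len(line), 1)
    digits = sum(c.isdigit() for c in line)
    punct = sum(not c.isalnum() and not c.isspace() for c in line)
    return [float(len(line.split())), digits / n, punct / n]

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    j = min(range(len(centroids)), key=dists.__getitem__)
    return j, dists[j]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(list(max(points, key=lambda p: nearest(p, centroids)[1])))
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids)[0] for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def classify(line, centroids, threshold):
    """Step 2 (assumed rule): abnormal when far from every format cluster."""
    j, d = nearest(featurize(line), centroids)
    return j, ("abnormal" if d > threshold else "normal")

# Made-up sample lines, for illustration only (not from the Curie data set).
nominal = [
    "sshd[1023]: Accepted publickey for alice from 10.0.0.1",
    "sshd[988]: Accepted publickey for bob from 10.0.0.22",
    "sshd[2001]: Accepted publickey for carol from 10.0.0.7",
    "kernel: eth0 link up",
    "kernel: eth1 link down",
    "kernel: ib0 link up",
]
# Step 1: group lines by format.
labels, centroids = kmeans([featurize(l) for l in nominal], k=2)
THRESHOLD = 2.0  # assumed cut-off; would need tuning on real logs

print(labels)
print(classify("kernel: BUG: unable to handle kernel NULL pointer "
               "dereference at 0000000000000008", centroids, THRESHOLD))
```

A distance-to-centroid cut-off is only one common way to derive normal and abnormal classes from K-means; clustering with K=2 directly on the log features, or flagging very small clusters, would be alternative readings of step (2).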

Author information

Authors and Affiliations

Intel, 2 Rue de Paris, Meudon, France
Mohamed Cherif Dani & Samantha Alt
CEA, DAM, DIF, 91297, Arpajon, France
Henri Doreau

Corresponding author

Correspondence to Henri Doreau.

Editor information

Editors and Affiliations

Artois University, Lens, France
Salem Benferhat
Artois University, Lens, France
Karim Tabia
Texas State University, San Marcos, Texas, USA
Moonis Ali

Cite this paper

Dani, M.C., Doreau, H., Alt, S. (2017). K-means Application for Anomaly Detection and Log Classification in HPC. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_23

DOI: https://doi.org/10.1007/978-3-319-60045-1_23
Published: 03 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60044-4
Online ISBN: 978-3-319-60045-1
eBook Packages: Computer Science, Computer Science (R0)
