Log message anomaly detection with fuzzy C-means and MLP

Applied Intelligence

Abstract

Log messages are one of the most valuable sources of information in cloud and other software systems. They can be used for audits and to help ensure system security. Many millions of log messages are produced each day, which makes anomaly detection challenging. Automating the detection of anomalies can save time and money as well as improve detection performance. In this paper, an anomaly detection method is proposed that combines radius-based fuzzy C-means, using more clusters than the number of data classes, with a multilayer perceptron (MLP) network. The cluster centers and a radius are used to select reliable positive and negative log messages. Moreover, class probabilities are used together with an expert to correct the network output for suspect logs. The proposed model is evaluated on three well-known data sets, namely BGL, OpenStack and Thunderbird. The results show that this model provides better results than the existing methods it is compared against.
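The clustering stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard fuzzy C-means updates (Bezdek) on synthetic 2-D points that stand in for embedded log messages, runs four clusters for two classes, and assumes that a message counts as "reliable" when it lies within a fixed radius of its nearest cluster center. The function names, the radius value, and the synthetic data are all hypothetical, and the MLP and expert-correction stages are omitted.

```python
import math
import random


def memberships_for(points, centers, m):
    """Fuzzy memberships u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point lying exactly on a center cannot divide by zero.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                     for d_i in dists])
    return rows


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise the centers from randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iter):
        u = memberships_for(points, centers, m)
        new_centers = []
        for i in range(n_clusters):
            # Each center is the mean of all points weighted by u_ik^m.
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wk * p[d] for wk, p in zip(w, points)) / total
                 for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(points, centers, m)


def split_by_radius(points, centers, radius):
    """Assumed rule: a point is 'reliable' if its nearest center is within radius."""
    reliable, suspect = [], []
    for idx, p in enumerate(points):
        nearest = min(math.dist(p, c) for c in centers)
        (reliable if nearest <= radius else suspect).append(idx)
    return reliable, suspect


# Synthetic stand-in data: two classes, clustered with four clusters.
rng = random.Random(1)
normal = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(40)]
anomalous = [(rng.gauss(4.0, 0.5), rng.gauss(4.0, 0.5)) for _ in range(10)]
points = normal + anomalous

centers, memberships = fuzzy_c_means(points, n_clusters=4)
radius = 0.75  # hypothetical value; the paper's radius choice is not given here
reliable, suspect = split_by_radius(points, centers, radius)
print(len(reliable), "reliable,", len(suspect), "suspect")
```

In a pipeline of the kind the abstract outlines, the reliable indices would presumably supply training examples for the MLP, while the suspect ones would be candidates for the probability-based expert correction; how the paper actually does this is not stated in this preview.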

Notes

  1. https://github.com/logpai/loghub/tree/master/BGL

  2. https://github.com/logpai/loghub/tree/master/OpenStack

  3. https://github.com/logpai/loghub/tree/master/Thunderbird

  4. https://github.com/pytorch

  5. https://github.com/scikit-learn/scikit-learn

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Amir Farzad.

Ethics declarations

Conflict of Interests

The authors declare no conflict of interest with regard to this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Farzad, A., Gulliver, T.A. Log message anomaly detection with fuzzy C-means and MLP. Appl Intell 52, 17708–17717 (2022). https://doi.org/10.1007/s10489-022-03300-1

  • DOI: https://doi.org/10.1007/s10489-022-03300-1
