Abstract
Log messages are one of the most valuable sources of information in cloud and other software systems. These logs can be used for auditing and for ensuring system security. Many millions of log messages are produced each day, which makes anomaly detection challenging. Automating anomaly detection can save time and money as well as improve detection performance. In this paper, an anomaly detection method is proposed that combines radius-based fuzzy C-means, using more clusters than the number of data classes, with a multilayer perceptron (MLP) network. The cluster centers and a radius are used to select reliable positive and negative log messages. Moreover, class probabilities are used together with an expert to correct the network output for suspect logs. The proposed model is evaluated on three well-known data sets, namely BGL, OpenStack and Thunderbird. The results show that this model outperforms existing methods.
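The abstract does not give the exact formulation, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It shows standard fuzzy C-means (Bezdek's membership and center updates) run with more clusters than classes, then a hypothetical selection rule in which each cluster takes the majority label of its members and only samples within a fixed `radius` of their own cluster center are kept as reliable training data. A second hypothetical rule, `flag_suspect`, marks predictions whose anomaly probability falls in an uncertain band (`low`, `high`) for expert review. The MLP itself is omitted; any classifier that outputs anomaly probabilities could supply `p_anomaly`. The names `select_reliable`, `flag_suspect`, `radius`, `low` and `high`, and the toy data, are illustrative only.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy C-means (Bezdek, 1981).

    Returns (centers, U), where U[i, k] is the membership of sample k in cluster i.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; each sample's memberships sum to 1.
    U = rng.random((n_clusters, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Center update: membership-weighted mean of the samples.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Distance from every center to every sample, shape (c, n).
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a sample sitting exactly on a center
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    # Recompute centers so they match the final memberships.
    Um = U ** m
    centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return centers, U


def select_reliable(X, y, centers, U, radius):
    """Hypothetical radius rule for picking reliable training samples.

    Each cluster takes the majority label of the samples assigned to it
    (highest membership). A sample is kept only if it lies within `radius`
    of its own cluster center and its label matches that cluster's label.
    """
    assign = U.argmax(axis=0)
    keep = np.zeros(len(X), dtype=bool)
    for i, center in enumerate(centers):
        members = assign == i
        if not members.any():
            continue
        majority = np.bincount(y[members]).argmax()
        close = np.linalg.norm(X - center, axis=1) <= radius
        keep |= members & close & (y == majority)
    return keep


def flag_suspect(p_anomaly, low=0.3, high=0.7):
    """Hypothetical rule: send logs with an uncertain anomaly probability to an expert."""
    return (p_anomaly > low) & (p_anomaly < high)


# Toy data: two well-separated classes (0 = normal, 1 = anomalous).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

centers, U = fuzzy_c_means(X, n_clusters=4)  # more clusters than classes
reliable = select_reliable(X, y, centers, U, radius=1.0)
print(reliable.sum(), "of", len(X), "samples kept for training")
print(flag_suspect(np.array([0.05, 0.5, 0.95])))  # only the middle value is flagged
```

With more clusters than classes, each class can be covered by several compact clusters, so a single radius can exclude samples that lie far from every center of their own class. How the paper actually chooses the radius and the probability band is not stated in the abstract.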


Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest with regard to this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Farzad, A., Gulliver, T.A. Log message anomaly detection with fuzzy C-means and MLP. Appl Intell 52, 17708–17717 (2022). https://doi.org/10.1007/s10489-022-03300-1
DOI: https://doi.org/10.1007/s10489-022-03300-1