ABSTRACT
Analyzing the log files of software or a device provides a focal point for making incremental improvements; it is the first step in incident analysis. However, the format and content of log messages are not always fully documented, and messages are written in many different formats. This makes log analysis more difficult, delays the resolution of incidents, and therefore carries a high financial risk.
In this paper, we survey log file analysis and the existing systems developed to address this issue. We then propose a methodology to support log analysis in complex environments. The K-Nearest-Neighbor (KNN) classification method was chosen to be used online through Weka to predict errors. To that end, a program was developed in Python to extract, clean, and format the log file before comparing the classification algorithms KNN, J48, and NaiveBayes on the dataset. An API was used to interact with Weka.
Finally, we illustrate our proposal on the Tivoli Storage Manager (TSM) log file and describe the results obtained.
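The extract, clean, and classify steps named in the abstract can be sketched in Python as follows. This is an illustrative sketch only, not the authors' program: the timestamp layout, the sample messages, the helper names (`parse_line`, `tokenize`, `knn_predict`), and the use of Jaccard distance over word sets are all assumptions, and the paper itself runs KNN through Weka rather than in pure Python. Only the general message shape (a prefix such as ANR, a message number, and a one-letter severity) follows the documented TSM message format.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <prefix><number><severity> <text>".
# The timestamp layout is a guess made for this illustration.
LINE_RE = re.compile(
    r"^(?P<date>\S+)\s+(?P<time>\S+)\s+"
    r"(?P<prefix>AN[ERS])(?P<number>\d{4,5})(?P<severity>[A-Z])\s+"
    r"(?P<text>.*)$"
)


def parse_line(line):
    """Extract the fields of one log line, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def tokenize(text):
    """Clean a message: lowercase it and keep only alphabetic words, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def knn_predict(tokens, train, k=3):
    """Majority label among the k training messages closest to `tokens`."""
    ranked = sorted(train, key=lambda item: jaccard_distance(tokens, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented training messages (not taken from the paper's dataset),
# labeled with a severity letter: I = information, E = error.
TRAIN = [
    (tokenize("session ended for node"), "I"),
    (tokenize("session started for node"), "I"),
    (tokenize("backup failed insufficient space"), "E"),
    (tokenize("write error on volume"), "E"),
    (tokenize("database backup failed"), "E"),
]

if __name__ == "__main__":
    record = parse_line("03/12/2017 10:15:02 ANR2578W Schedule missed for node NODE1")
    print(record["prefix"], record["number"], record["severity"])
    print(knn_predict(tokenize("database backup failed on volume"), TRAIN, k=3))
```

In a pipeline closer to the one described above, the parsed records would be written to a file that Weka can load, and the comparison of KNN, J48, and NaiveBayes would be run there.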
Index Terms
- Incident prediction through logging management and machine learning