DOI: 10.1145/3368756.3369069
research-article

Incident prediction through logging management and machine learning

Published: 02 October 2019

ABSTRACT

Analyzing the log file of a software system or device provides a focal point for making incremental improvements; it is the step performed to start incident analysis. However, the format and contents of log messages are not always fully documented, and messages may be written in many different formats. This makes log analysis more difficult, delays the correction of incidents, and therefore carries a high financial risk.

In this paper, we survey log file analysis and the existing systems developed to address this issue. We then propose a methodology to support log analysis in complex environments. The K-Nearest-Neighbor (KNN) classification method was chosen to be used online through Weka to predict errors. To that end, a program was developed in Python to extract, clean, and format the log file before comparing the classification algorithms KNN, J48, and NaiveBayes on the dataset. An API was used to interface with Weka.

Finally, we illustrate our proposal on a Tivoli Storage Manager (TSM) log file and describe the results obtained.
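
The extract, clean, and format step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' program: the line layout (date, time, then a message ID made of a three-letter prefix, a four-digit number, and a one-letter type), the sample lines, and the names `parse_log` and `to_arff` are all hypothetical. The output is text in ARFF, the file format Weka reads, so that classifiers such as KNN, J48, or NaiveBayes could then be compared on it.

```python
import re

# Assumed layout: "MM/DD/YYYY HH:MM:SS PPPnnnnT free text", where PPPnnnnT is
# a message ID (3-letter prefix, 4-digit number, 1-letter message type).
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<prefix>[A-Z]{3})(?P<number>\d{4})(?P<msg_type>[A-Z])\s+(?P<text>.*)$"
)

# Made-up sample lines, for illustration only (not taken from a real log).
SAMPLE_LINES = [
    "07/28/2017 10:15:02 ANR0001I hypothetical informational text",
    "   07/28/2017 10:16:40 ANR0002E hypothetical error text   ",
    "this line does not follow the assumed layout",
]


def parse_log(lines):
    """Extract and clean records; lines that do not match are skipped."""
    records = []
    for line in lines:
        match = LINE_RE.match(line.strip())  # strip stray whitespace first
        if match is None:
            continue
        fields = match.groupdict()
        records.append({
            "hour": int(fields["time"][:2]),
            "prefix": fields["prefix"],
            "number": int(fields["number"]),
            "msg_type": fields["msg_type"],
        })
    return records


def to_arff(records, relation="log_records"):
    """Format records as ARFF text, with msg_type as the last (class) attribute."""
    prefixes = ",".join(sorted({r["prefix"] for r in records}))
    types = ",".join(sorted({r["msg_type"] for r in records}))
    header = [
        f"@relation {relation}",
        "",
        "@attribute hour numeric",
        f"@attribute prefix {{{prefixes}}}",
        "@attribute number numeric",
        f"@attribute msg_type {{{types}}}",
        "",
        "@data",
    ]
    rows = [
        f"{r['hour']},{r['prefix']},{r['number']},{r['msg_type']}"
        for r in records
    ]
    return "\n".join(header + rows) + "\n"


if __name__ == "__main__":
    print(to_arff(parse_log(SAMPLE_LINES)))
```

Placing the message type last follows the common Weka convention of treating the final attribute as the class. Which attributes the paper actually used is not stated in the abstract.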


    • Published in

      SCA '19: Proceedings of the 4th International Conference on Smart City Applications
      October 2019
      788 pages
      ISBN: 9781450362894
      DOI: 10.1145/3368756

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
