DOI: 10.1145/1081870.1081972
Article

An integrated framework on mining logs files for computing system management

Published: 21 August 2005

Abstract

Traditional approaches to system management have relied largely on domain experts, through a knowledge acquisition process that translates domain knowledge into operating rules and policies. This process is well known to be cumbersome, labor intensive, and error prone, and it has difficulty keeping pace with rapidly changing environments. In this paper, we describe our research efforts to establish an integrated framework for mining system log files for automatic management. In particular, we apply text mining techniques to categorize messages in log files into common situations, improve categorization accuracy by considering the temporal characteristics of log messages, develop temporal mining techniques to discover the relationships between different events, and use visualization tools to evaluate and validate the interesting temporal patterns for system management.
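The abstract does not name the text-mining technique used for the categorization step. As a minimal sketch, assuming a multinomial naive Bayes classifier with add-one (Laplace) smoothing, a common baseline for text categorization and not necessarily the authors' method, the snippet below assigns a raw log message to one of a few situation categories. The training messages, the category names (`start`, `stop`, `connect`), and the class `NaiveBayesLogClassifier` are hypothetical illustrations.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase a log message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", message.lower())


class NaiveBayesLogClassifier:
    """Hypothetical multinomial naive Bayes with add-one smoothing (a sketch,
    not the paper's implementation)."""

    def __init__(self):
        self.class_counts = Counter()            # number of messages per category
        self.word_counts = defaultdict(Counter)  # token frequencies per category
        self.vocab = set()                       # all tokens seen in training

    def fit(self, messages, labels):
        for message, label in zip(messages, labels):
            tokens = tokenize(message)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, message):
        tokens = tokenize(message)
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior of the category.
            score = math.log(count / total)
            # Smoothed denominator: tokens in this category plus vocabulary size.
            denom = sum(self.word_counts[label].values()) + vocab_size
            for token in tokens:
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled log messages.
train_messages = [
    "Service httpd started successfully",
    "Starting database instance",
    "Service httpd stopped",
    "Shutting down database instance",
    "Connection to server established",
    "Connection refused by remote host",
]
train_labels = ["start", "start", "stop", "stop", "connect", "connect"]

clf = NaiveBayesLogClassifier().fit(train_messages, train_labels)
print(clf.predict("database instance started"))  # start
print(clf.predict("Connection refused"))          # connect
```

Each message is classified independently here; the temporal refinement described in the abstract, which uses neighboring messages to improve accuracy, is not modeled in this sketch.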




    Information

    Published In

    ACM Conferences
    KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
    August 2005
    844 pages
    ISBN:159593135X
    DOI:10.1145/1081870
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. event relationship
    2. log categorization
    3. system management
    4. temporal pattern


    Conference

    KDD05

