DOI: 10.1145/1923947.1923979
research-article

Symptom-based problem determination using log data abstraction

Published: 01 November 2010 Publication History

Abstract

System failures in industry are expensive, and increasingly stringent performance and reliability requirements for enterprise systems have made detecting and diagnosing failures both crucial and challenging. Log files generated at runtime are considered to contain representations of failure symptoms, and have therefore become one of the most important sources for system monitoring and failure diagnosis.
A number of studies suggest that data mining and machine learning can help in dealing with the vast amount of log data produced by a complex enterprise system. Log data abstraction techniques have been proposed, but they have not been well studied for failure detection and problem determination. In this research, we investigate the effects of using an unsupervised log data abstraction method to aid the supervised learning processes of problem determination. Additionally, we compare the efficiency of associative classification methods for failure diagnosis against a Bayesian learning technique and C4.5, both of which have been shown to perform well in document classification and failure diagnosis. Our experimental results show that two associative classification methods outperform Naive Bayes and C4.5 when applied to non-abstracted logs, and that unsupervised log abstraction significantly improves the performance of log-based problem determination in terms of precision, F-measure, and efficiency.
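As a rough illustration of the two ideas the abstract relies on, the sketch below shows what log abstraction and the F-measure look like in code. It is not the method evaluated in the paper: the digit-masking rule and the names `abstract_line`, `abstract_log`, and `f_measure` are assumptions chosen only for illustration, and the sample log lines are invented.

```python
from collections import Counter


def abstract_line(line: str) -> str:
    """Map a raw log line to an event template.

    Assumption for illustration: any whitespace-separated token that
    contains a digit is treated as a variable and replaced with "*".
    """
    return " ".join(
        "*" if any(ch.isdigit() for ch in token) else token
        for token in line.split()
    )


def abstract_log(lines):
    """Count how often each event template occurs in a list of log lines."""
    return Counter(abstract_line(line) for line in lines)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented sample lines, not taken from the paper's data set.
    sample = [
        "Session 17 opened",
        "Session 42 opened",
        "Connection to 10.0.0.5 timed out after 30 ms",
    ]
    for template, count in abstract_log(sample).items():
        print(count, template)
    print(f_measure(tp=8, fp=2, fn=2))
```

The template counts could then serve as features for a supervised classifier in place of the raw log lines, which is the general arrangement the abstract describes.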

Information

Published In

CASCON '10: Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
November 2010
482 pages

Publisher

IBM Corp.

United States

Qualifiers

  • Research-article

Conference

CASCON '10
CASCON '10: Center for Advanced Studies on Collaborative Research
November 1 - 4, 2010
Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Cited By

  • (2020) Automatic Event Log Abstraction to Support Forensic Investigation. Proceedings of the Australasian Computer Science Week Multiconference, pp. 1-9. DOI: 10.1145/3373017.3373018. Online publication date: 4-Feb-2020
  • (2017) GenLog: Accurate Log Template Discovery for Stripped X86 Binaries. 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), pp. 337-346. DOI: 10.1109/COMPSAC.2017.137. Online publication date: Jul-2017
  • (2016) An Evaluation Study on Log Parsing and Its Use in Log Mining. 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 654-661. DOI: 10.1109/DSN.2016.66. Online publication date: Jun-2016
  • (2013) Accurate Proactive Adaptation of Service-Oriented Systems. Assurances for Self-Adaptive Systems, pp. 240-265. DOI: 10.1007/978-3-642-36249-1_9. Online publication date: 2013
  • (2013) Monitoring Event Logs within a Cluster System. Complex Systems and Dependability, pp. 257-271. DOI: 10.1007/978-3-642-30662-4_17. Online publication date: 2013
  • (2012) Spatio-temporal decomposition, clustering and identification for alert detection in system logs. Proceedings of the 27th Annual ACM Symposium on Applied Computing, pp. 621-628. DOI: 10.1145/2245276.2245395. Online publication date: 26-Mar-2012
  • (2012) Interactive learning of alert signatures in High Performance Cluster system logs. 2012 IEEE Network Operations and Management Symposium, pp. 52-60. DOI: 10.1109/NOMS.2012.6211882. Online publication date: Apr-2012
  • (2011) Assisting failure diagnosis through filesystem instrumentation. Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research, pp. 160-174. DOI: 10.5555/2093889.2093909. Online publication date: 7-Nov-2011
