Abstract
Failure prediction is an effective way to improve the reliability of cluster systems, and it is becoming a research hotspot in high-performance computing as cluster systems and their applications grow in scale and complexity. This paper proposes a classification sequential rule model to predict cluster system failures. The system logs of BlueGene/L, Red Storm, and Spirit serve as the experimental datasets. The results show that the sequential rule approach outperforms support vector machines (SVM) and hidden semi-Markov models (HSMM) in precision and F-measure with a 5-hour prediction window; with a 1-hour or 12-hour prediction window, sequential rules, SVM, and HSMM each have their own strengths and weaknesses.
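The abstract reports precision and F-measure for a given prediction window but does not spell out how they are computed. The following is a minimal sketch of one common convention, not the authors' evaluation procedure: a warning counts as correct if a failure follows it within the window, and a failure counts as predicted if a warning precedes it within the window. The function name `window_metrics`, its parameters, and the use of hours as the time unit are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def window_metrics(warnings, failures, window_hours):
    """Return (precision, recall, f_measure) for a prediction window.

    warnings, failures: event times in hours (hypothetical inputs).
    window_hours: length of the prediction window, e.g. 1, 5, or 12.
    Assumed convention, not taken from the paper.
    """
    warnings = sorted(warnings)
    failures = sorted(failures)

    # A warning at time t is correct if a failure falls in (t, t + window].
    correct = sum(
        1
        for t in warnings
        if bisect_right(failures, t + window_hours) > bisect_right(failures, t)
    )

    # A failure at time f is predicted if a warning falls in [f - window, f).
    predicted = sum(
        1
        for f in failures
        if bisect_left(warnings, f) > bisect_left(warnings, f - window_hours)
    )

    precision = correct / len(warnings) if warnings else 0.0
    recall = predicted / len(failures) if failures else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Usage with made-up timestamps (hours since the start of the log).
example = window_metrics([1.0, 10.0, 30.0], [4.0, 50.0], window_hours=5)
```

Under this convention a wider window can only keep or raise the number of correct warnings and predicted failures for a fixed set of warnings, which is one reason results for 1-hour, 5-hour, and 12-hour windows are not directly comparable.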
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, J., Li, H. (2013). The Failure Prediction of Cluster Systems Based on System Logs. In: Wang, M. (eds) Knowledge Science, Engineering and Management. KSEM 2013. Lecture Notes in Computer Science, vol 8041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39787-5_44
DOI: https://doi.org/10.1007/978-3-642-39787-5_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39786-8
Online ISBN: 978-3-642-39787-5
eBook Packages: Computer Science, Computer Science (R0)