The Failure Prediction of Cluster Systems Based on System Logs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8041)

Abstract

Failure prediction is an effective way to improve the reliability of cluster systems, and it is becoming a research hotspot in high performance computing, especially as cluster systems and their applications grow in scale and complexity. A classification sequential rule model is proposed to predict cluster system failures. The system logs of BlueGene/L, Red Storm, and Spirit are used as experimental datasets. The results show that the sequential rule approach outperforms support vector machines (SVM) and hidden semi-Markov models (HSMM) in precision and F-measure with a 5-hour prediction window. With a 1-hour or 12-hour prediction window, sequential rules, SVM, and HSMM each have their own strengths and weaknesses.
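
The abstract compares predictors by precision and F-measure under prediction windows of 1, 5, and 12 hours. The following minimal Python sketch shows one way such window-based scores could be computed. It is not the authors' evaluation procedure: the function name `evaluate_window`, the timestamps in hours, and the matching rule (a warning is correct if a failure follows within the window; a failure is covered if a warning precedes it within the window) are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def evaluate_window(warnings, failures, window_hours):
    """Return (precision, recall, f_measure) for timestamped warnings.

    Assumed matching rule (illustrative, not taken from the paper):
    - a warning at time w is correct if some failure f has w < f <= w + window
    - a failure at time f is covered if some warning w has f - window <= w < f
    All times are plain numbers in hours.
    """
    warnings = sorted(warnings)
    failures = sorted(failures)

    # Count warnings followed by at least one failure inside the window.
    correct_warnings = sum(
        bisect_right(failures, w + window_hours) > bisect_right(failures, w)
        for w in warnings
    )
    # Count failures preceded by at least one warning inside the window.
    covered_failures = sum(
        bisect_left(warnings, f) > bisect_left(warnings, f - window_hours)
        for f in failures
    )

    precision = correct_warnings / len(warnings) if warnings else 0.0
    recall = covered_failures / len(failures) if failures else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical timestamps, in hours since the start of the log.
example_warnings = [1.0, 10.0, 30.0]
example_failures = [4.0, 40.0]

for window in (1, 5, 12):
    p, r, f = evaluate_window(example_warnings, example_failures, window)
    print(f"window={window}h precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this made-up data, a wider window makes more warnings count as correct, which illustrates why results for the 1-hour, 5-hour, and 12-hour windows are reported separately.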




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, J., Li, H. (2013). The Failure Prediction of Cluster Systems Based on System Logs. In: Wang, M. (eds) Knowledge Science, Engineering and Management. KSEM 2013. Lecture Notes in Computer Science, vol 8041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39787-5_44

  • DOI: https://doi.org/10.1007/978-3-642-39787-5_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39786-8

  • Online ISBN: 978-3-642-39787-5

  • eBook Packages: Computer Science, Computer Science (R0)
