A Method for the Joint Analysis of Numerical and Textual IT-System Data to Predict Critical System States

  • Conference paper
Software Technologies (ICSOFT 2020)

Abstract

We present a method for jointly analyzing textual and numerical IT-system data to predict potentially critical system states. To support a comparative discussion that leads to a justified choice of model and method, we apply logistic regression, random forests, and neural networks to this prediction task. Our models consume a set of monitoring performance metrics together with log file events. To ease the analysis of IT-systems, the models judge the future system state using a single binary outcome variable that classifies its criticality as “alarm” or “no alarm”. Moreover, we use feature importance measures to guide IT-operators on which system parameters, i.e., features, to consider first when responding to an alarm. We evaluate our models with a set of common performance measures under different configurations, including (among others) the required lead time window for incident response. This paper extends previous work by adding details on how to jointly process textual and numerical data.
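The abstract does not spell out how numerical metrics and log events are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes integer time steps, a hypothetical log format whose first token is a severity level, and a label that marks step t as “alarm” when the system is critical exactly lead_time steps later. All names (build_dataset, LOG_LEVELS, the cpu metric) are illustrative.

```python
from collections import Counter

# Hypothetical severity levels used as textual features (an assumption).
LOG_LEVELS = ("ERROR", "WARNING")


def build_dataset(metrics, log_events, critical_steps, lead_time):
    """Join numeric metrics with per-step log-event counts and attach a
    binary label describing the system state lead_time steps ahead."""
    # Count log events per (step, severity level).
    counts = Counter()
    for step, message in log_events:
        parts = message.split(maxsplit=1)
        if parts and parts[0] in LOG_LEVELS:
            counts[(step, parts[0])] += 1

    rows = []
    last_step = max(metrics)
    for step in sorted(metrics):
        if step + lead_time > last_step:
            break  # no future observation to derive a label from
        features = dict(metrics[step])  # numeric part
        for level in LOG_LEVELS:        # textual part, as counts
            features[f"log_{level.lower()}"] = counts[(step, level)]
        is_critical = (step + lead_time) in critical_steps
        rows.append((step, features, "alarm" if is_critical else "no alarm"))
    return rows


# Toy data, invented purely for illustration.
metrics = {0: {"cpu": 0.2}, 1: {"cpu": 0.9}, 2: {"cpu": 0.95}, 3: {"cpu": 0.3}}
log_events = [
    (1, "ERROR disk timeout"),
    (1, "ERROR disk timeout"),
    (1, "WARNING retry"),
    (2, "INFO ok"),
]
rows = build_dataset(metrics, log_events, critical_steps={3}, lead_time=2)
for row in rows:
    print(row)
```

Rows built this way could be fed to any binary classifier. A longer lead_time gives operators more time to respond but leaves fewer labelled rows, since the last lead_time steps have no future observation to label them with.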

Author information

Corresponding author

Correspondence to Patrick Kubiak.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kubiak, P., Rass, S., Pinzger, M., Schneider, S. (2021). A Method for the Joint Analysis of Numerical and Textual IT-System Data to Predict Critical System States. In: van Sinderen, M., Maciaszek, L.A., Fill, HG. (eds) Software Technologies. ICSOFT 2020. Communications in Computer and Information Science, vol 1447. Springer, Cham. https://doi.org/10.1007/978-3-030-83007-6_12

  • DOI: https://doi.org/10.1007/978-3-030-83007-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-83006-9

  • Online ISBN: 978-3-030-83007-6

  • eBook Packages: Computer Science, Computer Science (R0)
