A Method for the Joint Analysis of Numerical and Textual IT-System Data to Predict Critical System States

Kubiak, Patrick; Rass, Stefan; Pinzger, Martin; Schneider, Stephan

doi:10.1007/978-3-030-83007-6_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1447)

Included in the following conference series:

International Conference on Software Technologies

Abstract

We present a method for jointly analyzing textual and numerical IT-system data to predict potentially critical system states. To arrive at a justified choice of model and method through comparative discussion, we apply logistic regression, random forests, and neural networks to this prediction task. Our models consume a set of monitoring performance metrics and log file events. To ease the analysis of IT-systems, the models judge the future system state with a single binary outcome variable that classifies its criticality as “alarm” or “no alarm”. In addition, we use feature importance measures to guide IT-operators on which system parameters (i.e., features) to consider first when responding to an alarm. We evaluate the models with a set of common performance measures under different configurations, including (among others) the required lead time window for incident response. This paper extends previous work by adding details on how to jointly process textual and numerical data.
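
As a rough illustration of the setup the abstract describes, the sketch below joins numerical metrics with log-event counts per time window and derives a binary "alarm" / "no alarm" label from a lead time. It is not the authors' implementation: the data layout, the function names `join_features` and `label_alarms`, and the rule "alarm if an incident occurs within the next `lead` windows" are all assumptions made for this example.

```python
from collections import Counter


def join_features(metrics, log_events, window):
    """Merge numeric metrics and log-event counts into one row per time window.

    metrics:    list of (timestamp_seconds, {metric_name: value})  (hypothetical layout)
    log_events: list of (timestamp_seconds, event_type)            (hypothetical layout)
    window:     window length in seconds
    If several metric samples fall into one window, the last one wins.
    """
    rows = {}
    for ts, values in metrics:
        rows.setdefault(ts // window, {}).update(values)
    # Turn textual log events into numeric features by counting them per window.
    counts = Counter((ts // window, event) for ts, event in log_events)
    for (w, event), n in counts.items():
        rows.setdefault(w, {})[f"log_{event}"] = n
    return dict(sorted(rows.items()))


def label_alarms(windows, incident_windows, lead):
    """Label window w as "alarm" if an incident falls into windows w+1 .. w+lead."""
    incidents = set(incident_windows)
    return {
        w: "alarm" if any(w + k in incidents for k in range(1, lead + 1)) else "no alarm"
        for w in windows
    }


# Toy data, invented for illustration only.
metrics = [(0, {"cpu": 0.2}), (60, {"cpu": 0.9}), (120, {"cpu": 0.95})]
log_events = [(65, "ERROR"), (70, "ERROR"), (125, "WARN")]

features = join_features(metrics, log_events, window=60)
labels = label_alarms(features, incident_windows=[2], lead=1)

for w, row in features.items():
    print(w, row, labels[w])
```

The resulting rows and labels could then be passed to any binary classifier, such as the logistic regression, random forest, or neural network models named in the abstract. A longer lead time marks more windows before an incident as "alarm", which gives operators more time to respond but may make the prediction task harder.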

Author information

Authors and Affiliations

Volkswagen Financial Services AG, Brunswick, Germany
Patrick Kubiak
Alpen-Adria-University, Klagenfurt, Austria
Stefan Rass & Martin Pinzger
University of Applied Sciences Kiel, Kiel, Germany
Stephan Schneider

Corresponding author

Correspondence to Patrick Kubiak.

Editor information

Editors and Affiliations

Information Systems Group, Enschede, The Netherlands
Marten van Sinderen
Wrocław University of Economics Institute of Business Informatics, Wrocław, Poland
Leszek A. Maciaszek
Universität Fribourg, Fribourg, Switzerland
Hans-Georg Fill

About this paper

Cite this paper

Kubiak, P., Rass, S., Pinzger, M., Schneider, S. (2021). A Method for the Joint Analysis of Numerical and Textual IT-System Data to Predict Critical System States. In: van Sinderen, M., Maciaszek, L.A., Fill, HG. (eds) Software Technologies. ICSOFT 2020. Communications in Computer and Information Science, vol 1447. Springer, Cham. https://doi.org/10.1007/978-3-030-83007-6_12

DOI: https://doi.org/10.1007/978-3-030-83007-6_12
Published: 21 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83006-9
Online ISBN: 978-3-030-83007-6
eBook Packages: Computer Science; Computer Science (R0)
