Abstract
We present a method for jointly analyzing textual and numerical IT-system data to predict potentially critical system states. To arrive at a justified choice of model and method through a comparative discussion, we apply logistic regression, random forests, and neural networks to this prediction task. Our models consume a set of monitoring performance metrics and log file events. To ease the analysis of IT-systems, the models express the future system state as a single binary outcome variable that classifies its criticality as “alarm” or “no alarm”. Moreover, we use feature importance measures to guide IT-operators on which system parameters, i.e., features, to consider first when responding to an alarm. We evaluate our models under different configurations, including (among others) the required lead time window for incident response, using a set of common performance measures. This paper extends previous work by adding details on how to jointly process textual and numerical data.
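As a rough illustration of the setup described in the abstract, the following sketch joins numerical metrics with per-window log event counts and derives a binary “alarm” / “no alarm” label from a lead time window. It is not the authors' implementation: the function names, the fixed-width windowing, the event-count encoding of log data, and the `horizon` parameter are all assumptions made for this example.

```python
from collections import Counter


def build_feature_rows(metrics, log_events, window):
    """Join numeric metrics with log-event counts per time window.

    metrics:    dict mapping window start -> {metric name: value}
    log_events: list of (timestamp, event_type) tuples
    window:     window width, in the same unit as the timestamps
    (All three shapes are assumptions of this sketch.)
    """
    counts = {}
    for ts, event_type in log_events:
        start = (ts // window) * window  # window the event falls into
        counts.setdefault(start, Counter())[event_type] += 1

    event_types = sorted({event_type for _, event_type in log_events})
    rows = {}
    for start, values in sorted(metrics.items()):
        row = dict(values)  # numerical part of the feature vector
        for event_type in event_types:
            # textual part, encoded as a count per event type
            row["log_" + event_type] = counts.get(start, Counter())[event_type]
        rows[start] = row
    return rows


def label_rows(window_starts, incident_times, lead_time, horizon):
    """Label a window 1 ("alarm") if an incident occurs in
    [start + lead_time, start + lead_time + horizon), else 0 ("no alarm")."""
    labels = {}
    for start in window_starts:
        lo = start + lead_time
        hi = lo + horizon
        labels[start] = int(any(lo <= t < hi for t in incident_times))
    return labels
```

The resulting rows and labels could then be passed to any binary classifier, such as the logistic regression, random forest, or neural network models named above. A larger `lead_time` gives operators more time to respond but makes the label depend on events further in the future.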
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kubiak, P., Rass, S., Pinzger, M., Schneider, S. (2021). A Method for the Joint Analysis of Numerical and Textual IT-System Data to Predict Critical System States. In: van Sinderen, M., Maciaszek, L.A., Fill, HG. (eds) Software Technologies. ICSOFT 2020. Communications in Computer and Information Science, vol 1447. Springer, Cham. https://doi.org/10.1007/978-3-030-83007-6_12
DOI: https://doi.org/10.1007/978-3-030-83007-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83006-9
Online ISBN: 978-3-030-83007-6
eBook Packages: Computer Science (R0)