Authors:
Patrick Kubiak (1); Stefan Rass (2) and Martin Pinzger (2)
Affiliations:
(1) Volkswagen Financial Services AG, Brunswick, Germany; (2) Alpen-Adria-University, Klagenfurt, Austria
Keyword(s):
Data Science, IT-Operations, Log File Analysis, Failure Prediction.
Abstract:
Recent studies have proposed several ways to optimize the stability of IT services, offering an extensive portfolio of process-oriented, reactive, and proactive approaches. The goal of this paper is to combine monitored performance data, such as CPU utilization, with discrete data from log files in a joint model to predict critical system states. We propose a systematic method to derive mathematical prediction models, which we test experimentally using a downsized clone of a real-life contract management system as a testbed. First, this testbed is used for data acquisition under variable and fully controllable system loads. Next, based on the monitored performance metrics and log file data, we train models (logistic regression and decision trees) that unify both data types, numeric and textual, in a single incident forecasting model. We focus on 1) investigating different cases to identify an appropriate prediction time window that leaves time to prepare countermeasures while taking prediction accuracy into account, and 2) identifying variables that are more likely than others to appear in the predictive models.
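The joint-model idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature choice (mean CPU utilization plus counts of log keywords), the function names, the `lead_time` parameter that stands in for the prediction time window, and the toy data. The logistic regression is fitted with plain stochastic gradient descent rather than with any particular library.

```python
import math
from collections import Counter


def build_features(cpu_samples, log_lines, keywords=("ERROR", "WARN")):
    """Join numeric and textual data in one vector (hypothetical feature set):
    mean CPU utilization followed by one count per log keyword."""
    counts = Counter()
    for line in log_lines:
        for kw in keywords:
            if kw in line:
                counts[kw] += 1
    mean_cpu = sum(cpu_samples) / len(cpu_samples)
    return [mean_cpu] + [float(counts[kw]) for kw in keywords]


def label_windows(incident_times, window_starts, window_len, lead_time):
    """Label a window 1 if an incident starts within lead_time after it ends."""
    labels = []
    for start in window_starts:
        end = start + window_len
        hit = any(end < t <= end + lead_time for t in incident_times)
        labels.append(int(hit))
    return labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that an incident follows observation x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of the log loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


# Toy observations (invented for illustration): [mean CPU, #ERROR, #WARN]
X = [[0.20, 0.0, 0.0], [0.30, 0.0, 1.0], [0.90, 3.0, 2.0], [0.95, 4.0, 1.0]]
y = [0, 0, 1, 1]  # 1 = incident followed within the chosen lead time
w, b = train_logistic(X, y)
print(predict_proba(w, b, build_features([0.9, 0.8], ["ERROR db", "ERROR io"])))
```

Varying `lead_time` in `label_windows` and retraining is one simple way to compare candidate prediction time windows, and the magnitude of each fitted weight gives a rough indication of which variables the model relies on.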