Elsevier

Information Systems

Volume 114, March 2023, 102177

Explainable concept drift in process mining

https://doi.org/10.1016/j.is.2023.102177

Highlights

  • We propose a data-driven approach to uncover explanations for process concept drifts.

  • The proposed technique detects and explains drifts for object-centric event data.

  • Our framework supports linear and non-linear relationships.

Abstract

The execution of processes leaves trails of event data in information systems. These event data are analyzed to generate insights and improvements for the underlying process. However, companies do not execute these processes in a vacuum. The fast pace of technological development, constantly changing market environments, and fast consumer responses expose companies to high levels of uncertainty. This uncertainty often manifests itself in significant changes in the executed processes. Such significant changes are called concept drifts. Transparency about concept drifts is crucial to respond quickly and adequately, limiting the potentially negative impact of such drifts. Three types of knowledge are of interest to a process owner: when a drift occurred, what happened, and why it happened. This paper introduces a framework to extract concept drifts and their potential root causes from event data. We extract time series describing process measures, detect concept drifts, and test these drifts for correlation. This framework generalizes existing work to support object-centric event data with multiple case notions, non-linear relationships, and an arbitrary number of process measures. We provide an extendable implementation and evaluate our framework with respect to the sensitivity of the time series construction and the scalability of cause–effect testing. Furthermore, we provide a case study uncovering an explainable concept drift.

Introduction

Throughout the past decades, information systems have become an essential component supporting many businesses. Executions of business processes [1] leave traces of event data in such information systems, describing the conducted actions. These data can be analyzed to generate insights and improvements for the supported processes. The techniques dealing with these types of problems are summarized under the term process mining [2]. Applying such techniques helps businesses understand their processes and increase their efficiency; process mining is, therefore, a fundamental part of every competitive company.

Different process mining techniques have different knowledge discovery objectives. Process discovery, for example, delivers a visual representation of the possible execution paths within a process [3]. Conformance checking techniques allow a user to uncover deviations of process executions and measure the correspondence of a process to the recorded event data [4]. Other techniques aim to identify and solve problems within a process, e.g., by simulation [5] or by predicting problematic process executions [6].

Concept drift is a process-related problem that has drawn attention in recent years [7], [8], [9]. A concept drift is a significant change in a process over time. The occurrence of a concept drift may lead to problems or inefficiencies. However, concept drifts are challenging to detect and analyze because processes are inherently dynamic and exhibit stochastic behavior. The literature focuses on three main questions about concept drifts: (1) When did the concept drift occur? If a concept drift happens in the control-flow of the process, i.e., the ordering of executed actions, process discovery, conformance checking, and predictive monitoring techniques have to be adjusted accordingly [10]. (2) What happened, and how did it happen? Significant changes need to be understood by the process owner to react accordingly [11]. (3) Why did it happen? Significant changes might be caused by other changes in the process [12]. Uncovering potential cause–effect relationships helps to identify problems and improve the process.

These three questions constitute the main research lines related to concept drift in process mining. Existing works address one or more of them. In this work, we focus on the third research question; however, we also generate insights for the first two as a byproduct.

  • RQ1 (Detection): The existence of a concept drift needs to be detected, and its change point needs to be located as precisely as possible.

  • RQ2 (Characterization): The nature of the concept drift needs to be described as accurately and extensively as possible.

  • RQ3 (Explanation): Potential cause–effect relationships of a concept drift contained within the event data need to be uncovered.

Several techniques have addressed RQ1 in recent years, mostly focusing on the control-flow of a process [13]. Generally, these techniques first calculate a numeric representation of the control-flow and subsequently apply one of many change point detection algorithms [14], such as hypothesis testing, cost-based segmentation, or visual inspection. Different techniques have been proposed to answer RQ2, both interactively and fully automatically. Yeshchenko et al. [9] provide an extensive visual analytics framework for humans to explore concept drifts and understand the dynamics and changes behind them. Ostovar et al. [15] uncover change patterns of a concept drift by testing the data against predefined business process change patterns and report these patterns to the user as textual output. In previous work, we proposed a general framework to answer RQ3 [12]. In this paper, we generalize and extend our previous work to address several challenges encountered in real-life information systems.

The first of these challenges is the so-called “object-centricity” of information systems. Traditional process mining techniques assume a single case notion, with each recorded event associated with exactly one object of that case notion. In reality, an information system, e.g., an ERP system, comprises many case notions, e.g., different document types, and events may be related to multiple objects of different case notions [16]. To apply traditional process mining techniques, these object-centric event data first have to be flattened, i.e., forced into the traditional event log format [17]. Flattening introduces problems (cf. Section 4.1) and can lead to misleading insights. Therefore, process mining techniques have to be adapted to the object-centric setting to provide accurate insights. Recently, academia and industry have picked up this challenge. On the one hand, many techniques have been proposed to translate traditional process mining problems to the object-centric setting [18], [19], [20], [21], [22], [23], [24]. On the other hand, industry leaders provide initial support for object-centric event data, e.g., Celonis by supporting object-centric process models through ProcessSphere, or Mehrwerk Process Mining by supporting multiple case identifiers.
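To illustrate how flattening can distort an analysis, the following minimal Python sketch flattens a small object-centric log onto one case notion. The log is hypothetical. The event structure (an `objects` map from object type to object ids) and the function `flatten` are also assumptions for illustration, not the paper's formalization.

Flattening onto items duplicates every event that refers to several items, a situation the object-centric literature calls convergence. Flattening onto orders silently drops the events that refer to no order.

```python
from collections import Counter

# Hypothetical object-centric log: each event refers to objects of several types.
events = [
    {"id": "e1", "activity": "place order", "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"id": "e2", "activity": "pick item", "objects": {"item": ["i1"]}},
    {"id": "e3", "activity": "pick item", "objects": {"item": ["i2"]}},
    {"id": "e4", "activity": "ship", "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
]


def flatten(events, case_type):
    """Force the log into a single case notion: one event copy per object of case_type."""
    flat = []
    for e in events:
        for obj in e["objects"].get(case_type, []):
            flat.append({"case": obj, "id": e["id"], "activity": e["activity"]})
    return flat


by_item = flatten(events, "item")
by_order = flatten(events, "order")
print(len(events), len(by_item), len(by_order))  # 4 6 2
# A single order placement is now counted twice.
print(Counter(e["activity"] for e in by_item)["place order"])  # 2
```

Statistics computed on either flattened log, such as activity frequencies, therefore no longer reflect the original four events.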

The second challenge is posed by non-linear cause–effect relationships in information systems. The investigation of cause–effect relationships behind concept drifts cannot be limited to linear relationships, since different process dynamics, e.g., workload–productivity relationships of resources [25], may show non-linear behavior.

The third challenge is the absence of domain knowledge. One cannot always assume that the user has basic knowledge of, or a suspicion about, which perspective exhibits a concept drift and which perspective might cause it. In our original framework, the user has to choose one perspective to be investigated for concept drifts and another perspective to be investigated for potential causes. An approach that does not require such domain knowledge would be more generally applicable.

Therefore, the work presented in this paper generalizes and extends our original framework [12] in the following ways: (a) Our revised framework supports event data with multiple case notions. (b) We support the detection of non-linear relationships. (c) The user no longer has to choose a primary and a secondary perspective; instead, arbitrarily many features can be selected.

Our new framework is depicted in Fig. 1. The event log is first segmented into consecutive windows. For each window, we calculate multiple numerical features chosen by the user. The values of each feature are concatenated into a time series. Subsequently, we detect concept drifts in these time series. For each pair of features and each pair of concept drifts in the two corresponding series, we test for Granger causality [26], given the time difference between the two drifts. Non-linear relationships are covered by applying a kernel function. Granger-causal concept drift pairs are reported to the user as explainable concept drifts.
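The pairing step described above can be sketched in Python as follows. This sketch rests on three assumptions:

- change points are given as window indices per feature;
- a cause drift must strictly precede its effect drift;
- the difference in windows serves as the lag of the subsequent Granger test.

The function name `candidate_drift_pairs` and the feature names in the example are hypothetical.

```python
from itertools import permutations


def candidate_drift_pairs(change_points):
    """List (cause_feature, cause_cp, effect_feature, effect_cp, lag) candidates.

    change_points maps each feature name to the window indices of its detected drifts.
    Assumption: a cause drift must strictly precede the effect drift, and the
    difference in windows is used as the lag of the Granger test.
    """
    pairs = []
    for cause, effect in permutations(change_points, 2):  # ordered pairs of distinct features
        for cause_cp in change_points[cause]:
            for effect_cp in change_points[effect]:
                lag = effect_cp - cause_cp
                if lag > 0:
                    pairs.append((cause, cause_cp, effect, effect_cp, lag))
    return pairs
```

For example, `{"num_events": [10], "avg_wait": [13, 30]}` yields two candidates: `("num_events", 10, "avg_wait", 13, 3)` and `("num_events", 10, "avg_wait", 30, 20)`. Neither drift of `avg_wait` is a candidate cause for the earlier drift of `num_events`.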

We answer RQ1 by detecting concept drifts with existing concept drift detection techniques. The time series provide a visualization to explore the nature of a concept drift, which helps to answer RQ2. The correlated concept drifts give explanations and potential root causes for drifts, answering RQ3.

The remainder of this paper is structured as follows: Section 2 introduces related work on concept drift in process mining. We formalize object-centric event data in Section 3. Extracting time series from object-centric event logs is introduced in Section 4. We give a general definition for concept drift detection in Section 5. Section 6 introduces Granger causality for time series and testing for non-linear relationships. Our general framework and a short overview of the implementation are given in Section 7. In Section 8, the framework is, first, evaluated for sensitivity and scalability and, second, applied to a real-life event log uncovering an explainable concept drift in a case study. We conclude this paper in Section 9.

Section snippets

Related work

Over the past years, many techniques dealing with problems related to concept drifts in processes have been introduced. This section discusses the scope of these approaches compared to our framework. Furthermore, we explain how this work differs from our previous work [12], which it extends and generalizes. Table 1 gives an overview of the scope of papers on concept drift in process mining.

Most papers deal with the detection of the drift,

Event data

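First, we introduce some notation used throughout this paper. A sequence $\sigma\colon \{1,\dots,n\} \to X$ assigns positions to elements of a set $X$. We denote a sequence $\sigma \in X^*$ with $\sigma = \langle x_1,\dots,x_n \rangle$ for elements $x_1,\dots,x_n \in X$. A sequence $\sigma = \langle x_1,\dots,x_n \rangle$ is of length $\mathit{len}(\sigma) = n$. We denote subsequences with $\sigma(l,k) = \langle x_l,\dots,x_k \rangle$ for $l < k$. The notation $x \in \sigma$ is overloaded to express $x \in \mathit{range}(\sigma)$.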
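The 1-based, inclusive sequence notation maps onto Python's 0-based, end-exclusive slicing as in the minimal sketch below. The helper names `length`, `subseq`, and `contains` are hypothetical. They only mirror len(σ), σ(l, k), and x ∈ σ.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def length(sigma: Sequence[T]) -> int:
    """len(σ) = n for σ = ⟨x_1, ..., x_n⟩."""
    return len(sigma)


def subseq(sigma: Sequence[T], l: int, k: int) -> Sequence[T]:
    """σ(l, k) = ⟨x_l, ..., x_k⟩ with 1-based, inclusive positions l < k."""
    if not 1 <= l < k <= len(sigma):
        raise ValueError("expected 1 <= l < k <= len(sigma)")
    return sigma[l - 1:k]  # shift to 0-based start; the exclusive end keeps x_k


def contains(sigma: Sequence[T], x: T) -> bool:
    """x ∈ σ abbreviates x ∈ range(σ): x occurs at some position of σ."""
    return x in sigma


sigma = ("a", "b", "c", "d")
print(subseq(sigma, 2, 3))  # ('b', 'c')
```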

Event data describe the executions of a process as a collection of events. E is the universe of events. Each event corresponds to the execution of a

Time series extraction

This section introduces a general approach for transforming an object-centric event log into a time series. The approach consists of three steps: First, process executions are extracted from the event log. Second, the event log is segmented into time windows. Third, a numerical feature is calculated for each window.
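The three steps can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the definitions of Section 4:

- Events are dicts with a numeric `time` and a list of object ids.
- Process executions are taken to be connected components of the object graph, where objects are linked when they share an event.
- Windows have a fixed size.
- Two hypothetical inclusion rules, `"start"` and `"overlap"`, decide which executions count towards a window.

Any function mapping a list of executions to a number can serve as a feature. Here, `len` counts executions, and `mean_duration` is a hypothetical example feature.

```python
import math
from collections import defaultdict


def process_executions(events):
    """Group events into executions = connected components of the object graph (assumption)."""
    parent = {}

    def find(o):
        parent.setdefault(o, o)
        while parent[o] != o:
            parent[o] = parent[parent[o]]  # path halving keeps the trees shallow
            o = parent[o]
        return o

    for e in events:
        objs = e["objects"]
        for o in objs[1:]:
            parent[find(o)] = find(objs[0])  # merge components of co-occurring objects
        if objs:
            find(objs[0])  # register single-object events as well

    groups = defaultdict(list)
    for e in events:
        if e["objects"]:
            groups[find(e["objects"][0])].append(e)
    return [sorted(g, key=lambda e: e["time"]) for g in groups.values()]


def extract_time_series(executions, start, end, window_size, feature, inclusion="start"):
    """Segment [start, end) into fixed windows and compute one feature value per window.

    inclusion="start": an execution belongs to the window containing its first event.
    inclusion="overlap": it belongs to every window its time span intersects.
    """
    series = []
    for w in range(math.ceil((end - start) / window_size)):
        lo, hi = start + w * window_size, start + (w + 1) * window_size
        if inclusion == "start":
            members = [x for x in executions if lo <= x[0]["time"] < hi]
        else:
            members = [x for x in executions if x[0]["time"] < hi and x[-1]["time"] >= lo]
        series.append(feature(members))
    return series


def mean_duration(members):
    """Example feature: average duration of the executions included in a window."""
    if not members:
        return 0.0
    return sum(x[-1]["time"] - x[0]["time"] for x in members) / len(members)
```

For instance, `extract_time_series(process_executions(events), 0, 20, 10, mean_duration)` yields one value per 10-unit window. The inclusion rule matters: under `"overlap"`, a long-running execution contributes to several windows, which smooths the resulting series.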

Concept drift detection

In process mining, a plethora of techniques has been applied to detect concept drifts and locate their change points in time series constructed from event data of a process. Many of the techniques use hypothesis testing to compare the distribution of values for subsequent time windows [27], [29] or global cost function-based segmentation of the time series [11], [12]. We provide a general definition for a concept drift detection technique.
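As one illustration of the cost-based family, the following Python sketch performs greedy binary segmentation with an L2 cost. It recursively splits the series where the reduction in the sum of squared deviations is largest, as long as that reduction exceeds a penalty.

This choice is an assumption for illustration and not necessarily the technique used in [11], [12]. Exact methods such as optimal partitioning or PELT minimize a penalized cost over all segmentations, whereas binary segmentation is a fast approximation. The parameters `penalty` and `min_size` are hypothetical tuning knobs.

```python
def l2_cost(segment):
    """Sum of squared deviations from the segment mean."""
    if not segment:
        return 0.0
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def binary_segmentation(series, penalty, min_size=2):
    """Return sorted change point indices (each index starts a new segment)."""

    def split(lo, hi):
        base = l2_cost(series[lo:hi])
        best_gain, best_t = 0.0, None
        for t in range(lo + min_size, hi - min_size + 1):
            gain = base - l2_cost(series[lo:t]) - l2_cost(series[t:hi])
            if gain > best_gain:
                best_gain, best_t = gain, t
        if best_t is None or best_gain <= penalty:
            return []  # no split reduces the cost by more than the penalty
        return split(lo, best_t) + [best_t] + split(best_t, hi)

    return split(0, len(series))


series = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
print(binary_segmentation(series, penalty=1.0))  # [10, 20]
```

A larger penalty suppresses small level shifts, which trades sensitivity for robustness against the stochastic noise of process measures.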

Definition 10 (Concept Drift Detection)

Let $s \in \mathbb{R}^*$ be a time series. A concept drift detection

Drift correlations

This section introduces the general formulation of Granger causality to determine linear and non-linear correlations between concept drifts. We first define Granger causality and subsequently provide a general definition to test for non-linear relationships. We then apply these techniques in our setting to find concept drift correlations.
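A minimal, dependency-free Python sketch of a pairwise Granger test is given below. It fits two ordinary least-squares models of y:

- a restricted model on y's own p lags plus an intercept;
- an unrestricted model that adds x's p lags.

It then compares their residual sums of squares with an F statistic. The optional `transform` argument appends non-linear functions of the lagged x values. It is our illustrative stand-in for the kernel-based test, an assumption rather than the paper's kernel.

The sketch returns the F statistic and its degrees of freedom. A p-value can be obtained from the F(q, dof) survival function, e.g., with SciPy's `scipy.stats.f.sf(F, q, dof)`.

```python
def solve(a, b):
    """Solve a·beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][k] * beta[k] for k in range(r + 1, n))) / m[r][r]
    return beta


def rss(rows, target):
    """Residual sum of squares of an OLS fit via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, target))


def granger_f(x, y, p, transform=None):
    """F statistic for 'lags 1..p of x improve the prediction of y beyond y's own lags'.

    transform (assumption) maps the list of lagged x values to the regressors
    added in the unrestricted model, e.g. to include non-linear terms.
    Returns (F, q, dof): q added regressors, dof residual degrees of freedom.
    """
    restricted, unrestricted, target = [], [], []
    for t in range(p, len(y)):
        own = [y[t - j] for j in range(1, p + 1)]    # y_{t-1}, ..., y_{t-p}
        cause = [x[t - j] for j in range(1, p + 1)]  # x_{t-1}, ..., x_{t-p}
        extra = transform(cause) if transform is not None else cause
        restricted.append([1.0] + own)               # intercept + own lags
        unrestricted.append([1.0] + own + extra)     # ... + (transformed) lags of x
        target.append(y[t])
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    q = len(unrestricted[0]) - len(restricted[0])
    dof = len(target) - len(unrestricted[0])
    return ((rss_r - rss_u) / q) / (rss_u / dof), q, dof
```

Consider a y that depends on the square of x's previous value. The linear test then typically yields a small F statistic. Passing `transform=lambda c: c + [v * v for v in c]` adds squared lags and yields a very large one.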

Granger causality [26] describes a statistical test to determine a weak form of causality between two time series based on linear regression. A time lag between

Framework for explainable concept drift

The previously introduced notation allows us to define our general framework for correlating significant changes in object-centric event data. Several parameters can be chosen; Table 5 gives an overview of the parameters for the different framework steps. Based on these parameters, we extract correlated concept drifts, called explainable concept drifts. For each pair of features, every combination of change points from the two time series is considered. We test for Granger causality for

Evaluation

This section evaluates our proposed framework. We provide a quantitative evaluation in terms of sensitivity and scalability and, subsequently, showcase our framework in a case study. First, we investigate the sensitivity of the time series extraction to the chosen inclusion function and the window size. Second, we evaluate the scalability of the whole framework depending on the number of features. We will not evaluate the performance of different concept drift detection techniques. The

Conclusion

In this paper, we introduced a framework to uncover explainable concept drifts in object-centric event data. Our framework is split into three steps. First, time series for different features are extracted from an object-centric event log and, second, investigated for concept drifts. Third, we test for correlations between concept drifts using Granger causality. Through the use of kernel functions, we can test for non-linear relationships. In our evaluation, we investigated the choice of

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research (grant no. 1191945).

References (63)

  • Camargo M. et al. Automated discovery of business process simulation models from event logs. Decis. Support Syst. (2020).
  • Tax N. et al. Predictive business process monitoring with LSTM neural networks.
  • Bose R.P.J.C. et al. Dealing with concept drifts in process mining. IEEE Trans. Neural Netw. Learn. Syst. (2014).
  • Brockhoff T. et al. Time-aware concept drift detection using the earth mover’s distance.
  • Yeshchenko A. et al. Visual drift detection for sequence data analysis of business processes. IEEE Trans. Vis. Comput. Graphics (2021).
  • Chamorro A.E.M. et al. Updating prediction models for predictive process monitoring.
  • Yeshchenko A. et al. Comprehensive process drift detection with visual analytics.
  • Adams J.N. et al. A framework for explainable concept drift detection in process mining.
  • Sato D.M.V. et al. A survey on concept drift in process mining. ACM Comput. Surv. (2022).
  • Aminikhanghahi S. et al. A survey of methods for time series change point detection. Knowl. Inf. Syst. (2017).
  • Ostovar A. et al. Characterizing drift from event streams of business processes.
  • van der Aalst W.M.P. Process mining manifesto.
  • van der Aalst W.M.P. Object-centric process mining: Dealing with divergence and convergence in event data.
  • van der Aalst W.M.P. et al. Discovering object-centric Petri nets. Fundam. Inform. (2020).
  • Esser S. et al. Multi-dimensional event data in graph databases. J. Data Semant. (2021).
  • Waibel P. et al. Causal process mining from relational databases with domain knowledge (2022).
  • Adams J.N. et al. Precision and fitness in object-centric process mining.
  • Fahland D. Process mining over multiple behavioral dimensions with event knowledge graphs.
  • Adams J.N. et al. Defining cases and variants for object-centric event data.
  • Park G. et al. OPerA: Object-centric performance analysis.
  • Nakatumba J. et al. Analyzing resource behavior using process mining.
Cited by (15)

  • Towards Business Process Observability. ACM International Conference Proceeding Series (2024).