Explainable concept drift in process mining

doi:10.1016/j.is.2023.102177

Information Systems

Volume 114, March 2023, 102177

Highlights

• We propose a data-driven approach to uncover explanations for process concept drifts.
• The proposed technique detects and explains drifts for object-centric event data.
• Our framework supports linear and non-linear relationships.

Abstract

The execution of processes leaves trails of event data in information systems. These event data are analyzed to generate insights and improvements for the underlying process. However, companies do not execute these processes in a vacuum. The fast pace of technological development, constantly changing market environments, and fast consumer responses expose companies to high levels of uncertainty. This uncertainty often manifests itself in significant changes in the executed processes. Such significant changes are called concept drifts. Transparency about concept drifts is crucial to respond quickly and adequately, limiting the potentially negative impact of such drifts. Three types of knowledge are of interest to a process owner: When did a drift occur, what happened, and why did it happen. This paper introduces a framework to extract concept drifts and their potential root causes from event data. We extract time series describing process measures, detect concept drifts, and test these drifts for correlation. This framework generalizes existing work such that object-centric event data with multiple case notions, non-linear relationships, and an arbitrary number of process measures are supported. We provide an extendable implementation and evaluate our framework concerning the sensitivity of the time series construction and scalability of cause–effect testing. Furthermore, we provide a case study uncovering an explainable concept drift.

Introduction

Throughout the past decades, information systems have become an elementary component supporting various businesses. Executions of business processes [1] leave traces of event data in such information systems, describing the conducted actions. These data can be analyzed to generate insights and improvements for the supported processes. The techniques dealing with these types of problems are summarized under the term process mining [2]. Applying such techniques helps businesses increase efficiency by understanding their processes and is, therefore, a fundamental part of every competitive company.

Different process mining techniques have different objectives of knowledge discovery. Process discovery, for example, delivers a visual representation of the possible execution paths within a process [3]. Conformance checking techniques allow a user to uncover deviations of process executions and measure the correspondence of a process to the recorded event data [4]. Other techniques aim to identify and solve problems within a process, e.g., by simulation [5] or by predicting problematic process executions [6].

Concept drift is one process-related problem that has drawn attention over the past years [7], [8], [9]. A concept drift is a significant change in a process over time. The occurrence of a concept drift may lead to several problems or inefficiencies. However, concept drifts are challenging to detect and analyze because the process is already dynamic, exhibiting stochastic behavior. In the literature, three main questions about problems related to concept drifts are of importance: (1) When did the concept drift occur? If a concept drift happens in the control-flow of the process, i.e., the ordering of executed actions, process discovery, conformance checking, and predictive monitoring techniques have to be adjusted accordingly [10]. (2) What happened and how did it happen? Significant changes need to be understood by the process owner to react accordingly [11]. (3) Why did it happen? Significant changes might be caused by other changes in the process [12]. Uncovering potential cause–effects helps to uncover problems and improve the process.

These three questions are the main research lines related to concept drift in process mining. Different works aim to solve one or multiple of these questions. In this work, we focus on the third research question. However, we can also generate insights for the first two research questions as a byproduct of our work.

RQ1 (Detection): The existence of a concept drift needs to be detected, and its change point needs to be located as precisely as possible.
RQ2 (Characterization): The nature of the concept drift needs to be described as accurately and extensively as possible.
RQ3 (Explanation): Potential cause–effects of a concept drift contained within the event data need to be uncovered.

Several techniques have addressed RQ1 over the last years, with a focus on solving this problem for the control-flow of a process [13]. Generally, these techniques first calculate some numeric representation of the control-flow and, subsequently, use one of many change point detection algorithms [14] such as hypothesis testing, cost-based segmentation techniques, or visual inspection. Different techniques have been proposed to answer RQ2, both in an interactive and fully automated manner. Yeshchenko et al. [9] provide an extensive visual analytics framework for humans to explore concept drifts and understand the dynamics and changes behind them. Ostovar et al. [15] uncover change patterns of a concept drift by testing the data against predefined business process change patterns. These are given as textual output to the user. In recent work, we propose a general framework to answer RQ3 [12]. In this paper, we generalize and extend our previous work to address several challenges encountered in real-life information systems.

The first of these challenges is the so-called “object-centricity” of information systems: Traditional process mining techniques assume the existence of a single case notion and that each recorded event is associated with exactly one object of the case notion. In reality, an information system, e.g., an ERP system, consists of many case notions, e.g., different document types, and events may be related to multiple objects of different case notions [16]. To apply traditional process mining techniques, these object-centric event data have to be flattened first, forcing them into the traditional event log format [17]. Flattening introduces certain problems (cf. Section 4.1) and provides misleading insights. Therefore, process mining techniques have to be adapted to the object-centric setting to provide accurate insights. Recently, academia and industry have picked up this challenge. On the one hand, many techniques have been proposed to translate traditional process mining problems to the object-centric setting [18], [19], [20], [21], [22], [23], [24]. On the other hand, industry leaders provide initial support for object-centric event data, e.g., Celonis by supporting object-centric process models through ProcessSphere, or Mehrwerk Process Mining by supporting multiple case identifiers.

The second challenge is non-linear cause–effect relationships contained in information systems. The investigation of cause–effect relationships behind concept drifts cannot be limited to linear relationships. Different dynamics in processes, e.g., workload–productivity relationships of resources [25], may show non-linear behavior.

The third challenge is the absence of domain knowledge. One cannot always assume that the user has basic knowledge or a suspicion about which perspective exhibits a concept drift and which perspective contains its potential cause. In our original framework, the user has to choose one perspective to be investigated for concept drifts and another perspective to be investigated for potential causes. However, an approach that does not require such domain knowledge would be more generally applicable.

Therefore, the work presented in this paper is a generalization and extension of our original framework [12] in the following ways: (a) Our revised framework supports event data with multiple case notions. (b) We support the detection of non-linear relationships. (c) No choice about a primary and secondary perspective has to be made. The user can choose arbitrarily many features.

Our new framework is depicted in Fig. 1. The event log is first segmented into subsequent windows. For each of these windows, we calculate multiple numerical features subject to the user’s choice. These values are concatenated into a time series for each feature. Subsequently, we detect concept drifts in these time series. For each pair of features and each pair of concept drifts of the two series, we test for Granger causality [26] given the time difference between drifts. Non-linear relationships are covered by applying a kernel function. Granger-causal concept drift pairs are given to the user as explainable concept drifts.
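The pairing step described above can be illustrated with a minimal sketch. It assumes that drifts are already available as window indices per feature and that only pairs in which the candidate cause strictly precedes the effect are of interest; the function name, the feature names, and the precedence rule are hypothetical choices for illustration, not taken from the paper's implementation.

```python
from itertools import permutations


def candidate_drift_pairs(drifts):
    """Enumerate (cause_feature, effect_feature, cause_idx, effect_idx, lag).

    `drifts` maps a feature name to the window indices of its detected
    change points. Assumption: a candidate cause must occur strictly
    before the effect; the lag is their distance in windows.
    """
    pairs = []
    for cause, effect in permutations(drifts, 2):
        for c in drifts[cause]:
            for e in drifts[effect]:
                if c < e:
                    pairs.append((cause, effect, c, e, e - c))
    return pairs


# Hypothetical detected drifts for two features.
drifts = {"workload": [10], "throughput_time": [12, 30]}
pairs = candidate_drift_pairs(drifts)
for pair in pairs:
    print(pair)
```

Each resulting tuple carries the lag that a subsequent Granger causality test would use; the number of candidates grows with both the number of features and the number of drifts per feature.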

We answer RQ1 by detecting concept drifts using existing concept drift detection techniques. The time series provide a visualization to explore the nature of the concept drift, helping in answering RQ2. The correlated concept drifts give explanations and potential root causes for drifts, answering RQ3.

The remainder of this paper is structured as follows: Section 2 introduces related work on concept drift in process mining. We formalize object-centric event data in Section 3. Extracting time series from object-centric event logs is introduced in Section 4. We give a general definition for concept drift detection in Section 5. Section 6 introduces Granger causality for time series and testing for non-linear relationships. Our general framework and a short overview of the implementation are given in Section 7. In Section 8, the framework is first evaluated for sensitivity and scalability and then applied to a real-life event log, uncovering an explainable concept drift in a case study. We conclude this paper in Section 9.

Section snippets

Related work

Over the past years, many techniques dealing with problems related to concept drifts in processes have been introduced. This section discusses the scope of these approaches compared to our framework. Furthermore, we discuss the differentiation of this work from our previous work [12], for which this work constitutes an extension and generalization. Table 1 depicts an overview of the scope of papers on concept drift in process mining.

Most papers deal with the detection of the drift,

Event data

First, we introduce some notations used throughout this paper. A sequence $\sigma : \{1, \dots, n\} \to X$ assigns positions to elements of a set $X$. We denote a sequence $\sigma \in X^{*}$ with $\sigma = \langle x_{1}, \dots, x_{n} \rangle$ for elements $x_{1}, \dots, x_{n} \in X$. A sequence $\sigma = \langle x_{1}, \dots, x_{n} \rangle$ is of length $\mathit{len}(\sigma) = n$. We denote subsequences with $\sigma(l, k) = \langle x_{l}, \dots, x_{k} \rangle$ for $l < k$. The notation $x \in \sigma$ is overloaded to express $x \in \mathit{range}(\sigma)$.

Event data describe the executions of a process as a collection of events. $E$ is the universe of events. Each event corresponds to the execution of a

Time series extraction

This section introduces a general approach for transforming an object-centric event log into a time series. This approach is split into three steps: First, process executions are extracted from the event log. Second, the event log is segmented into windows of timeframes. Third, a numerical feature is calculated for each window.
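The second and third steps can be sketched as follows. This is a simplified illustration that skips the extraction of process executions and works directly on a flat list of `(timestamp, activity)` tuples with numeric timestamps; the function name, the event representation, and the two example features are assumptions made for this sketch.

```python
def extract_time_series(events, window_size, feature):
    """Segment events into fixed-width windows and compute one value per window.

    events: list of (timestamp, activity) tuples with numeric timestamps.
    feature: function mapping the list of events in one window to a number.
    """
    if not events:
        return []
    start = min(t for t, _ in events)
    end = max(t for t, _ in events)
    n_windows = int((end - start) // window_size) + 1
    windows = [[] for _ in range(n_windows)]
    for t, activity in events:
        windows[int((t - start) // window_size)].append((t, activity))
    return [feature(w) for w in windows]


def pick_share(window):
    """Share of 'pick item' events in a window (0.0 for an empty window)."""
    if not window:
        return 0.0
    return sum(a == "pick item" for _, a in window) / len(window)


events = [(0, "create order"), (1, "pick item"), (5, "pick item"),
          (11, "send package"), (12, "create order")]
event_count = extract_time_series(events, window_size=5, feature=len)
share = extract_time_series(events, window_size=5, feature=pick_share)
print(event_count)  # [2, 1, 2]
print(share)        # [0.5, 1.0, 0.0]
```

Passing the feature as a function keeps the windowing independent of what is measured, so several time series can be built from the same segmentation.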

Concept drift detection

In process mining, a plethora of techniques has been applied to detect concept drifts and locate their change points in time series constructed from event data of a process. Many of the techniques use hypothesis testing to compare the distribution of values for subsequent time windows [27], [29] or global cost function-based segmentation of the time series [11], [12]. We provide a general definition for a concept drift detection technique.

Definition 10 Concept Drift Detection

Let $s \in \mathbb{R}^{*}$ be a time series. A concept drift detection
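As a rough illustration of cost function-based segmentation, the sketch below locates a single change point by choosing the split that minimizes the summed squared deviation from the segment means. It is a deliberately small stand-in, not one of the detection techniques used in the paper; the function names and the `min_size` parameter are assumptions.

```python
def sse(values):
    """Sum of squared deviations from the mean, used as the segment cost."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_change_point(series, min_size=2):
    """Return (index, cost_reduction) of the best single split, or None.

    The index is the first position of the second segment. Each segment
    must contain at least `min_size` values.
    """
    total = sse(series)
    best = None
    for i in range(min_size, len(series) - min_size + 1):
        cost = sse(series[:i]) + sse(series[i:])
        if best is None or cost < best[1]:
            best = (i, cost)
    if best is None:
        return None
    return best[0], total - best[1]


series = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
point = best_change_point(series)
print(point[0])  # 4
```

Whether the cost reduction is large enough to count as a drift still has to be decided, for example with a penalty term or a threshold, and several change points require a search over multiple splits.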

Drift correlations

This section introduces the general formulation of Granger causality to determine linear and non-linear correlations between concept drifts. We first define Granger causality and subsequently provide a general definition to test for non-linear relationships. We combine these techniques and apply them to our setting to find concept drift correlations.

Granger causality [26] describes a statistical test to determine a weak form of causality between two time series based on linear regression. A time lag between
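The regression-based idea behind the linear Granger test can be sketched in plain Python. The sketch compares a restricted autoregressive model of `y` with an unrestricted model that also uses lagged values of `x`, and returns the F-statistic of that comparison. It covers only the linear case, not the kernel-based extension; the function names and the synthetic data are assumptions for illustration.

```python
import random


def least_squares_rss(rows, targets):
    """Fit targets ~ rows by ordinary least squares; return the residual sum of squares."""
    k = len(rows[0])
    # Normal equations A b = c with A = X^T X and c = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (fails on singular systems).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
                c[r] -= f * c[col]
    beta = [c[i] / a[i][i] for i in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))


def granger_f_statistic(x, y, lag):
    """F-statistic for 'x Granger-causes y' using `lag` past values of each series."""
    x, y = list(x), list(y)
    steps = range(lag, len(y))
    targets = y[lag:]
    # Restricted model: intercept plus past values of y only.
    restricted = [[1.0] + y[t - lag:t] for t in steps]
    # Unrestricted model: additionally uses past values of x.
    unrestricted = [r + x[t - lag:t] for r, t in zip(restricted, steps)]
    rss_r = least_squares_rss(restricted, targets)
    rss_u = least_squares_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic example: y follows x with a delay of one step plus small noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 100)]
f_xy = granger_f_statistic(x, y, lag=1)
f_yx = granger_f_statistic(y, x, lag=1)
print(f_xy > f_yx)
```

Turning the statistic into a p-value requires the F distribution with `lag` and `dof` degrees of freedom, which is available in statistics libraries such as SciPy (`scipy.stats.f.sf`).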

Framework for explainable concept drift

The previously introduced notations allow us to define our general framework to correlate significant changes in object-centric event data. Several parameters can be chosen; Table 5 depicts an overview of the parameters for different framework steps. Based on these parameters, we extract the correlated concept drifts, called explainable concept drifts. The combinations of each pair of change points in both time series are considered for each pair of features. We test for Granger causality for

Evaluation

This section evaluates our proposed framework. We provide a quantitative evaluation in terms of sensitivity and scalability and, subsequently, showcase our framework in a case study. First, we investigate the sensitivity of the time series extraction to the chosen inclusion function and the window size. Second, we evaluate the scalability of the whole framework depending on the number of features. We will not evaluate the performance of different concept drift detection techniques. The

Conclusion

In this paper, we introduced a framework to uncover explainable concept drifts in object-centric event data. Our framework is split into three steps. First, time series for different features are extracted from an object-centric event log and, second, investigated for concept drifts. Third, we test for correlations between concept drifts using Granger causality. Through the use of kernel functions, we can test for non-linear relationships. In our evaluation, we investigated the choice of

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research (grant no. 1191945).

References (63)

Chen Y. et al., Analyzing multiple nonlinear time series with extended Granger causality, Phys. Lett. A (2004)
Marinazzo D. et al., Nonlinear connectivity by Granger causality, Neuroimage (2011)
de Leoni M. et al., A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs, Inf. Syst. (2016)
Galanti R. et al., Object-centric process predictive analytics, Expert Syst. Appl. (2023)
Adams J.N. et al., Ocpa: A Python library for object-centric process analysis, Software Impacts (2022)
Adams J.N. et al., OCπ: Object-centric process insights
Dumas M. et al., Fundamentals of Business Process Management (2018)
van der Aalst W.M.P., Process Mining: Data Science in Action (2016)
Leemans S.J.J. et al., Discovering block-structured process models from event logs - A constructive approach
Adriansyah A. et al., Conformance checking using cost-based fitness analysis
Camargo M. et al., Automated discovery of business process simulation models from event logs, Decis. Support Syst. (2020)
Tax N. et al., Predictive business process monitoring with LSTM neural networks
Bose R.P.J.C. et al., Dealing with concept drifts in process mining, IEEE Trans. Neural Netw. Learn. Syst. (2014)
Brockhoff T. et al., Time-aware concept drift detection using the earth mover’s distance
Yeshchenko A. et al., Visual drift detection for sequence data analysis of business processes, IEEE Trans. Vis. Comput. Graphics (2021)
Chamorro A.E.M. et al., Updating prediction models for predictive process monitoring
Yeshchenko A. et al., Comprehensive process drift detection with visual analytics
Adams J.N. et al., A framework for explainable concept drift detection in process mining
Sato D.M.V. et al., A survey on concept drift in process mining, ACM Comput. Surv. (2022)
Aminikhanghahi S. et al., A survey of methods for time series change point detection, Knowl. Inf. Syst. (2017)
Ostovar A. et al., Characterizing drift from event streams of business processes
van der Aalst W.M.P., Process mining manifesto
van der Aalst W.M.P., Object-centric process mining: Dealing with divergence and convergence in event data
van der Aalst W.M.P. et al., Discovering object-centric Petri nets, Fundam. Inform. (2020)
Esser S. et al., Multi-dimensional event data in graph databases, J. Data Semant. (2021)
Waibel P. et al., Causal process mining from relational databases with domain knowledge (2022)
Adams J.N. et al., Precision and fitness in object-centric process mining
Fahland D., Process mining over multiple behavioral dimensions with event knowledge graphs
Adams J.N. et al., Defining cases and variants for object-centric event data
Park G. et al., OPerA: Object-centric performance analysis
Nakatumba J. et al., Analyzing resource behavior using process mining
