Retrieval and clustering for supporting business process adjustment and analysis
Introduction
Business Process (BP) Management is a set of activities aimed at defining, executing and optimizing BPs, with the objective of making the business of an enterprise as effective and efficient as possible, and of increasing its economic success. Such activities are highly automated, typically by means of workflow management systems, also called Business Process Management Systems (BPMS) [1], [2].
In normal conditions, a BPMS automatically executes a BP according to its process schema, i.e., a formalized model that specifies the actions to be performed and the control flow relations to be respected among them. However, BP optimization may require the enterprise to flexibly change and adapt such a predefined process schema, in response to expected situations (e.g., new laws, reengineering efforts) as well as to unanticipated exceptions and problems in the operating environment (e.g., emergencies) [3].
Agile workflow technology [4], [5] is the technical solution that has been proposed to deal with such adaptation and overriding needs. It can support both ad hoc adjustments of individual process instances [6], [7], operated by end users, and redesign at the general process schema level, operated by process engineers; the latter is applicable even if the default process schema is already in use by some running instances [6], [8].
In order to provide effective and quick adaptation support, many agile workflow systems share the idea of recalling and reusing concrete examples of changes adopted in the past. To this end, Case-based Reasoning (CBR) [9] has been proposed as a natural methodological solution. CBR is a reasoning paradigm that exploits the specific knowledge of previously experienced situations, called cases. It operates by retrieving and reusing similar cases in order to solve the problem at hand (after a possible revision of the retrieved solutions, if needed). Indeed, CBR is particularly well suited for managing exceptional situations, even when they cannot be foreseen or preplanned. As a matter of fact, cases have often been used in the literature to describe exceptions in various domains (see e.g. [10]), and many examples of CBR-based process change support have been proposed (see e.g. [7], [11], [12], [13], [14], [15]).
The implementation of proper CBR-based support for process adjustment and analysis starts from a careful evaluation of how past cases are represented. In many applications, past examples of change are recorded as traces of execution [16] (stored in a database, also known as event log [17]), i.e., as the sequence of the process actions that were actually performed, often coupled with their starting and ending times. In its simplest form, a trace does not log any other feature of the executed actions (e.g., the actor, or the available resources). Moreover, it usually does not provide any contextual information that could explain possible deviations from the prescriptions of the default process schema.
In this paper, we propose a support to BP adjustment and analysis which adopts the retrieval step of the CBR methodology, specifically designed to work on cases in the form of traces of execution.
In our framework, retrieval is meant to help end users in the process execution phase, when dealing with an atypical situation. Indeed, suggestions on how to adjust the default process schema in the current situation may be obtained by analyzing the most similar retrieved examples of change, recorded as traces that share the starting sequence of actions with the current query (i.e., with the input problem).
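As a rough illustration of retrieving traces that share the starting sequence of actions with the query, the sketch below ranks stored traces by the length of their common prefix with the query and collects the actions that followed that prefix in past cases. This is a deliberately simplified stand-in, not the retrieval procedure of the framework (which relies on the distance introduced later); the function names and the stroke-related action labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading actions that two traces have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def retrieve_by_prefix(query: Sequence[str],
                       case_base: List[List[str]],
                       k: int = 3) -> List[Tuple[List[str], int]]:
    """Return up to k traces sharing the longest starting sequence with the query.

    Traces with no common starting action are discarded.
    """
    scored = [(common_prefix_len(query, t), i) for i, t in enumerate(case_base)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # longest prefix first, stable on index
    return [(case_base[i], n) for n, i in scored[:k] if n > 0]


def suggest_next(query: Sequence[str],
                 retrieved: List[Tuple[List[str], int]]) -> List[str]:
    """Actions that followed the whole query in the retrieved traces."""
    return [t[n] for t, n in retrieved if n == len(query) and len(t) > n]


# Hypothetical case base: each trace is a sequence of action names.
case_base = [
    ["triage", "CT", "thrombolysis"],
    ["triage", "CT", "neuro_consult", "thrombolysis"],
    ["triage", "ECG"],
    ["admission"],
]
query = ["triage", "CT"]
hits = retrieve_by_prefix(query, case_base, k=2)
suggestions = suggest_next(query, hits)
```

Here the end user executing `triage` followed by `CT` would be shown the two past traces that started the same way, together with the actions that came next in each of them.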
Moreover, we support an automatic organization of the case base content (i.e., of all the available traces) through the application of hierarchical clustering techniques. Clustering can serve as a starting point for a set of a posteriori trace analyses. In particular, it can help process engineers in conformance evaluation (e.g., it can be an input to formal verification of the conformance of traces to proper semantic constraints [18]). Additionally, since changes can also be due to a weak or incomplete initial process schema definition, engineers can exploit (retrieval and) clustering results to draw some suggestions on how to redesign process schemata, in order to incorporate the most frequent and significant changes once and for all.
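To make the idea of hierarchically clustering the case base concrete, the following sketch runs a plain agglomerative procedure over a precomputed matrix of pairwise trace distances. It is an assumption-laden illustration: average linkage is chosen here for simplicity and the text above does not state which linkage the framework uses; the function name and the toy distance values are hypothetical.

```python
from itertools import combinations
from typing import List


def agglomerative(dist: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Merge the two closest clusters until n_clusters remain.

    dist is a symmetric matrix of pairwise trace distances;
    clusters are returned as sorted lists of trace indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > max(n_clusters, 1):
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Toy distance matrix for four traces: 0 and 1 are similar, as are 2 and 3.
toy = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
groups = agglomerative(toy, n_clusters=2)
```

Stopping the merging at different values of `n_clusters` corresponds to cutting the hierarchy at different levels, which is what allows an engineer to inspect coarse or fine groups of traces.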
In our work retrieval and clustering rely on a distance definition able to take into account temporal information.
Interestingly, only a few metrics specifically designed to work on traces have been described in the literature. Moreover, most of them do not manage temporal information; in particular, a proper way of comparing qualitative temporal constraints is usually not provided (see Section 5).
On the other hand, neglecting time can be a significant flaw. In fact, time is really crucial in some applications. In medicine, for instance, the role of time is clearly central: it is mandatory to penalize the fact that the very same action had different durations in two traces, or was delayed, especially if referring to emergency procedures. And, generally speaking, temporal information is relevant in all domains, as it can be used e.g., to discover bottlenecks and to measure service levels [17].
The metric we introduce in this work allows us to explicitly manage temporal information in traces, paying attention both to quantitative and to qualitative constraints.
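The sketch below illustrates only the quantitative side of this idea: two traces that contain the same action are still penalized when that action lasted longer in one of them. It is not the metric defined in the paper. Pairing actions by position, charging a cost of 1 for a mismatched or missing action, and normalizing by the longer trace are all assumptions made for brevity, qualitative constraints are left out entirely, and every name in the code is hypothetical.

```python
from typing import List, Tuple

Event = Tuple[str, float, float]  # (action, start time, end time)


def duration_penalty(d1: float, d2: float) -> float:
    """Relative difference between two durations, in [0, 1]."""
    longest = max(d1, d2)
    return 0.0 if longest == 0 else abs(d1 - d2) / longest


def temporal_trace_distance(t1: List[Event], t2: List[Event]) -> float:
    """Toy distance in [0, 1] that is sensitive to action durations.

    Assumption: actions are paired by position rather than by an alignment step.
    """
    if not t1 and not t2:
        return 0.0
    cost = 0.0
    for (a1, s1, e1), (a2, s2, e2) in zip(t1, t2):
        if a1 != a2:
            cost += 1.0  # different actions: maximal penalty
        else:
            cost += duration_penalty(e1 - s1, e2 - s2)
    cost += abs(len(t1) - len(t2))  # unpaired trailing actions
    return cost / max(len(t1), len(t2))


# Same actions, but the CT scan took twice as long in the second trace.
fast = [("triage", 0, 10), ("CT", 10, 30)]
slow = [("triage", 0, 10), ("CT", 10, 50)]
d = temporal_trace_distance(fast, slow)
```

A purely sequence-based comparison would consider `fast` and `slow` identical, whereas here they end up at distance 0.25.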
In addition to this methodological contribution, in the paper we also describe our experimental work in the field of stroke care, in which we compared the new metric to a classical existing one (namely, the edit distance [19], [20]), and to simpler versions of our distance definition, able to manage only part of the overall information available on traces.
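For reference, the classical edit distance used as a baseline can be sketched on traces by treating each action as a symbol and counting the minimum number of insertions, deletions and substitutions needed to turn one sequence into the other. The version below uses unit costs, which is an assumption (the variant used in the experiments may weight or normalize operations differently), and the action names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two action sequences (unit costs)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


default_path = ["triage", "CT", "thrombolysis"]
changed_path = ["triage", "ECG", "CT", "thrombolysis"]
steps = edit_distance(default_path, changed_path)
```

Note that this baseline sees only the order of the actions: it carries no information about how long each action lasted or how much delay separated consecutive ones.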
The paper is organized as follows. Section 2 presents technical details of the framework. Section 3 describes experimental results. Section 4 adds information about recent methodological improvements we are providing, in order to enhance the framework performance. Section 5 addresses some comparisons with related works. Finally, Section 6 is devoted to conclusions, discussion of limitations and future research directions.
Section snippets
A framework for supporting BP adjustment and analysis
This section describes methodological and technical details of our framework.
In particular, central tasks that all CBR tools have to deal with are [9] to define the notion of case, and to find a past case similar to the input one (i.e., to implement retrieval). Case definition and retrieval methods can vary considerably [9], and should be tailored to the needs of the application domain. Specifically, a proper distance definition has to be introduced, in order to optimize the reliability of
Experimental results
In this section we will provide some results on clustering experiments. Some experiments on retrieval will be presented in Section 4, where we will discuss our most recent improvements.
All of our experiments were conducted working on real patient traces taken from the stroke management domain. Actually, Health-Care Organizations (HCO) place strong emphasis on efficiency and effectiveness, to control their health-care performance and expenditures: they thus need to evaluate existing
Improving performance through a pivoting-based technique
As observed in Section 2, distance calculation is tractable, and indeed retrieval was fast in the experiments we have conducted so far (see [36], where, however, we only relied on Trace Edit Distance, and did not manage temporal information); nonetheless, it can become computationally expensive when working on very large databases. This problem has already been highlighted in process retrieval [37].
To this end, we are currently designing and implementing a methodology able to enhance the
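The snippet above does not detail the technique, so the following is only a generic sketch of pivot-based pruning for nearest-neighbour retrieval in a metric space. Distances from every stored item to a few pivots are precomputed; at query time the triangle inequality gives the lower bound |d(q, p) - d(x, p)| <= d(q, x), and any item whose bound already reaches the best distance found so far is skipped without computing its real distance. The function names are hypothetical, the distance is assumed to satisfy the triangle inequality, and at least one pivot is assumed.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_index(items: Sequence[T], pivots: Sequence[T],
                dist: Callable[[T, T], float]) -> List[List[float]]:
    """Precompute the distance from every stored item to every pivot."""
    return [[dist(x, p) for p in pivots] for x in items]


def nearest(query: T, items: Sequence[T], pivots: Sequence[T],
            table: List[List[float]],
            dist: Callable[[T, T], float]) -> Tuple[T, float, int]:
    """Nearest neighbour search that skips items ruled out by the pivots.

    Returns (best item, its distance, number of real distances computed
    against stored items).
    """
    q_to_p = [dist(query, p) for p in pivots]
    best, best_d, evaluated = None, float("inf"), 0
    for x, row in zip(items, table):
        # Triangle inequality: d(query, x) >= |d(query, p) - d(x, p)|.
        lower = max(abs(qp - xp) for qp, xp in zip(q_to_p, row))
        if lower >= best_d:
            continue  # x cannot beat the current best
        d = dist(query, x)
        evaluated += 1
        if d < best_d:
            best, best_d = x, d
    return best, best_d, evaluated


# Toy metric space: numbers under absolute difference, with a single pivot.
absdiff = lambda a, b: abs(a - b)
items = [1, 5, 9, 20, 50]
pivots = [0]
table = build_index(items, pivots, absdiff)
result = nearest(8, items, pivots, table, absdiff)
```

In the toy run only three of the five stored items require a real distance computation; with an expensive trace distance and a large case base, this kind of saving is what makes pivoting attractive.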
Related works
Examples of CBR tools in BP management, and specifically in process adjustment support, are described in the literature (e.g. [11], [12], [7], [14], [15]); a few works exploiting clustering techniques are reported as well (e.g. [42], [43])—even though they mainly deal with process mining [44] (see below).
Since the main methodological contribution of our work consists in the definition of a proper distance function, in our comparison with the existing literature we will first focus on this
Concluding remarks and future work
In this work, we have described a case retrieval and clustering approach to process change and analysis. In particular, we have defined a proper case structure and a new distance measure, which are exploited to retrieve traces similar to the current one. Our system also automatically clusters the trace database content by means of hierarchical clustering techniques.
We believe that such functionalities can help end users who need to adapt a process instance to some unforeseen
References (65)
- et al. Correctness criteria for dynamic changes in workflow systems: a survey. Data and Knowledge Engineering (2004)
- et al. Workflow evolutions. Data and Knowledge Engineering (1998)
- et al. A case-based reasoning framework for workflow model management. Data and Knowledge Engineering (2004)
- et al. Integration and verification of semantic constraints in adaptive process management systems. Data and Knowledge Engineering (2008)
- Towards a general theory of action and time. Artificial Intelligence (1984)
- et al. A fast pivot-based indexing algorithm for metric spaces. Pattern Recognition Letters (2011)
- et al. Temporal similarity measures for querying clinical workflows. Artificial Intelligence in Medicine (2009)
- et al. Conformance checking of processes based on monitoring real behavior. Information Systems (2008)
- et al. Business process analysis in healthcare environments: a methodology based on process mining. Information Systems (2012)
- Workflow Management Coalition...
- Towards the agile management of business processes
- Adeptflex: supporting dynamic changes of workflows without losing control. Journal of Intelligent Information Systems
- Case-based maintenance for CCBR-based process evolution
- Case-based reasoning: foundational issues, methodological variations and systems approaches. AI Communications
- Integration rules and cases for the classification task
- Exception handling in workflow systems. Applied Intelligence
- Providing integrated life cycle support in process-aware information systems. International Journal of Cooperative Information Systems
- Agile workflow technology and case-based change reuse for long-term processes. International Journal of Intelligent Information Technologies
- Process Mining. Discovery, Conformance and Enhancement of Business Processes
- Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady
- A normalized Levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence
- Verb semantics for English–Chinese translation. Machine Translation
- Similarity measures for object-oriented case representations
- Molecular Evolution: A Phylogenetic Approach
- Computation of normalized edit distance and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence