Retrieval and clustering for supporting business process adjustment and analysis

doi:10.1016/j.is.2012.11.006

Information Systems

Volume 40, March 2014, Pages 128-141

https://doi.org/10.1016/j.is.2012.11.006

Abstract

In this paper, we describe a framework that supports run-time adjustment and a posteriori analysis of business processes by exploiting the retrieval step of the Case-based Reasoning (CBR) methodology. In particular, our framework retrieves traces of process execution similar to the current one. Moreover, it supports an automatic organization of the trace database content through the application of hierarchical clustering techniques. Results can help both end users, in the process execution phase, and process engineers, in (formal) process conformance evaluation and long-term process schema redesign.

Retrieval and clustering rely on a distance definition able to take into account temporal information in traces. This metric has outperformed simpler distance definitions in our experiments, which were conducted in a real-world application domain.

Introduction

Business Process (BP) Management is a set of activities aimed at defining, executing and optimizing BPs, with the objective of making the business of an enterprise as effective and efficient as possible, and of increasing its economic success. Such activities are highly automated, typically by means of workflow management systems, also called Business Process Management Systems (BPMS) [1], [2].

In normal conditions, a BPMS automatically executes a BP according to its process schema, i.e., a formalized model that specifies the actions to be performed and the control flow relations to be respected among them. However, BP optimization may require the enterprise to flexibly change and adapt such a predefined process schema, in response to expected situations (e.g., new laws, reengineering efforts) as well as to unanticipated exceptions and problems in the operating environment (e.g., emergencies) [3].

The agile workflow technology [4], [5] is the technical solution which has been invoked to deal with such adaptation and overriding needs. It can support both ad hoc adjustments of individual process instances [6], [7], operated by end users, and redesign at the general process schema level, operated by process engineers—applicable even if the default process schema is already in use by some running instances [6], [8].

In order to provide effective and quick adaptation support, many agile workflow systems share the idea of recalling and reusing concrete examples of changes adopted in the past. To this end, Case-based Reasoning (CBR) [9] has been proposed as a natural methodological solution. CBR is a reasoning paradigm that exploits the specific knowledge of previously experienced situations, called cases. It operates by retrieving and reusing similar cases in order to solve the problem at hand (after a possible revision of the retrieved solutions, if needed). Indeed, CBR is particularly well suited for managing exceptional situations, even when they cannot be foreseen or preplanned. As a matter of fact, in the literature cases have often been used to describe exceptions in various domains (see e.g. [10]), and many examples of CBR-based process change support have been proposed (see e.g. [7], [11], [12], [13], [14], [15]).

The implementation of proper CBR-based support for process adjustment and analysis starts from a careful choice of how past cases are represented. In many applications, past examples of change are recorded as traces of execution [16] (stored in a database, also known as an event log [17]), i.e., as the sequence of the process actions that were actually performed, often coupled with their starting and ending times. In its simplest form, a trace does not log any other feature of the executed actions (e.g., the actor, or the available resources). Moreover, it usually does not provide any contextual information that could explain the reasons for possible deviations from the prescriptions of the default process schema.
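As a minimal sketch of the simplest trace form described above, a trace could be modelled as an ordered list of timed actions. The class and field names (`Action`, `Trace`, `start`, `end`), the time unit, and the sample action names are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    """One executed process action with its logged start and end time."""
    name: str
    start: float  # assumed unit: minutes since the trace began
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class Trace:
    """A trace of execution: the actions actually performed, in order."""
    actions: List[Action]

    def names(self) -> List[str]:
        return [a.name for a in self.actions]


# Hypothetical example; the action names are invented for illustration.
trace = Trace([
    Action("triage", 0, 10),
    Action("ct_scan", 15, 40),
    Action("thrombolysis", 45, 105),
])
print(trace.names())
print([a.duration for a in trace.actions])
```

Note that nothing beyond action names and times is stored, which mirrors the observation that the simplest traces carry no actor, resource, or contextual information.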

In this paper, we propose support for BP adjustment and analysis that adopts the retrieval step of the CBR methodology and is specifically designed to work on cases in the form of traces of execution.

In our framework, retrieval is meant to help end users in the process execution phase, when dealing with an atypical situation. Indeed, suggestions on how to adjust the default process schema in the current situation may be obtained by analyzing the most similar retrieved examples of change, recorded as traces that share the starting sequence of actions with the current query (i.e., with the input problem).
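The idea of looking at past traces that share the starting sequence of actions with the current query can be sketched as follows. This is a simplification under stated assumptions: traces are reduced to lists of action names, the prefix match is exact (the framework itself ranks traces by similarity), and the function names and sample data are hypothetical.

```python
from typing import List, Sequence


def shares_prefix(candidate: Sequence[str], query: Sequence[str]) -> bool:
    """True if the candidate trace starts with exactly the actions of the query."""
    return len(candidate) >= len(query) and list(candidate[:len(query)]) == list(query)


def retrieve_continuations(case_base: List[List[str]], query: List[str]) -> List[List[str]]:
    """For every past trace that starts like the query, return the actions that followed."""
    return [t[len(query):] for t in case_base if shares_prefix(t, query)]


# Hypothetical case base; action names are invented for illustration.
case_base = [
    ["triage", "ct_scan", "thrombolysis"],
    ["triage", "ct_scan", "neuro_consult", "rehab"],
    ["triage", "mri", "rehab"],
]
query = ["triage", "ct_scan"]
suggestions = retrieve_continuations(case_base, query)
print(suggestions)
```

The returned continuations are the kind of material an end user could inspect when deciding how to proceed in an atypical situation.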

Moreover, we support an automatic organization of the case base content (i.e., of all the available traces) through the application of hierarchical clustering techniques. Clustering can serve as a starting point for a set of a posteriori trace analyses. In particular, it can help process engineers in conformance evaluation (e.g., it can be an input to formal verification of the conformance of traces to proper semantic constraints [18]). Additionally, since changes can also be due to a weak or incomplete initial process schema definition, engineers can exploit (retrieval and) clustering results to draw some suggestions on how to redesign process schemata, in order to incorporate the most frequent and significant changes once and for all.
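A bottom-up hierarchical clustering of a small case base might look like the sketch below. The text above only says that hierarchical clustering techniques are applied, so the choice of average linkage is an assumption, as are the names `average_linkage` and `toy_distance`; the placeholder distance simply counts actions that appear in one trace but not the other and stands in for a real trace metric.

```python
from typing import Callable, List, Sequence, Tuple

Cluster = Tuple[int, ...]  # indices of the traces in a cluster


def toy_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Placeholder distance: number of actions present in only one of the traces."""
    return float(len(set(a) ^ set(b)))


def average_linkage(
    items: List[Sequence[str]],
    dist: Callable[[Sequence[str], Sequence[str]], float],
) -> List[Tuple[Cluster, Cluster, float]]:
    """Naive agglomerative clustering; returns the merges in the order performed."""
    clusters: List[Cluster] = [(i,) for i in range(len(items))]
    merges: List[Tuple[Cluster, Cluster, float]] = []

    def cluster_dist(a: Cluster, b: Cluster) -> float:
        # Average pairwise distance between the members of the two clusters.
        return sum(dist(items[i], items[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters (ties broken by position).
        d, x, y = min(
            (cluster_dist(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [a + b]
    return merges


# Hypothetical traces, reduced to action names.
traces = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["x", "y", "z"]]
merges = average_linkage(traces, toy_distance)
for left, right, d in merges:
    print(left, right, d)
```

The list of merges is the dendrogram in tabular form: cutting it at a chosen distance yields the groups of traces a process engineer could then inspect. The naive search over all cluster pairs is only suitable for small case bases.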

In our work retrieval and clustering rely on a distance definition able to take into account temporal information.

Interestingly, only a few metrics specifically designed to work on traces have been described in the literature. Moreover, most of them do not manage temporal information; in particular, a proper way of comparing qualitative temporal constraints is usually not provided (see Section 5).

On the other hand, neglecting time can be a significant flaw. In fact, time is really crucial in some applications. In medicine, for instance, the role of time is clearly central: it is mandatory to penalize the fact that the very same action had different durations in two traces, or was delayed, especially if referring to emergency procedures. And, generally speaking, temporal information is relevant in all domains, as it can be used e.g., to discover bottlenecks and to measure service levels [17].

The metric we introduce in this work allows us to explicitly manage temporal information in traces, paying attention to both quantitative and qualitative constraints.
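To make the notion of penalizing different durations concrete, the sketch below computes a normalized penalty for actions that have the same name but lasted different amounts of time. It is not the metric defined in the paper: it covers only quantitative durations (no delays, no qualitative constraints), it pairs actions by position, and the name `duration_penalty` and the sample values are assumptions.

```python
from typing import List, Tuple

TimedAction = Tuple[str, float]  # (action name, duration)


def duration_penalty(a: List[TimedAction], b: List[TimedAction]) -> float:
    """Average relative duration difference over same-named actions at the same position.

    Returns a value in [0, 1]; 0 means all matched actions lasted equally long.
    """
    penalties = []
    for (name_a, dur_a), (name_b, dur_b) in zip(a, b):
        if name_a == name_b:
            longest = max(dur_a, dur_b)
            penalties.append(abs(dur_a - dur_b) / longest if longest > 0 else 0.0)
    return sum(penalties) / len(penalties) if penalties else 0.0


# Hypothetical traces: the same two actions, but the scan took twice as long in the second.
first = [("triage", 10.0), ("ct_scan", 20.0)]
second = [("triage", 10.0), ("ct_scan", 40.0)]
print(duration_penalty(first, second))
```

Two traces with identical action sequences therefore no longer have distance zero once such a term is included, which is the behaviour the discussion above asks for.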

In addition to this methodological contribution, in the paper we also describe our experimental work in the field of stroke care, in which we compared the new metric to a classical existing one (namely, the edit distance [19], [20]), and to simpler versions of our distance definition, able to manage only part of the overall information available on traces.
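For reference, the classical edit distance used as a baseline can be applied to traces by treating each action name as one symbol. The sketch below is the standard unit-cost Levenshtein dynamic program; unit costs, the function name `trace_edit_distance`, and the sample traces are assumptions, and the timing of actions is ignored entirely.

```python
from typing import Sequence


def trace_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of action insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i actions of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical traces, reduced to action names.
t1 = ["triage", "ct_scan", "thrombolysis"]
t2 = ["triage", "mri", "thrombolysis", "rehab"]
print(trace_edit_distance(t1, t2))
```

Because only the order of action names matters here, two traces in which the same actions had very different durations are indistinguishable, which is the limitation a time-aware metric is meant to address.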

The paper is organized as follows. Section 2 presents technical details of the framework. Section 3 describes experimental results. Section 4 adds information about recent methodological improvements we are providing, in order to enhance the framework performance. Section 5 addresses some comparisons with related works. Finally, Section 6 is devoted to conclusions, discussion of limitations and future research directions.

Section snippets

A framework for supporting BP adjustment and analysis

This section describes methodological and technical details of our framework.

In particular, central tasks that all CBR tools have to deal with are [9] to define the notion of case, and to find a past case similar to the input one (i.e., to implement retrieval). Case definition and retrieval methods can vary considerably [9], and should be tailored to the needs of the application domain. Specifically, a proper distance definition has to be introduced, in order to optimize the reliability of

Experimental results

In this section we provide some results on clustering experiments. Some experiments on retrieval are presented in Section 4, where we discuss our most recent improvements.

All of our experiments were conducted working on real patient traces taken from the stroke management domain. Actually, Health-Care Organizations (HCO) place strong emphasis on efficiency and effectiveness, to control their health-care performance and expenditures: they thus need to evaluate existing

Improving performance through a pivoting-based technique

As observed in Section 2, distance calculation is tractable, and indeed retrieval was fast in the experiments we have conducted so far (see [36], where, however, we only relied on Trace Edit Distance, and did not manage temporal information); nonetheless, it can become computationally expensive when working on very large databases. This problem has already been highlighted in process retrieval [37].

To this end, we are currently designing and implementing a methodology able to enhance the

Related works

Examples of CBR tools in BP management, and specifically in process adjustment support, are described in the literature (e.g. [11], [12], [7], [14], [15]); a few works exploiting clustering techniques are reported as well (e.g. [42], [43])—even though they mainly deal with process mining [44] (see below).

Since the main methodological contribution of our work consists in the definition of a proper distance function, in our comparison with the existing literature we will first focus on this

Concluding remarks and future work

In this work, we have described a case retrieval and clustering approach to process change and analysis. In particular, we have defined a proper case structure and a new distance measure, which are exploited to retrieve traces similar to the current one. Our system also automatically clusters the trace database content by resorting to hierarchical clustering techniques.

We believe that such functionalities can help end users who need to adapt a process instance to some unforeseen

References (65)

S. Rinderle et al.
Correctness criteria for dynamic changes in workflow systems—a survey
Data and Knowledge Engineering
(2004)
F. Casati et al.
Workflow evolutions
Data and Knowledge Engineering
(1998)
T. Madhusudan et al.
A case-based reasoning framework for workflow model management
Data and Knowledge Engineering
(2004)
L.T. Ly et al.
Integration and verification of semantic constraints in adaptive process management systems
Data & Knowledge Engineering
(2008)
J. Allen
Towards a general theory of action and time
Artificial Intelligence
(1984)
R. Socorro et al.
A fast pivot-based indexing algorithm for metric spaces
Pattern Recognition Letters
(2011)
C. Combi et al.
Temporal similarity measures for querying clinical workflows
Artificial Intelligence in Medicine
(2009)
A. Rozinat et al.
Conformance checking of processes based on monitoring real behavior
Information Systems
(2008)
A. Rebuge et al.
Business process analysis in healthcare environments: a methodology based on process mining
Information Systems
(2012)
Workflow Management Coalition...
W.V. der Aalst, A. ter Hofstede, M. Weske, Business process management: a survey, in: Proceedings of the International...
P. Heimann, G. Joeris, C. Krapp, B. Westfechtel, Dynamite: dynamic task nets for software process management, in:...
B. Weber et al.
Towards the agile management of business processes
M. Reichert et al.
ADEPTflex: supporting dynamic changes of workflows without losing control
Journal of Intelligent Information Systems
(1998)
B. Weber et al.
Case-based maintenance for CCBR-based process evolution
A. Aamodt et al.
Case-based reasoning: foundational issues, methodological variations and systems approaches
AI Communications
(1994)
J. Surma et al.
Integration rules and cases for the classification task
Z. Luo et al.
Exception handling in workflow systems
Applied Intelligence
(2000)
B. Weber et al.
Providing integrated life cycle support in process-aware information systems
International Journal of Cooperative Information Systems
(2009)
M. Minor et al.
Agile workflow technology and case-based change reuse for long-term processes
International Journal of Intelligent Information Technologies
(2008)
S. Montani, Prototype-based management of business process exception cases, Applied Intelligence,...
W.V. der Aalst
Process Mining. Discovery, Conformance and Enhancement of Business Processes
(2011)
IEEE Taskforce on Process Mining: Process Mining Manifesto...
A. Levenshtein
Binary codes capable of correcting deletions, insertions and reversals
Soviet Physics Doklady
(1966)
L. Yujian et al.
A normalized Levenshtein distance metric
IEEE Transactions on Pattern Analysis and Machine Intelligence
(2007)
A. Lanz, B. Weber, M. Reichert, Workflow time patterns for process-aware information systems, in: Proceedings of the...
S. Kurtz, Approximate string searching under weighted edit distance, in: Proceedings of the 3rd South American Workshop...
M. Palmer et al.
Verb semantics for English–Chinese translation
Machine Translation
(1995)
R. Bergmann et al.
Similarity measures for object-oriented case representations
P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, in: Proceedings of the IJCAI, 1995,...
R. Page et al.
Molecular Evolution: A Phylogenetic Approach
(1998)
A. Marzal et al.
Computation of normalized edit distance and applications
IEEE Transactions on Pattern Analysis and Machine Intelligence
(1993)
