Elsevier

Information Systems

Volume 40, March 2014, Pages 128-141
Retrieval and clustering for supporting business process adjustment and analysis

https://doi.org/10.1016/j.is.2012.11.006

Abstract

In this paper, we describe a framework that supports run-time adjustment and a posteriori analysis of business processes by exploiting the retrieval step of the Case-based Reasoning (CBR) methodology. In particular, our framework retrieves traces of process execution similar to the current one. Moreover, it supports an automatic organization of the trace database content through the application of hierarchical clustering techniques. Results can help both end users, in the process execution phase, and process engineers, in (formal) process conformance evaluation and long-term process schema redesign.

Retrieval and clustering rely on a distance definition able to take into account temporal information in traces. This metric has outperformed simpler distance definitions in our experiments, which were conducted in a real-world application domain.

Introduction

Business Process (BP) Management is a set of activities aimed at defining, executing and optimizing BPs, with the objective of making the business of an enterprise as effective and efficient as possible, and of increasing its economic success. Such activities are highly automated, typically by means of workflow management systems, also called Business Process Management Systems (BPMS) [1], [2].

In normal conditions, a BPMS automatically executes a BP according to its process schema, i.e., a formalized model that specifies the actions to be performed and the control flow relations to be respected among them. However, BP optimization may require the enterprise to flexibly change and adapt such a predefined process schema, in response to expected situations (e.g., new laws, reengineering efforts) as well as to unanticipated exceptions and problems in the operating environment (e.g., emergencies) [3].

The agile workflow technology [4], [5] is the technical solution which has been invoked to deal with such adaptation and overriding needs. It can support both ad hoc adjustments of individual process instances [6], [7], operated by end users, and redesign at the general process schema level, operated by process engineers—applicable even if the default process schema is already in use by some running instances [6], [8].

In order to provide effective and quick adaptation support, many agile workflow systems share the idea of recalling and reusing concrete examples of changes adopted in the past. To this end, Case-based Reasoning (CBR) [9] has been proposed as a natural methodological solution. CBR is a reasoning paradigm that exploits the specific knowledge of previously experienced situations, called cases. It operates by retrieving and reusing similar cases in order to solve the problem at hand (after a possible revision of the retrieved solutions, if needed). Indeed, CBR is particularly well suited for managing exceptional situations, even when they cannot be foreseen or preplanned. In the literature, cases have often been used to describe exceptions in various domains (see e.g. [10]), and many examples of CBR-based process change support have been proposed (see e.g. [7], [11], [12], [13], [14], [15]).

The implementation of proper CBR-based support for process adjustment and analysis starts from a careful evaluation of how past cases are represented. In many applications, past examples of change are recorded as traces of execution [16] (stored in a database, also known as event log [17]), i.e., as the sequence of the process actions that were actually performed, often coupled with their starting and ending times. In its simplest form, a trace does not log any other feature of the executed actions (e.g., the actor, or the available resources). Moreover, it usually does not provide any contextual information that could explain possible deviations from the prescriptions of the default process schema.
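As an illustration of the simplest trace form described above, the following minimal sketch models a trace as an ordered list of actions with starting and ending times. The names `Event`, `Trace` and `actions`, the time unit, and the example stroke-care actions are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """One executed action with its starting and ending time.

    Times are plain numbers here (assumed: minutes since process start).
    """
    action: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


# A trace is simply the ordered list of executed actions (hypothetical alias).
Trace = List[Event]


def actions(trace: Trace) -> List[str]:
    """Return only the action names, dropping temporal information."""
    return [event.action for event in trace]


# Hypothetical example trace; action names are illustrative only.
example: Trace = [
    Event("triage", 0, 10),
    Event("ct_scan", 15, 40),
]
```

Keeping times on each event, rather than storing only action names, is what allows durations and delays between actions to be compared later.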

In this paper, we propose a support to BP adjustment and analysis which adopts the retrieval step of the CBR methodology, specifically designed to work on cases in the form of traces of execution.

In our framework, retrieval is meant to help end users in the process execution phase, when dealing with an atypical situation. Indeed, suggestions on how to adjust the default process schema in the current situation may be obtained by analyzing the most similar retrieved examples of change, recorded as traces that share the starting sequence of actions with the current query (i.e., with the input problem).
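The idea of comparing the current query with the starting sequence of stored traces can be sketched as follows. This is only an assumed, simplified reading: `retrieve` and `mismatch_distance` are hypothetical names, and the positional mismatch distance is a placeholder for the temporal metric the paper introduces.

```python
from typing import Callable, List, Sequence, Tuple


def mismatch_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def retrieve(
    query: Sequence[str],
    case_base: List[List[str]],
    k: int = 2,
    distance: Callable[[Sequence[str], Sequence[str]], float] = mismatch_distance,
) -> List[Tuple[float, List[str]]]:
    """Rank stored traces by the distance between the query and their prefix."""
    scored = []
    for trace in case_base:
        if len(trace) < len(query):
            continue  # too short to share the query's starting sequence
        prefix = trace[: len(query)]
        scored.append((distance(query, prefix), trace))
    scored.sort(key=lambda pair: pair[0])  # stable sort keeps case-base order on ties
    return scored[:k]


# Hypothetical case base of action sequences.
case_base = [
    ["triage", "ct", "thrombolysis", "rehab"],
    ["triage", "mri", "surgery"],
    ["triage", "ct", "observation"],
    ["triage"],
]
results = retrieve(["triage", "ct"], case_base, k=2)
```

The remaining actions of each retrieved trace (here, what followed "ct") are what an end user would inspect as suggestions for continuing the current instance.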

Moreover, we support an automatic organization of the case base content (i.e., of all the available traces) through the application of hierarchical clustering techniques. Clustering can serve as a starting point for a set of a posteriori trace analyses. In particular, it can help process engineers in conformance evaluation (e.g., it can be an input to formal verification of the conformance of traces to proper semantic constraints [18]). Additionally, since changes can also be due to a weak or incomplete initial process schema definition, engineers can exploit (retrieval and) clustering results to draw some suggestions on how to redesign process schemata, in order to incorporate the most frequent and significant changes once and for all.
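To make the clustering step concrete, here is a minimal pure-Python sketch of agglomerative (bottom-up) hierarchical clustering with average linkage. The linkage choice, the function names, and the Jaccard distance over action sets are assumptions made for illustration; the paper relies on its own distance definition, which also accounts for temporal information.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def jaccard_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Placeholder distance: ignores action order and timing."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def agglomerate(
    items: List[Sequence[str]],
    distance: Callable[[Sequence[str], Sequence[str]], float],
    n_clusters: int,
) -> List[List[int]]:
    """Merge the two closest clusters until n_clusters remain.

    Returns clusters as lists of indices into `items`.
    """
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(items))]
    # Pairwise distances between individual items, computed once.
    d = {
        (i, j): distance(items[i], items[j])
        for i, j in combinations(range(len(items)), 2)
    }

    def linkage(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(d[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters


# Hypothetical traces (action names only).
traces = [
    ["triage", "ct", "thrombolysis"],
    ["triage", "ct", "thrombolysis", "rehab"],
    ["triage", "mri", "surgery"],
    ["triage", "mri", "surgery", "rehab"],
]
groups = agglomerate(traces, jaccard_distance, n_clusters=2)
```

Recording the sequence of merges instead of stopping at a fixed number of clusters would yield the full hierarchy, which is what makes the result browsable at different levels of detail.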

In our work retrieval and clustering rely on a distance definition able to take into account temporal information.

Interestingly, only a few metrics specifically designed to work on traces have been described in the literature. Moreover, most of them do not manage temporal information; in particular, a proper way of comparing qualitative temporal constraints is usually not provided (see Section 5).

On the other hand, neglecting time can be a significant flaw. In fact, time is really crucial in some applications. In medicine, for instance, the role of time is clearly central: it is mandatory to penalize the fact that the very same action had different durations in two traces, or was delayed, especially if referring to emergency procedures. And, generally speaking, temporal information is relevant in all domains, as it can be used e.g., to discover bottlenecks and to measure service levels [17].

The metric we introduce in this work allows us to explicitly manage temporal information in traces, paying attention to both quantitative and qualitative constraints.

In addition to this methodological contribution, in the paper we also describe our experimental work in the field of stroke care, in which we compared the new metric to a classical existing one (namely, the edit distance [19], [20]), and to simpler versions of our distance definition, able to manage only part of the overall information available on traces.
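For reference, the classical edit (Levenshtein) distance used as a baseline can be applied to traces by treating each action name as a symbol. The sketch below is the textbook unit-cost version; the function name and the example action names are illustrative, and it does not reproduce the paper's own metric.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two action sequences."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical traces: one substituted action plus one extra action.
t1 = ["triage", "ct", "thrombolysis"]
t2 = ["triage", "mri", "thrombolysis", "rehab"]
```

Because it looks only at action names and their order, this distance treats two traces as identical even when the same action took very different times or was delayed, which is the limitation the temporal metric is meant to address.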

The paper is organized as follows. Section 2 presents technical details of the framework. Section 3 describes experimental results. Section 4 adds information about recent methodological improvements we are providing, in order to enhance the framework performance. Section 5 addresses some comparisons with related works. Finally, Section 6 is devoted to conclusions, discussion of limitations and future research directions.

Section snippets

A framework for supporting BP adjustment and analysis

This section describes methodological and technical details of our framework.

In particular, central tasks that all CBR tools have to deal with are [9] to define the notion of case, and to find a past case similar to the input one (i.e., to implement retrieval). Case definition and retrieval methods can vary considerably [9], and should be tailored to the needs of the application domain. Specifically, a proper distance definition has to be introduced, in order to optimize the reliability of

Experimental results

In this section we will provide some results on clustering experiments. Some experiments on retrieval will be presented in Section 4, where we will discuss our most recent improvements.

All of our experiments were conducted working on real patient traces taken from the stroke management domain. Actually, Health-Care Organizations (HCO) place strong emphasis on efficiency and effectiveness, to control their health-care performance and expenditures: they thus need to evaluate existing

Improving performance through a pivoting-based technique

As observed in Section 2, distance calculation is tractable, and indeed retrieval was fast in the experiments we have conducted so far (see [36], where, however, we only relied on Trace Edit Distance, and did not manage temporal information); nonetheless, it can become computationally expensive when working on very large databases. This problem has already been highlighted in process retrieval [37].

To this end, we are currently designing and implementing a methodology able to enhance the
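The excerpt does not detail the technique, so the following is only a generic sketch of how pivot-based pruning is commonly done, under the assumption that the distance satisfies the triangle inequality. Distances from a pivot to all stored items are precomputed, and for a query q the bound |d(q, p) - d(p, x)| <= d(q, x) lets some candidates be skipped without computing their distance. All names are hypothetical, and unit-cost edit distance is a placeholder metric.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance (a true metric, used as a placeholder)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (0 if x == y else 1)))
        prev = curr
    return prev[-1]


def nearest_with_pivot(
    query: Sequence[str],
    items: List[Sequence[str]],
    pivot: Sequence[str],
    distance: Callable[[Sequence[str], Sequence[str]], float],
) -> Tuple[int, float, int]:
    """Return (index of nearest item, its distance, number of distances computed)."""
    # In practice this table would be computed once, offline.
    pivot_to_item = [distance(pivot, x) for x in items]
    query_to_pivot = distance(query, pivot)

    best_idx, best_d, computed = -1, float("inf"), 0
    for idx, item in enumerate(items):
        # Triangle inequality gives a lower bound on distance(query, item).
        lower_bound = abs(query_to_pivot - pivot_to_item[idx])
        if lower_bound >= best_d:
            continue  # cannot beat the current best, skip the full computation
        d = distance(query, item)
        computed += 1
        if d < best_d:
            best_idx, best_d = idx, d
    return best_idx, best_d, computed


# Hypothetical traces encoded as single-letter action codes.
items = [list("abcd"), list("abcf"), list("wxyz"), list("wxyzuv")]
result = nearest_with_pivot(list("abce"), items, pivot=list("abcd"),
                            distance=edit_distance)
```

In this toy case only two of the four stored traces need a full distance computation; how much is saved in general depends on the choice and number of pivots.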

Related works

Examples of CBR tools in BP management, and specifically in process adjustment support, are described in the literature (e.g. [11], [12], [7], [14], [15]); a few works exploiting clustering techniques are reported as well (e.g. [42], [43])—even though they mainly deal with process mining [44] (see below).

Since the main methodological contribution of our work consists in the definition of a proper distance function, in our comparison with the existing literature we will first focus on this

Concluding remarks and future work

In this work, we have described a case retrieval and clustering approach to process change and analysis. In particular, we have defined a proper case structure and a new distance measure, which are exploited to retrieve traces similar to the current one. Our system also automatically clusters the trace database content by resorting to hierarchical clustering techniques.

We believe that such functionalities can help end users who need to adapt a process instance to some unforeseen

References (65)

  • W.V. der Aalst, A. ter Hofstede, M. Weske, Business process management: a survey, in: Proceedings of the International...
  • P. Heimann, G. Joeris, C. Krapp, B. Westfechtel, Dynamite: dynamic task nets for software process management, in:...
  • B. Weber et al., Towards the agile management of business processes
  • M. Reichert et al., Adeptflex-supporting dynamic changes of workflows without losing control, Journal of Intelligent Information Systems (1998)
  • B. Weber et al., Case-based maintenance for CCBR-based process evolution
  • A. Aamodt et al., Case-based reasoning: foundational issues, methodological variations and systems approaches, AI Communications (1994)
  • J. Surma et al., Integration rules and cases for the classification task
  • Z. Luo et al., Exception handling in workflow systems, Applied Intelligence (2000)
  • B. Weber et al., Providing integrated life cycle support in process-aware information systems, International Journal of Cooperative Information Systems (2009)
  • M. Minor et al., Agile workflow technology and case-based change reuse for long-term processes, International Journal of Intelligent Information Technologies (2008)
  • S. Montani, Prototype-based management of business process exception cases, Applied Intelligence,...
  • W.V. der Aalst, Process Mining. Discovery, Conformance and Enhancement of Business Processes (2011)
  • IEEE Taskforce on Process Mining: Process Mining Manifesto...
  • A. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Soviet Physics Doklady (1966)
  • L. Yujian et al., A normalized Levenshtein distance metric, IEEE Transactions on Pattern Analysis and Machine Intelligence (2007)
  • A. Lanz, B. Weber, M. Reichert, Workflow time patterns for process-aware information systems, in: Proceedings of the...
  • S. Kurtz, Approximate string searching under weighted edit distance, in: Proceedings of the 3rd South American Workshop...
  • M. Palmer et al., Verb semantics for English–Chinese translation, Machine Translation (1995)
  • R. Bergmann et al., Similarity measures for object-oriented case representations
  • P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, in: Proceedings of the IJCAI, 1995,...
  • R. Page et al., Molecular Evolution: A Phylogenetic Approach (1998)
  • A. Marzal et al., Computation of normalized edit distance and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence (1993)