Mining task post-conditions: Automating the acquisition of process semantics

https://doi.org/10.1016/j.datak.2017.03.007

Abstract

Semantic annotation of business process designs has been addressed in a large and growing body of work, but these annotations can be difficult and expensive to acquire. This paper presents a data-driven approach to mining and validating these annotations (specifically, context-independent semantic annotations). We leverage event objects in process execution histories, which describe both activity execution events (typically represented as process events) and state update events (represented as object state transition events). We present an empirical evaluation, which suggests that the approach provides generally reliable results.

Introduction

A large and growing body of work explores the use of semantic annotation of business process designs [6], [21], [38], [41], [5], [11] (we use the term semantic annotation to describe the annotation of process designs with semantic information, and specifically, post-conditions). A large body of work also addresses the problem of semantic annotation of web services in a similar fashion [29], [30], [31], [37]. Common to all of these approaches is the idea that semantic annotation of process tasks or services provides value in ways that the process or service model alone cannot. Our focus in this paper is on post-conditions of tasks in the context of process models (pre-conditions are also of interest, and we believe that an extension of the machinery presented here can address them, but they are outside the scope of the present work). Ideally, process designs annotated with post-conditions help answer the following question for any part of a process design: what changes will have occurred in the process context if the process were to execute up to this point? Arguably, a sufficiently detailed process model (for instance, one that decomposes tasks down to the level of individual read or write operations) will require no additional information to answer this question. However, process models are most valuable when described at higher levels of abstraction, in terms of concepts and activities that stakeholders are familiar with. Processes annotated with post-conditions thus serve a crucial modeling function, providing an effective summary of a substantial body of knowledge regarding the “lower-level” workings of a process. Annotation with post-conditions can also help solve a range of problems such as process compliance management [11], goal satisfaction analysis [35], change management [25], enterprise process architectures [28] and the management of the business process life cycle [26].

The modeling and acquisition of these post-conditions poses a particularly difficult challenge. It is generally recognized that process modeling involves significant investment in time and effort, which would be multiplied manyfold if there were an additional obligation to specify semantic annotations. Analysts also tend to find semantic annotation difficult, particularly if the intent is to make these annotations formal (as is required by all of the use cases referred to above). This paper seeks to address this challenge by offering a set of techniques that mine readily available data associated with process execution to generate largely accurate “first-cut” post-conditions for process tasks or activities (we use the terms “task” and “activity” interchangeably in this paper).

Our approach leverages the generally understood notion of event logging. The events that occur in a process execution context can be viewed in general terms as being of two types: (1) events that describe the start or end of the execution of process activities and (2) events that describe state changes in the objects impacted by a process. In many settings, the existing event logging machinery is capable of logging both kinds of events. One such approach to event logging is the event processing framework for business process management by Herzberg et al. [16], [17], [18], [19], [20].

We leverage these two types of events in juxtaposition, and the time-stamped sequences of activity execution events and state-change events thus obtained, to generate the sequence database taken as input by a sequential rule miner (CMRules [7] in our instance, but others could be used instead). The key idea is to identify commonly occurring patterns of activity execution events, followed by sequences of state change events. As we show, the approach is generally quite effective. We also define techniques which leverage a state update operator (that defines how a specification of a state of affairs is updated as a consequence of the execution of an action) and the actual history of process execution provided by the juxtaposed activity executions and state changes to determine whether the mined post-conditions, if accumulated using the state update operator, would indeed generate the available execution histories. This forms a validation step for the mined results.
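As an illustration only, the juxtaposition of the two logs and the search for frequently co-occurring activity and state-change events can be sketched as below. This is not CMRules and not the algorithm of this paper: it is a simplified confidence count, under the assumption that each log record is a hypothetical `(case_id, timestamp, label)` tuple and that a state change is attributed to the most recent preceding activity in the same case. The function names and the `min_confidence` threshold are likewise assumptions made for the sketch.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (case_id, timestamp, label).
activity_log = [
    ("c1", 1.0, "T1"), ("c1", 4.0, "T2"),
    ("c2", 1.0, "T1"), ("c2", 3.0, "T2"),
    ("c3", 1.0, "T1"), ("c3", 2.0, "T2"),
]
state_log = [
    ("c1", 1.5, "p"), ("c1", 4.2, "q"),
    ("c2", 1.3, "p"), ("c2", 3.1, "q"), ("c2", 3.4, "r"),
    ("c3", 1.2, "p"), ("c3", 2.5, "q"),
]


def build_sequence_database(activity_log, state_log):
    """Merge both logs per case and order the events by timestamp."""
    by_case = defaultdict(list)
    for case_id, timestamp, label in activity_log:
        by_case[case_id].append((timestamp, "activity", label))
    for case_id, timestamp, label in state_log:
        by_case[case_id].append((timestamp, "state", label))
    return {
        case_id: [(kind, label) for _, kind, label in sorted(events)]
        for case_id, events in by_case.items()
    }


def mine_candidates(sequence_db, min_confidence=0.8):
    """Keep 'activity -> state' pairs whose confidence reaches the threshold.

    Simplifying assumption: a state change is attributed to the most
    recent activity that precedes it in the same case.
    """
    activity_count = Counter()
    pair_count = Counter()
    for sequence in sequence_db.values():
        current, seen = None, set()
        for kind, label in sequence:
            if kind == "activity":
                current, seen = label, set()
                activity_count[current] += 1
            elif current is not None and label not in seen:
                pair_count[(current, label)] += 1
                seen.add(label)
    candidates = defaultdict(set)
    for (activity, state), count in pair_count.items():
        if count / activity_count[activity] >= min_confidence:
            candidates[activity].add(state)
    return dict(candidates)


sequence_db = build_sequence_database(activity_log, state_log)
candidates = mine_candidates(sequence_db)
print(candidates)
```

In this toy data the state `r` follows `T2` in only one of three cases, so it falls below the threshold and is not reported as a candidate post-condition of `T2`.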

Our intent is to mine the context-independent post-conditions (or immediate outcome) of each activity. These are contextualized via iterated applications of the state update operator to obtain the context-dependent post-conditions of each activity (in the context of a process model)—a complete collection of these for each activity or event provides a semantically annotated process model. For instance, the outcome of turning a switch on is to complete a circuit. In the context of a light bulb circuit, the context-dependent post-conditions of this activity would be to turn the bulb on. In the context of a switching circuit for a chemical reactor, the context-dependent post-conditions of that same activity would be to bring the chemical reactor to an operational state. We envisage the machinery we present below being used in the following manner: given as input a set of events that describe the execution of activities, a set of state-change events, a process model (or a set of process models in the event that the logs describe the execution of instances of multiple process designs) and a state update operator, the machinery would generate the post-conditions of each activity referred to in the recorded events. These post-conditions could be used directly in annotating process models, or might be viewed as “first-cut” specifications, to be edited and refined by expert analysts.
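To make the distinction between context-independent and context-dependent post-conditions concrete, the following is a minimal sketch of accumulating post-conditions along an execution path. It assumes a deliberately simplified state update operator over sets of propositional literals (a literal written `~p` denotes the negation of `p`), in which new effects override any prior literal they contradict. The operator used in the paper may differ; the names `negate`, `update` and `accumulate` are hypothetical.

```python
def negate(literal: str) -> str:
    """Return the complementary literal, using '~' as the negation sign."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def update(state: frozenset, effects: frozenset) -> frozenset:
    """Simplified state update: drop prior literals contradicted by the
    new effects, then add the new effects."""
    surviving = {lit for lit in state if negate(lit) not in effects}
    return frozenset(surviving | effects)


def accumulate(post_conditions: dict, trace: list) -> frozenset:
    """Apply the update operator once per activity, in execution order."""
    state = frozenset()
    for activity in trace:
        state = update(state, frozenset(post_conditions[activity]))
    return state


# Context-independent post-conditions (immediate outcomes) per activity.
post_conditions = {"T1": {"p"}, "T2": {"q"}, "T3": {"~p", "r"}}

# Context-dependent post-conditions after executing T1, T2, T3 in sequence.
cumulative = accumulate(post_conditions, ["T1", "T2", "T3"])
print(sorted(cumulative))
```

Here the immediate outcome of `T3` is `~p` and `r`, while its context-dependent post-conditions after `T1` and `T2` also include `q`, and the earlier `p` has been overridden.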

The problem we solve can be summarized as follows. Given: (1) a log of process events, (2) a log of object state transition events, (3) a process model or models whose execution generated these logs and (4) a state update operator, compute: the context-independent post-conditions of every task/activity referred to in the process event log. Inputs (1) and (2) are used in the mining phase, while inputs (3) and (4) are used in the validation phase.

This paper extends the results presented in [36] in a number of important ways. First, this paper presents a more sophisticated approach to validation. Second, it offers a novel abductive framework for repairing mined post-conditions, based on soundness and completeness analysis contained in the validation approach. Third, the paper presents more extensive empirical analysis.

The rest of the paper is organized as follows. We provide a running example in Section 2. In Section 3, we describe the event ontology that our approach uses. In Section 4, we describe the approach to semantic annotation of process models that sits at the core of our proposal. In Section 5, we describe the post-condition mining algorithm. In Section 6, we describe a sophisticated approach to validating the knowledge mined, while in Section 7, we provide an abductive approach to repairing the post-conditions that we mine. Section 8 presents an empirical evaluation of the proposal. Section 9 describes related work, while Section 10 presents conclusions.

Section snippets

Example

Process designs are intended to be abstract, enabling users to get a handle on a complex underlying reality. Thus the effects or impact of a process is often not directly reflected in the high-level abstractions contained in a process design. Our proposal offers a means of mining these effects and correlating these with elements of a process design. Compelling examples of such processes can be found in domains such as medicine, logistics, financial services and so on. We will use a clinical

An event ontology

We derive our approach from the event processing framework for business process management by Herzberg et al. [16], [17], [18], [19], [20]. In this framework, a process model is correlated with a set of data objects and each data object has a defined life cycle. The notion of a data object permits us to abstract information (of various kinds, including information that reflects states in the life cycle of real-world objects) being processed or manipulated during process execution [19].

During

Semantic annotation

We assume that each task or event in a process is associated with post-conditions written as conjunctive normal form sentences in the underlying formal state description language, which might be propositional or first-order (we do not consider temporal logics in this work, but extensions are possible). We assume that each task or event has context-independent post-conditions that can be contextualized via iterated applications of a state update operator as in [11] and [21]. We permit the

Mining post-conditions

Our approach to post-conditions mining is predicated on the observation that the state transitions of objects impacted by executing an activity occur soon after the execution of the activity. State transitions that manifest a long period after the execution of an activity are typically not the effect of that activity alone, but of that activity plus some others (e.g., one may think of the arrival of a traditional “snailmail” letter 3 days after posting as an outcome of the action of

Validation

We can use the state update operator and the available data to validate the mined post-conditions. The intuition is to leverage available data to determine if the mined post-conditions predict the object state transitions seen in the data. We offer tests for soundness and completeness, and an abductive framework to guide the repair of mined post-conditions. We consider two settings, the first mainly for testing purposes and the second because it reflects real-life operations.
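The intuition behind validation can be sketched as a replay: accumulate the mined post-conditions along a recorded execution and compare the predicted state with the state actually observed after each activity. The sketch below is one plausible reading, not the tests defined in the paper. It assumes a simplified literal-set update operator, a hypothetical trace format of `(activity, observed_state)` pairs, and it reads "unsound" as "predicted but not observed" and "incomplete" as "observed but not predicted".

```python
def negate(literal):
    """Return the complementary literal, using '~' as the negation sign."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def update(state, effects):
    """Simplified state update: new effects override contradicted literals."""
    surviving = {lit for lit in state if negate(lit) not in effects}
    return surviving | set(effects)


def validate(post_conditions, trace):
    """Replay a trace and report, per activity, the literals that were
    predicted but not observed and those observed but not predicted."""
    state = set()
    report = []
    for activity, observed in trace:
        state = update(state, post_conditions.get(activity, set()))
        not_observed = state - observed    # evidence against soundness
        not_predicted = observed - state   # evidence against completeness
        report.append((activity, not_observed, not_predicted))
    return report


mined = {"T1": {"p"}, "T2": {"q"}}
trace = [("T1", {"p"}), ("T2", {"p", "q", "r"})]
report = validate(mined, trace)
for activity, not_observed, not_predicted in report:
    print(activity, sorted(not_observed), sorted(not_predicted))
```

In this toy trace nothing is predicted that was not observed, but `r` is observed after `T2` without being predicted, which would flag the mined post-conditions of `T2` as a candidate for repair.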

Unique activity

Abductive repair

We now consider the problem of what needs to be done when mined post-conditions are found to be unsound or incomplete according to the tests described above. An easy solution is to seek more data and mine again. More interestingly, we can offer guidance to analysts in manually modifying the first-cut post-conditions mined from available data by using a simple formulation as an abductive problem. Our discussion focuses on settings with concurrent tasks, but the approach easily extends to the

Evaluation

Evaluation with synthetic process models: Our aim is to establish that our approach generates reasonably reliable results. We ran the first set of experiments with a synthetic semantically annotated process model (i.e., a hand-crafted one with T1, T2, etc. for activity names and p, q for states/post-conditions). The model had 8 activities, with an AND-split nested inside an XOR-split and with each activity semantically annotated with 1 or 2 literals (in the 2 literal case, the states were

Related work

Artifact-centric business process modeling. An approach in the space of artifact-centric business process modeling is the GSM (Guard-Stage-Milestone) model by Hull et al. [4], [24]. In the GSM model, the state of an artifact at any given point during the execution of the model is described using three elements: (a) milestone, which represents a business objective with achieving and/or invalidating conditions; (b) stage, which consists of a cluster of activities to achieve a milestone (in the

Conclusions and future work

This paper offers an approach to mining business process task post-conditions from process events and state-change events in process execution histories. Specifying post-conditions is notoriously difficult for process analysts, yet these post-conditions are critical to a variety of process analysis tasks such as process compliance management [11], goal satisfaction analysis [35], change management [25], enterprise process architectures [28] and the management of the business process life cycle [26].

References (43)

  • M.N. Garofalakis, R. Rastogi, K. Shim, SPIRIT: Sequential pattern mining with regular expression constraints, in: VLDB....
  • C. Ghidini, M. Rospocher, L. Serafini, A formalisation of BPMN in description logics. FBK-irst, Tech. Rep. TR, 2008,...
  • A. Ghose et al., Auditing business process compliance (2007)
  • C. Günther, W. van der Aalst, Fuzzy mining – adaptive process simplification based on multi-perspective metrics, in: G....
  • H.J. Happel, L. Stojanovic, Ontoprocess – a prototype for semantic business process verification using SWRL rules, in:...
  • S.K. Harms et al., Sequential association rule mining with time lags, J. Intell. Inf. Syst. (2004)
  • N. Herzberg, M. Kunze, A. Rogge-Solti, Towards process evaluation in non-automated process execution environments, in:...
  • N. Herzberg, A. Meyer, M. Weske, An event processing platform for business process management, in: Proceedings of the...
  • N. Herzberg, A. Meyer, M. Weske, Improving business process intelligence by observing object state transitions,...
  • N. Herzberg, M. Weske, Enriching raw events to enable process intelligence: research challenges. Number 73....
  • K. Hinge, A. Ghose, G. Koliadis, Process SEER: A tool for semantic effect annotation of business process models, in:...
Metta Santiputri received her bachelor's degree in Informatics from Bandung Institute of Technology, Indonesia, and her Master's degree in Computer Science from the University of Twente, the Netherlands. In 2001, she joined the Department of Informatics Engineering, State Polytechnic of Batam, as a Lecturer.

Currently, she is a Ph.D. candidate in Computer Science at the School of Computing and Information Technology, University of Wollongong, Australia. Her research interests include business process modeling, data mining, semantic annotations, and goal-oriented requirements modeling.

Aditya Ghose is Professor of Computer Science at the School of Computing and IT at the University of Wollongong, Australia, where he heads the Decision Systems Lab. He holds a Ph.D. and M.Sc. in Computing Science from the University of Alberta, Canada, and a Bachelor of Computer Science and Engineering from Jadavpur University, India. His research interests are in knowledge representation and reasoning, business process management, service science, enterprise analytics and requirements engineering.

Hoa Khanh Dam is a Senior Lecturer in the School of Computing and Information Technology, University of Wollongong (UOW) in Australia. He is Associate Director for the Decision Systems Lab at UOW, heading its Software Engineering Analytics research program. His research interests lie primarily in the intersection of software engineering, business process management and service-oriented computing, focusing on such areas as software engineering analytics, process analytics and service analytics. He holds Ph.D. and Master's degrees in Computer Science from RMIT University, and a Bachelor of Computer Science degree from the University of Melbourne in Australia. His research has won multiple Best Paper Awards (at WICSA, APCCM, and ASWEC) and an ACM SIGSOFT Distinguished Paper Award (at MSR).
