Mining task post-conditions: Automating the acquisition of process semantics
Introduction
A large and growing body of work explores the use of semantic annotation of business process designs [6], [21], [38], [41], [5], [11] (we use the term semantic annotation to describe the annotation of process designs with semantic information, and specifically, post-conditions). A large body of work also addresses the problem of semantic annotation of web services in a similar fashion [29], [30], [31], [37]. Common to all of these approaches is the idea that semantic annotation of process tasks or services provides value in ways that the process or service model alone cannot. Our focus in this paper is on post-conditions of tasks in the context of process models (pre-conditions are also of interest, and we believe that an extension of the machinery presented here can address them, but they are outside the scope of the present work). Ideally, process designs annotated with post-conditions help answer the following question for any part of a process design: what changes will have occurred in the process context if the process were to execute up to this point? Arguably, a sufficiently detailed process model (for instance, one that decomposes tasks down to the level of individual read or write operations) will require no additional information to answer this question. However, process models are most valuable when described at higher levels of abstraction, in terms of concepts and activities that stakeholders are familiar with. Processes annotated with post-conditions thus serve a crucial modeling function, providing an effective summary of a substantial body of knowledge regarding the “lower-level” workings of a process. Annotation with post-conditions can also help solve a range of problems such as process compliance management [11], goal satisfaction analysis [35], change management [25], enterprise process architectures [28] and the management of the business process life cycle [26].
The modeling and acquisition of these post-conditions poses a particularly difficult challenge. It is generally recognized that process modeling involves significant investment in time and effort, which would be multiplied manyfold if there were an additional obligation to specify semantic annotations. Analysts also tend to find semantic annotation difficult, particularly if the intent is to make the annotations formal (as is required by all of the use cases referred to above). This paper seeks to address this challenge by offering a set of techniques that mine readily available data associated with process execution to generate largely accurate “first-cut” post-conditions for process tasks or activities (we use the terms “task” and “activity” interchangeably in this paper).
Our approach leverages the generally understood notion of event logging. The events that occur in a process execution context can be viewed in general terms as being of two types: (1) events that describe the start or end of the execution of process activities and (2) events that describe state changes in the objects impacted by a process. In many settings, the existing event logging machinery is capable of logging both kinds of events. One such approach to event logging is the event processing framework for business process management by Herzberg et al. [16], [17], [18], [19], [20].
We leverage these two types of events in juxtaposition, and the time-stamped sequences of activity execution events and state-change events thus obtained, to generate the sequence database taken as input by a sequential rule miner (CMRules [7] in our instance, but others could be used instead). The key idea is to identify commonly occurring patterns of activity execution events, followed by sequences of state change events. As we show, the approach is generally quite effective. We also define techniques which leverage a state update operator (that defines how a specification of a state of affairs is updated as a consequence of the execution of an action) and the actual history of process execution provided by the juxtaposed activity executions and state changes to determine whether the mined post-conditions, if accumulated using the state update operator, would indeed generate the available execution histories. This forms a validation step for the mined results.
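The juxtaposition of the two event types can be illustrated with a minimal sketch. It assumes a hypothetical record layout (case identifier, timestamp, event kind, label) and uses a naive frequency count as a stand-in for a sequential rule miner; it is not CMRules, and the threshold `min_conf` and all names below are illustrative assumptions rather than part of the approach itself.

```python
from collections import defaultdict

# Hypothetical event records: (case_id, timestamp, kind, label).
# kind is "activity" (an activity finished) or "state" (an object changed state).
events = [
    ("c1", 1, "activity", "A"), ("c1", 2, "state", "p"),
    ("c1", 3, "activity", "B"), ("c1", 4, "state", "q"),
    ("c2", 1, "activity", "A"), ("c2", 2, "state", "p"),
    ("c2", 3, "activity", "B"), ("c2", 4, "state", "r"),
]

def build_sequences(events):
    """Group events by case and order each group by timestamp."""
    by_case = defaultdict(list)
    for case_id, ts, kind, label in events:
        by_case[case_id].append((ts, kind, label))
    return {case: sorted(evts) for case, evts in by_case.items()}

def candidate_postconditions(sequences, min_conf=0.6):
    """For each activity, keep the state changes that follow it (before the
    next activity event) in at least min_conf of its occurrences."""
    occurrences = defaultdict(int)
    followed_by = defaultdict(lambda: defaultdict(int))
    for evts in sequences.values():
        current, seen = None, set()
        for _, kind, label in evts:
            if kind == "activity":
                current, seen = label, set()
                occurrences[label] += 1
            elif current is not None and label not in seen:
                followed_by[current][label] += 1
                seen.add(label)
    return {
        act: {s for s, n in followed_by[act].items()
              if n / occurrences[act] >= min_conf}
        for act in occurrences
    }

sequences = build_sequences(events)
mined = candidate_postconditions(sequences)
```

In this toy log, state `p` follows activity `A` in both cases and is kept as a candidate post-condition, while `q` and `r` each follow `B` in only half of its occurrences and fall below the assumed threshold.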
Our intent is to mine the context-independent post-conditions (or immediate outcome) of each activity. These are contextualized via iterated applications of the state update operator to obtain the context-dependent post-conditions of each activity (in the context of a process model)—a complete collection of these for each activity or event provides a semantically annotated process model. For instance, the outcome of turning a switch on is to complete a circuit. In the context of a light bulb circuit, the context-dependent post-conditions of this activity would be to turn the bulb on. In the context of a switching circuit for a chemical reactor, the context-dependent post-conditions of that same activity would be to bring the chemical reactor to an operational state. We envisage the machinery we present below being used in the following manner: given as input a set of events that describe the execution of activities, a set of state-change events, a process model (or a set of process models in the event that the logs describe the execution of instances of multiple process designs) and a state update operator, the machinery would generate the post-conditions of each activity referred to in the recorded events. These post-conditions could be used directly in annotating process models, or might be viewed as “first-cut” specifications, to be edited and refined by expert analysts.
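The distinction between context-independent and context-dependent post-conditions can be sketched as follows. The sketch assumes states are sets of propositional literals (with `~` marking negation) and uses a deliberately simplified update that lets new effects override directly contradicting literals. It ignores background domain knowledge and is not the state update operator of [11] or [21]; the task names and literals are hypothetical.

```python
def negate(literal):
    """Return the complementary literal; '~' marks negation."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def update(state, effects):
    """Simplified state update: add the new effects and drop any earlier
    literal that one of them directly contradicts."""
    kept = {lit for lit in state if negate(lit) not in effects}
    return kept | set(effects)

def accumulate(immediate_effects, task_sequence, initial=frozenset()):
    """Contextualize immediate (context-independent) effects along a task
    sequence, giving the accumulated state after each task."""
    state = set(initial)
    annotated = []
    for task in task_sequence:
        state = update(state, immediate_effects[task])
        annotated.append((task, frozenset(state)))
    return annotated

# Hypothetical context-independent post-conditions.
immediate_effects = {
    "install_bulb": {"bulb_installed"},
    "switch_on": {"circuit_closed"},
    "switch_off": {"~circuit_closed"},
}

result = accumulate(immediate_effects,
                    ["install_bulb", "switch_on", "switch_off"])
```

Each entry of `result` pairs a task with the accumulated state after it. The final state retains `bulb_installed` and replaces `circuit_closed` with `~circuit_closed`, which is the sense in which the same immediate effect yields different cumulative outcomes in different contexts.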
The problem we solve can be summarized as follows. Given: (1) a log of process events, (2) a log of object state transition events, (3) a process model or models whose execution generated these logs and (4) a state update operator, compute: the context-independent post-conditions of every task/activity referred to in the process event log. Inputs (1) and (2) are used in the mining phase, while inputs (3) and (4) are used in the validation phase.
This paper extends the results presented in [36] in a number of important ways. First, this paper presents a more sophisticated approach to validation. Second, it offers a novel abductive framework for repairing mined post-conditions, based on soundness and completeness analysis contained in the validation approach. Third, the paper presents more extensive empirical analysis.
The rest of the paper is organized as follows. We provide a running example in Section 2. In Section 3, we describe the event ontology that our approach uses. In Section 4, we describe the approach to semantic annotation of process models that sits at the core of our proposal. In Section 5, we describe the post-condition mining algorithm. In Section 6, we describe a sophisticated approach to validating the knowledge mined, while in Section 7, we provide an abductive approach to repairing the post-conditions that we mine. Section 8 presents an empirical evaluation of the proposal. Section 9 describes related work, while Section 10 presents conclusions.
Example
Process designs are intended to be abstract, enabling users to get a handle on a complex underlying reality. Thus the effects or impact of a process is often not directly reflected in the high-level abstractions contained in a process design. Our proposal offers a means of mining these effects and correlating these with elements of a process design. Compelling examples of such processes can be found in domains such as medicine, logistics, financial services and so on. We will use a clinical
An event ontology
We derive our approach from the event processing framework for business process management by Herzberg et al. [16], [17], [18], [19], [20]. In this framework, a process model is correlated with a set of data objects and each data object has a defined life cycle. The notion of a data object permits us to abstract information (of various kinds, including information that reflects states in the life cycle of real-world objects) being processed or manipulated during process execution [19].
During
Semantic annotation
We assume that each task or event in a process is associated with post-conditions written as conjunctive normal form sentences in the underlying formal state description language, which might be propositional or first-order (we do not consider temporal logics in this work, but extensions are possible). We assume that each task or event has context-independent post-conditions that can be contextualized via iterated applications of a state update operator as in [11] and [21]. We permit the
Mining post-conditions
Our approach to post-conditions mining is predicated on the observation that the state transitions of objects impacted by executing an activity occur soon after the execution of the activity. State transitions that manifest a long period after the execution of an activity are typically not the effect of that activity alone, but of that activity plus some others (e.g., one may think of the arrival of a traditional “snailmail” letter 3 days after posting as an outcome of the action of
Validation
We can use the state update operator and the available data to validate the mined post-conditions. The intuition is to leverage available data to determine whether the mined post-conditions predict the object state transitions seen in the data. We offer tests for soundness and completeness, and an abductive framework to guide the repair of mined post-conditions. We consider two settings, the first mainly for testing purposes and the second because it reflects real-life operations.
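The validation intuition can be sketched under one plausible reading of the two tests, which is an assumption here and not the paper's formal definition: a mined annotation is flagged as unsound at a step if replaying it predicts a literal that the log does not show, and as incomplete if the log shows a literal that the replay does not predict. The `update` function is the same simplified literal-override operator assumed for illustration, and all data are hypothetical.

```python
def negate(literal):
    """Return the complementary literal; '~' marks negation."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def update(state, effects):
    """Simplified state update: new effects override contradicted literals."""
    kept = {lit for lit in state if negate(lit) not in effects}
    return kept | set(effects)

def validate(mined, trace):
    """Replay mined post-conditions over a trace of
    (activity, observed_state) pairs and report, per step, the literals
    predicted but not observed and those observed but not predicted."""
    predicted = set()
    report = []
    for activity, observed in trace:
        predicted = update(predicted, mined.get(activity, set()))
        unsound = predicted - observed      # predicted, not in the log
        incomplete = observed - predicted   # in the log, not predicted
        report.append((activity, unsound, incomplete))
    return report

# Hypothetical mined post-conditions and one observed execution history.
mined = {"A": {"p"}, "B": {"q", "r"}}
trace = [("A", {"p"}), ("B", {"p", "q", "s"})]

report = validate(mined, trace)
```

Here the replay agrees with the log after `A`, while after `B` the literal `r` is predicted but not observed and `s` is observed but not predicted, so `B` would be a candidate for repair.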
Unique activity
Abductive repair
We now consider the problem of what needs to be done when mined post-conditions are found to be unsound or incomplete according to the tests described above. An easy solution is to seek more data and mine again. More interestingly, we can offer guidance to analysts in manually modifying the first-cut post-conditions mined from available data by using a simple formulation as an abductive problem. Our discussion focuses on settings with concurrent tasks, but the approach easily extends to the
Evaluation
Evaluation with synthetic process models: Our aim is to establish that our approach generates reasonably reliable results. We ran the first set of experiments with a synthetic semantically annotated process model (i.e., a hand-crafted one with etc, for activity names and for states/post-conditions). The model had 8 activities, with an AND-split nested inside an XOR-split and with each activity semantically annotated with 1 or 2 literals (in the 2 literal case, the states were
Related work
Artifact-centric business process modeling. An approach in the space of artifact-centric business process modeling is the GSM (Guard-Stage-Milestone) model by Hull et al. [4], [24]. In the GSM model, the state of an artifact at any given point during the execution of the model is described using three elements: (a) milestone, which represents a business objective with achieving and/or invalidating conditions; (b) stage, which consists of a cluster of activities to achieve a milestone (in the
Conclusions and future work
This paper offers an approach to mining business process task post-conditions from process events and state-change events in process execution histories. Specifying post-conditions is notoriously difficult for process analysts, yet these post-conditions are critical to a variety of process analysis tasks such as process compliance management [11], goal satisfaction analysis [35], change management [25], enterprise process architectures [28] and the management of the business process life cycle [26].
References
- et al., On the equivalence of incremental and fixpoint semantics for business artifacts with guard-stage-milestone lifecycles, Inf. Syst. (2013)
- et al., CMRules: mining sequential rules common to several sequences, Knowl. Based Syst. (2012)
- et al., Reasoning about action I: a possible worlds approach, Artificial Intelligence (1988)
- et al., Improving process monitoring and progress prediction with data state transition events, Data Knowl. Eng. (2015)
- R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proc. of...
- M. Born, F. Dörr, I. Weber, User-friendly semantic annotation in business process modeling, in: M. Weske, M.S. Hacid,...
- K.C. Chan, W.H. Au, An effective algorithm for mining interesting quantitative association rules, in: Proceedings of...
- C. Di Francescomarino, C. Ghidini, M. Rospocher, L. Serafini, P. Tonella, Reasoning on semantically annotated...
- D. Fensel, F. Facca, E. Simperl, Web service modeling ontology, in: Semantic Web Services, Springer, Berlin,...
- P. Fournier-Viger, R. Nkambou, V.S.M. Tseng, RuleGrowth: mining sequential rules common to several sequences by...
- Auditing business process compliance
- Sequential association rule mining with time lags, J. Intell. Inf. Syst.
Metta Santiputri received her bachelor's degree in Informatics from Bandung Institute of Technology, Indonesia and her master's degree in Computer Science from University of Twente, the Netherlands. In 2001, she joined the Department of Informatics Engineering, State Polytechnic of Batam, as a Lecturer.
Currently, she is a Ph.D. candidate in Computer Science at the School of Computing and Information Technology, University of Wollongong, Australia. Her research interests include business process modeling, data mining, semantic annotations, and goal-oriented requirements modeling.
Aditya Ghose is Professor of Computer Science at the School of Computing and IT at the University of Wollongong, Australia, where he heads the Decision Systems Lab. He holds a Ph.D. and M.Sc. in Computing Science from the University of Alberta, Canada and a Bachelor of Computer Science and Engineering from Jadavpur University, India. His research interests are in knowledge representation and reasoning, business process management, service science, enterprise analytics and requirements engineering.
Hoa Khanh Dam is a Senior Lecturer in the School of Computing and Information Technology, University of Wollongong (UOW) in Australia. He is Associate Director for the Decision Systems Lab at UOW, heading its Software Engineering Analytics research program. His research interests lie primarily in the intersection of software engineering, business process management and service-oriented computing, focusing on such areas as software engineering analytics, process analytics and service analytics. He holds Ph.D. and Master degrees in Computer Science from RMIT University, and a Bachelor of Computer Science degree from the University of Melbourne in Australia. His research has won multiple Best Paper Awards (at WICSA, APCCM, and ASWEC) and an ACM SIGSOFT Distinguished Paper Award (at MSR).