Ensuring the canonicity of process models
Introduction
Process models are an important means to specify requirements in business-related software development projects [1]. Nevertheless, practitioners often struggle with the definition of fully correct and meaningful models [2]. The reasons for this are manifold. For instance, many modelers in practice have limited modeling experience [3], modeling projects often involve an overwhelming number of models [3], and the work of modelers involved in one project is difficult to coordinate [4]. The implications of incorrect and inconsistent models are severe. In the worst case, they entail wrong design decisions and a considerable increase of the overall development costs [5], [6].
To ensure process model correctness and consistency, researchers have proposed several automated analysis techniques. Such techniques can, for instance, check whether a process model contains deadlocks [7], is compliant with expected behavior [8], and meets predefined naming conventions [9]. The shortcoming of these techniques, however, is that they make assumptions about how modelers have used natural language to label the process model activities. As a result, they are of little help if the logic of modeling and routing elements is described textually in activity labels. As an example, consider the activity “Consult expert and prepare report” from one of the models we encountered in practice. This activity label consists of two separate activities, i.e., “consult expert” and “prepare report”, which are linked by the conjunction “and”. The problem is that the execution semantics between these linked activity parts is not clearly defined. The word “and” might refer to either a parallel or a sequential execution. Specifying the activity as in this example mixes natural language and control structure in a way that is inherently ambiguous. This makes it impossible to draw valid conclusions from formal analysis results and, thus, difficult to develop process-related systems that are in line with the specification.
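The "Consult expert and prepare report" example can be made concrete with a small sketch. The following Python snippet is only an illustrative heuristic, not the formalization developed in the paper: it assumes that a label is suspicious whenever the words "and" or "or" separate two non-empty parts. The function names `split_label` and `looks_non_canonical` are hypothetical.

```python
import re
from typing import List

# Assumption: only the conjunctions "and" / "or" are considered.
# The paper's actual pattern catalogue is not reproduced here.
CONJUNCTION = re.compile(r"\s+(?:and|or)\s+", re.IGNORECASE)


def split_label(label: str) -> List[str]:
    """Split an activity label at each conjunction, keeping the original order."""
    parts = CONJUNCTION.split(label.strip())
    return [part for part in parts if part]


def looks_non_canonical(label: str) -> bool:
    """Flag a label that appears to bundle more than one activity."""
    return len(split_label(label)) > 1


if __name__ == "__main__":
    print(split_label("Consult expert and prepare report"))
    print(looks_non_canonical("Prepare report"))
```

A purely lexical check like this one cannot tell whether a conjunction links two actions or two business objects (for instance, in a label such as "Check terms and conditions"), so a realistic detector would need linguistic analysis of the label rather than string matching alone.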
In this paper, we address this problem by introducing the notion of canonicity to prevent the mixing of natural language and modeling language. Based on this notion, we automatically check for problems caused by the violation of canonicity and point to reworks for resolving them. More specifically, we provide the following contributions. First, we introduce the notion of canonicity for process models and provide an operationalization of the concept. Second, we formalize a number of non-canonical patterns we discovered in models from practice. Third, we develop algorithms to recognize whether a given label suffers from these patterns and to refactor the detected cases into canonical model fragments. In order to demonstrate the applicability of the proposed techniques, we conduct extensive experiments with four real-world process model collections.
The rest of the paper is structured as follows. Section 2 illustrates the problems of non-canonical process model activity labels and reviews how prior research approaches have addressed this issue. Section 3 explains how we operationalize the notion of canonicity and our strategies to recognize and refactor instances that do not comply with it. Section 4 evaluates our techniques with process model collections from practice. In Section 5, we discuss implications and limitations of our work, before Section 6 concludes the paper.
Section snippets
Background
This section introduces the background of our research. First, Section 2.1 illustrates the problem of mixing modeling language and natural language and reflects upon the implications of non-canonical activities for system analysis and design. Section 2.2 then discusses to what extent prior research from the field of process model analysis has addressed the issue of non-canonical activities.
A technique for ensuring canonicity in process models
In this section, we present our technique for recognizing and refactoring non-canonical process model activities. In Section 3.1, we operationalize the concept of canonicity and provide a formal definition. In Section 3.2 we introduce the formalism for recognizing non-canonical process model activities. In Section 3.3, we then introduce our technique to refactor them.
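To illustrate what refactoring a non-canonical activity could involve, the sketch below takes the parts of an already split label and enumerates both readings of the conjunction "and" mentioned in the introduction, sequential and parallel. This is a hypothetical simplification: the `Fragment` class, the `candidate_refactorings` function, and the two-way choice are assumptions made for illustration, and the refactoring rules of Section 3.3 are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """Hypothetical stand-in for a canonical model fragment."""
    kind: str               # "sequence" or "parallel"
    activities: List[str]   # one label per resulting activity


def candidate_refactorings(parts: List[str]) -> List[Fragment]:
    """Return both possible readings of a label that bundles several activities."""
    cleaned = []
    for part in parts:
        part = part.strip()
        if part:
            # Capitalize only the first character so acronyms stay intact.
            cleaned.append(part[0].upper() + part[1:])
    if len(cleaned) < 2:
        return []  # a single activity needs no refactoring
    return [Fragment("sequence", cleaned), Fragment("parallel", cleaned)]


if __name__ == "__main__":
    for fragment in candidate_refactorings(["Consult expert", "prepare report"]):
        print(fragment.kind, fragment.activities)
```

Returning both candidates instead of picking one reflects the ambiguity of the original label: the text alone does not determine which execution semantics the modeler intended.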
Evaluation
In this section, we present the results of an evaluation with four large process model collections. The goal of the evaluation was to demonstrate the applicability of the presented techniques in terms of accuracy. Section 4.1 first discusses the evaluation setup. Section 4.2 then introduces the test data of our evaluation. Sections 4.3 and 4.4 finally present the experimental results of the detection and the refactoring.
Implications
This section discusses the implications of our research. Sections 5.1 and 5.2 identify implications of our work for research and for practice. Section 5.3 reflects upon threats to validity.
Conclusion
In this paper, we introduced the notion of canonicity in order to prevent the mixing of natural language and modeling language within one process model activity. To this end, we formalized the notion of canonicity and recurring patterns that violate canonicity. Based on these formalizations, we designed techniques for the automatic recognition and refactoring of these patterns. As shown in the evaluation experiments, the proposed techniques are capable of recognizing and correcting the
References (68)
- et al., Managing large collections of business process models-current techniques and challenges, Comput. Industry (2012)
- et al., Analysis on Demand: Instantaneous Soundness Checking of Industrial Business Process Models, Data Knowl. Eng. (2011)
- et al., Detection of naming convention violations in process models for different languages, Decis. Support Syst. (2013)
- et al., Soundness Verification for Conceptual Workflow Nets with Data: Early Detection of Errors with the Most Precision Possible, Inform. Syst. (2011)
- et al., Verification of conceptual models based on linguistic knowledge, Data Knowl. Eng. (1997)
- et al., Semantics and analysis of business process models in BPMN, Inform. Softw. Technol. (2008)
- et al., Analyzing interacting WS-BPEL processes using flexible model generation, Data Knowl. Eng. (2008)
- Formalization and verification of event-driven process chains, Inform. Softw. Technol. (1999)
- et al., Reduction rules for YAWL workflows with cancellation regions and or-joins, Inform. Softw. Technol. (2009)
- et al., Activity labeling in process modeling: empirical insights and recommendations, Inform. Syst. (2010)
- Seven process modeling guidelines (7PMG), Inform. Softw. Technol.
- On the refactoring of activity labels in business process models, Inform. Syst.
- The critical success factors of business process management, Int. J. Inform. Manag.
- Automatic execution of business process models: Exploiting the benefits of model-driven engineering approaches, J. Syst. Softw.
- Potential Pitfalls of Process Modeling: Part A, Bus. Process Manag. J.
- Understanding and controlling software costs, J. Parametr.
- Guide to the Software Engineering Body of Knowledge - SWEBOK
- Efficient consistency measurement based on behavioral profiles of process models, IEEE Trans. Software Eng.
- Pattern-based translation of BPMN process models to BPEL web services, Int. J. Web Services Res. (IJWSR)
- Verification of workflow nets, Appl. Theory Petri Nets
- Business Process Management: Concepts, Languages, Architectures
- Soundness of workflow nets: classification, decidability, and analysis, Formal Asp. Comput.
- A formal approach to workflow analysis, Inform. Syst. Res.
- Supporting distributed conceptual modelling through naming conventions-a tool-based linguistic approach, Enterp. Model. Inform. Syst. Architect.
Cited by (9)
Natural language processing-enhanced extraction of SBVR business vocabularies and business rules from UML use case diagrams
2020, Data and Knowledge Engineering
Citation excerpt: At the same time, one must admit that obtaining suitable corpora might be problematic, if it concerns languages, which are overall less widely used and researched. Further, Leopold et al. [54] introduced the notion of canonicity (which could be interpreted similarly to atomicity) to describe business processes consisting of one action, one business object, and no more than one addition. This notion is used to describe refactoring for activities with labels that conform to activity naming antipatterns in [26]; hence, the principles in this paper could be one of the extension points in the post-processing step, as use case modeling also suffers from similar problems (Section 4.2).
Using Natural Language Processing for Biometric Identification Optimizatoin
2023, 2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering, ICACITE 2023
Utilizing Mixture Methods for Classifier in NLP: An Essential Consideration
2023, 2023 International Conference on Artificial Intelligence and Smart Communication, AISC 2023
A NLP-Oriented Methodology to Enhance Event Log Quality
2021, Lecture Notes in Business Information Processing
Extending drag-and-drop actions-based model-to-model transformations with natural language processing
2020, Applied Sciences (Switzerland)
Dr. Henrik Leopold is an assistant professor with the Department of Computer Science at the VU University Amsterdam. His research interests include business process modelling, natural language processing techniques, process model matching, and process architectures. He obtained a doctoral degree as well as an MSc in Information Systems from the Humboldt-Universität zu Berlin and a Bachelor degree in Information Systems from the Berlin School of Economics. After being a post-doc at the Humboldt-Universität zu Berlin, he joined the WU Vienna as an assistant professor from April 2014 to January 2015. His research has been published, among others, in Decision Support Systems, IEEE Transactions on Software Engineering, and Information Systems. His doctoral thesis received the German Targion Award 2014 for the best dissertation in the field of strategic information management.
Dr. Fabian Pittke is a research assistant with WU Vienna. He received an MSc degree in business informatics from the Institute of Information Systems (IWI), Universität des Saarlandes, Germany, in 2010. He was a research fellow with Humboldt-Universität zu Berlin until 2012 and has since been an external research fellow at the Institute for Information Business at Wirtschaftsuniversität Wien. His research focuses on linguistic aspects of process models. His research interests include business process modeling and natural language processing techniques.
Prof. Dr. Jan Mendling is a full professor and head of the Institute for Information Business at WU Vienna. His research areas include Business Process Management, Conceptual Modelling and Enterprise Systems. He studied Business Computer Science at University of Trier (Germany) and UFSIA Antwerpen (Belgium) and received a PhD degree from WU Vienna (Austria). After being a postdoc with QUT Brisbane (Australia) and a junior professor at HU Berlin (Germany), he moved back to WU in 2011. He has published more than 200 research papers and articles, among others in ACM Transactions on Software Engineering and Methodology, IEEE Transactions on Software Engineering, Information Systems, Data & Knowledge Engineering, and Decision Support Systems. He is a member of the editorial board of three international journals, one of the founders of the Berlin BPM Community of Practice (www.bpmb.de), and a board member of the Austrian Gesellschaft für Prozessmanagement. His PhD thesis won the Heinz-Zemanek-Award of the Austrian Computer Society and the German Targion Award for dissertations in the area of strategic information management.