Elsevier

Data & Knowledge Engineering

Volume 111, September 2017, Pages 22-38

Ensuring the canonicity of process models

https://doi.org/10.1016/j.datak.2017.03.010

Abstract

Process models play an important role in specifying the requirements of business-related software. However, the usefulness of process models is highly dependent on their quality. Recognizing this, researchers have proposed various techniques for the automated quality assurance of process models. A considerable shortcoming of these techniques is the assumption that each activity label consistently refers to a single stream of action. If, however, activities textually describe control-flow aspects such as decisions or conditions, the analysis results of these tools are distorted. Due to the ambiguity associated with this misuse of natural language, humans also struggle to draw valid conclusions from such inconsistently specified activities. In this paper, we therefore introduce the notion of canonicity to prevent the mixing of natural language and modeling language. We identify and formalize non-canonical patterns, which we then use to define automated techniques for detecting and refactoring activities that do not comply with canonicity. We evaluated these techniques with the help of four process model collections from industry; the evaluation confirmed the applicability and accuracy of these techniques.

Introduction

Process models are an important means to specify requirements in business-related software development projects [1]. Nevertheless, practitioners often struggle to define fully correct and meaningful models [2]. The reasons for this are manifold. For instance, many modelers in practice have limited modeling experience [3], modeling projects often involve an overwhelming number of models [3], and the work of modelers involved in one project is difficult to coordinate [4]. The implications of incorrect and inconsistent models are severe. In the worst case, they entail wrong design decisions and a considerable increase in overall development costs [5], [6].

To ensure process model correctness and consistency, researchers have proposed several automated analysis techniques. Such techniques can, for instance, check whether a process model contains deadlocks [7], is compliant with expected behavior [8], or meets predefined naming conventions [9]. The shortcoming of these techniques, however, is that they presuppose a particular way in which modelers have used natural language to label process model activities. As a result, these techniques are of little help if the logic of modeling and routing elements is described textually in activity labels. As an example, consider the activity “Consult expert and prepare report” from one of the models we encountered in practice. This activity label evidently comprises two separate activities, i.e., “consult expert” and “prepare report”, which are linked by the conjunction “and”. The problem is that the execution semantics between these linked parts is not clearly defined: the word “and” might refer either to a parallel or to a sequential execution. Specifying the activity as in this example mixes natural language and control structure in a way that is inherently ambiguous. This makes it impossible to draw valid conclusions from formal analysis results and, thus, difficult to develop process-related systems that are in line with the specification.
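To make this ambiguity concrete, the following Python sketch flags activity labels that contain connectives such as “and”, “or” or “if”, and reports which routing question each one leaves open. It is a deliberately naive keyword matcher built on our own assumptions: the connective list, the name `find_connectives`, and the listed interpretations are hypothetical and do not reproduce the paper's detection technique, which builds on formalized non-canonical patterns.

```python
import re

# Hypothetical connectives that may encode control flow inside a label, mapped
# to the routing question they leave open. Illustrative only; this is not the
# paper's pattern set.
CONNECTIVES = {
    "and": "conjunction: sequence or parallel?",
    "or": "disjunction: exclusive or inclusive choice?",
    "and/or": "disjunction: inclusive choice?",
    "if": "condition",
    "in case": "condition",
}


def find_connectives(label: str) -> list[tuple[str, str]]:
    """Return (connective, open question) pairs found as whole words in a label."""
    text = label.lower()
    hits = []
    # Longest connectives first, so "and/or" is not also reported as "and" and "or".
    for word in sorted(CONNECTIVES, key=len, reverse=True):
        # Whole-word match; "/" counts as part of a word so "and/or" stays intact.
        pattern = r"(?<![\w/])" + re.escape(word) + r"(?![\w/])"
        if re.search(pattern, text):
            hits.append((word, CONNECTIVES[word]))
            text = re.sub(pattern, " ", text)  # consume the match
    return hits


print(find_connectives("Consult expert and prepare report"))
```

A keyword match alone cannot tell whether “and” joins two actions, as in the example above, or merely two business objects; making that distinction requires a linguistic analysis of the label.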

In this paper, we address this problem by introducing the notion of canonicity to prevent the mixing of natural language and modeling language. Based on this notion, we automatically check for problems caused by violations of canonicity and suggest refactorings for resolving them. More specifically, we provide the following contributions. First, we introduce the notion of canonicity for process models and provide an operationalization of the concept. Second, we formalize a number of non-canonical patterns we discovered in models from practice. Third, we develop algorithms to recognize whether a given label suffers from these patterns and to refactor the detected cases into canonical model fragments. In order to demonstrate the applicability of the proposed techniques, we conduct extensive experiments with four real-world process model collections.
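The idea of refactoring a label into a canonical model fragment can be pictured with a minimal Python sketch that handles only the “and” case. The `Fragment` type and the `refactor_conjunction` function are hypothetical illustrations, not the paper's algorithm. Because “and” may denote either sequential or parallel execution, the sketch does not guess; it returns both candidate fragments so that a modeler can pick the intended one. It also assumes that every conjunct carries its own verb.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical minimal model fragment: a routing construct over activity labels."""
    kind: str                    # "sequence" or "parallel" (AND-split followed by AND-join)
    activities: tuple[str, ...]  # activity labels in textual order


def refactor_conjunction(label: str) -> list[Fragment]:
    """Split a label of the form 'A and B' into separate activities.

    Returns an empty list if the label contains no 'and' to split on.
    Otherwise returns two candidate fragments, because the conjunction
    leaves open whether the parts run in sequence or in parallel.
    """
    parts = [p.strip() for p in re.split(r"\s+and\s+", label, flags=re.IGNORECASE)]
    parts = [p for p in parts if p]
    if len(parts) < 2:
        return []
    # Capitalize each part so that it reads as a stand-alone activity label.
    activities = tuple(p[:1].upper() + p[1:] for p in parts)
    return [Fragment("sequence", activities), Fragment("parallel", activities)]


for fragment in refactor_conjunction("Consult expert and prepare report"):
    print(fragment.kind, fragment.activities)
```

For a label such as “Check invoice and receipt”, where the conjunction joins two business objects rather than two actions, this naive split yields a verbless activity “Receipt”, which illustrates why a real refactoring has to rely on an analysis of the label's components.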

The rest of the paper is structured as follows. Section 2 illustrates the problems of non-canonical process model activity labels and reviews how prior research approaches have addressed this issue. Section 3 explains how we operationalize the notion of canonicity and our strategies to recognize and refactor instances that do not comply with it. Section 4 evaluates our techniques with process model collections from practice. In Section 5, we discuss implications and limitations of our work, before Section 6 concludes the paper.


Background

This section introduces the background of our research. First, Section 2.1 illustrates the problem of mixing modeling language and natural language and reflects upon the implications of non-canonical activities for system analysis and design. Section 2.2 then discusses to what extent prior research in the field of process model analysis has addressed the issue of non-canonical activities.

A technique for ensuring canonicity in process models

In this section, we present our technique for recognizing and refactoring non-canonical process model activities. In Section 3.1, we operationalize the concept of canonicity and provide a formal definition. In Section 3.2, we introduce the formalism for recognizing non-canonical process model activities. In Section 3.3, we then introduce our technique for refactoring them.
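As a rough illustration of what an operationalization of canonicity could look like, the Python sketch below assumes that each label has already been parsed into its components (actions, business objects, additions) and treats a label as canonical if it has exactly one action, at most one business object, and at most one addition. Both the `LabelComponents` structure and this rule are simplifying assumptions made for illustration; the formal definition is the one given in Section 3.1, and the difficult step in practice, parsing the label, is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class LabelComponents:
    """Hypothetical result of parsing an activity label into its components."""
    actions: list[str]
    business_objects: list[str] = field(default_factory=list)
    additions: list[str] = field(default_factory=list)


def is_canonical(label: LabelComponents) -> bool:
    """Assumed rule: exactly one action, at most one business object and one addition."""
    return (
        len(label.actions) == 1
        and len(label.business_objects) <= 1
        and len(label.additions) <= 1
    )


# "Consult expert and prepare report": two actions, so not canonical under this rule.
print(is_canonical(LabelComponents(["consult", "prepare"], ["expert", "report"])))
# "Send invoice to customer": one action, one business object, one addition.
print(is_canonical(LabelComponents(["send"], ["invoice"], ["to customer"])))
```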

Evaluation

In this section, we present the results of an evaluation with four large process model collections. The goal of the evaluation was to demonstrate the applicability of the presented techniques in terms of accuracy. Section 4.1 first discusses the evaluation setup. Section 4.2 then introduces the test data of our evaluation. Finally, Sections 4.3 and 4.4 present the experimental results for detection and refactoring, respectively.
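The snippet does not spell out how accuracy is measured. Assuming the common set-based precision, recall, and F-measure computed against a manually built gold standard of non-canonical activities, the computation can be sketched in Python as follows; the function name and the toy activity IDs are illustrative only and are not data from the evaluation.

```python
def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Compare activities flagged as non-canonical against a manually labelled gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with hypothetical activity IDs (not data from the evaluation).
p, r, f = precision_recall_f1({"a1", "a2", "a3"}, {"a2", "a3", "a4", "a5"})
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")  # prints precision=0.67 recall=0.50 F1=0.57
```

Precision penalizes activities wrongly flagged as non-canonical, whereas recall penalizes non-canonical activities that the detection misses; the F-measure is their harmonic mean.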

Implications

This section discusses the implications of our research. Sections 5.1 and 5.2 identify implications of our work for research and for practice, respectively. Section 5.3 reflects upon threats to validity.

Conclusion

In this paper, we introduced the notion of canonicity in order to prevent the mixing of natural language and modeling language within one process model activity. To this end, we formalized the notion of canonicity as well as recurring patterns that violate canonicity. Based on these formalizations, we designed techniques for the automatic recognition and refactoring of these patterns. As shown in the evaluation experiments, the proposed techniques are capable of recognizing and correcting the


References (68)

  • J. Mendling et al., Seven process modeling guidelines (7PMG), Inform. Softw. Technol. (2010)
  • R.M. Dijkman et al., Semantics and analysis of business process models in BPMN, Inform. Softw. Technol. (2008)
  • H. Leopold et al., On the refactoring of activity labels in business process models, Inform. Syst. (2012)
  • P. Trkman, The critical success factors of business process management, Int. J. Inform. Manag. (2010)
  • J. Fabra et al., Automatic execution of business process models: Exploiting the benefits of model-driven engineering approaches, J. Syst. Softw. (2012)
  • E. Cardoso, J. Almeida, G. Guizzardi, Requirements engineering based on business process models: A case study, in:...
  • H. Leopold, J. Mendling, O. Gunther, What we can learn from quality issues of BPMN models from industry, IEEE...
  • M. Rosemann, Potential Pitfalls of Process Modeling: Part A, Bus. Process Manag. J. (2006)
  • B.W. Boehm, Understanding and controlling software costs, J. Parametr. (1988)
  • A. Abran et al., Guide to the Software Engineering Body of Knowledge - SWEBOK (2004)
  • M. Weidlich et al., Efficient consistency measurement based on behavioral profiles of process models, IEEE Trans. Software Eng. (2011)
  • C. Ouyang, M. Dumas, S. Breutel, A. ter Hofstede, Advanced Information Systems Engineering, in: Translating standard...
  • C. Ouyang et al., Pattern-based translation of BPMN process models to BPEL web services, Int. J. Web Services Res. (IJWSR) (2008)
  • W.M.P. van der Aalst, Verification of workflow nets, Appl. Theory Petri Nets (1997)
  • M. Weske, Business Process Management: Concepts, Languages, Architectures (2012)
  • J. Mendling, M. Nüttgens, EPC syntax validation with XML schema languages, in: M. Nüttgens, F. J. Rump (Eds.), EPK...
  • J. Dehnert, P. Rittgen, Relaxed soundness of business processes, in: Advanced Information Systems Engineering,...
  • F. Puhlmann, Soundness verification of business processes specified in the pi-calculus, in: R. Meersman, Z. Tari...
  • W.M.P. van der Aalst et al., Soundness of workflow nets: classification, decidability, and analysis, Formal Asp. Comput. (2011)
  • A. Basu et al., A formal approach to workflow analysis, Inform. Syst. Res. (2000)
  • J. Becker, P. Delfmann, S. Herwig, L. Lis, A. Stein, Formalizing Linguistic Conventions for Conceptual Models, in:...
  • J. Becker, P. Delfmann, S. Herwig, L. Lis, A. Stein, Towards Increased Comparability of Conceptual Models - Enforcing...
  • P. Delfmann et al., Supporting distributed conceptual modelling through naming conventions - a tool-based linguistic approach, Enterp. Model. Inform. Syst. Architect. (2009)
  • F. Pittke, H. Leopold, J. Mendling, Spotting terminology deficiencies in process model repositories, in: S. Nurcan,...

Dr. Henrik Leopold is an assistant professor with the Department of Computer Science at the VU University Amsterdam. His research interests include business process modelling, natural language processing techniques, process model matching, and process architectures. He obtained a doctoral degree as well as an MSc in Information Systems from the Humboldt-Universität zu Berlin and a Bachelor degree in Information Systems from the Berlin School of Economics. After being a post-doc at the Humboldt-Universität zu Berlin, he was an assistant professor at WU Vienna from April 2014 to January 2015. His research has been published, among others, in Decision Support Systems, IEEE Transactions on Software Engineering, and Information Systems. His doctoral thesis received the German Targion Award 2014 for the best dissertation in the field of strategic information management.

Dr. Fabian Pittke is a research assistant with WU Vienna. He received an MSc degree in business informatics from the Institute of Information Systems (IWI), Universität des Saarlandes, Germany, in 2010. He was a research fellow with Humboldt-Universität zu Berlin until 2012 and has since been an external research fellow at the Institute for Information Business at Wirtschaftsuniversität Wien. His research focuses on linguistic aspects of process models; his research interests include business process modeling and natural language processing techniques.

Prof. Dr. Jan Mendling is a full professor and head of the Institute for Information Business at WU Vienna. His research areas include Business Process Management, Conceptual Modelling and Enterprise Systems. He studied Business Computer Science at the University of Trier (Germany) and UFSIA Antwerpen (Belgium) and received a PhD degree from WU Vienna (Austria). After being a postdoc with QUT Brisbane (Australia) and a junior professor at HU Berlin (Germany), he moved back to WU in 2011. He has published more than 200 research papers and articles, among others in ACM Transactions on Software Engineering and Methodology, IEEE Transactions on Software Engineering, Information Systems, Data & Knowledge Engineering, and Decision Support Systems. He is a member of the editorial board of three international journals, one of the founders of the Berlin BPM Community of Practice (www.bpmb.de), and a board member of the Austrian Gesellschaft für Prozessmanagement. His Ph.D. thesis won the Heinz-Zemanek-Award of the Austrian Computer Society and the German Targion Award for dissertations in the area of strategic information management.
