Abstract
With growing investments in understanding and controlling their business processes, organizations produce many business process models. These models have become crucial instruments in the process lifecycle, and it is therefore important that they are correct and clear representations of reality. They should contain as few errors and confusions as possible. Because we assume a causal relation between confusion and errors, we investigated it empirically. For our observation group, the data show a correlation and a temporal ordering between the two. More specifically, avoiding implicit and redundant events and gateways is related to making fewer errors.
1 Introduction
Nowadays, organizations put a lot of effort into documenting and analyzing their processes with the aid of graphical representations, i.e., with business process models [1, 2]. The de facto standard language they use for these models is BPMN [3]. The models are typically constructed by someone who has gained expertise in modeling through training and practice (“a modeler”), with input from people who know the process very well (“the process owners”) [4]. Unfortunately, case studies show that the quality of the produced models is poor, because they are often ambiguous, incomplete, or wrong [5,6,7]. Causes for these issues are categorized into two types: knowledge problems (the modeler has imperfect knowledge of the goal, the end users, the process, or the modeling language) and cognitive problems (the modeler does not succeed in externalizing their knowledge about the process perfectly in the model) [8]. This paper aims to contribute towards solving the latter kind of problems by analyzing the impact of confusion during modeling on making errors.
Because of the complexity of the study topic, the scope is limited in several ways in order to provide initial knowledge that may be of importance when working towards a solution.
- The language was reduced to its minimum by excluding all but the 6 fundamental constructs for modeling the control flow of processes [9] in order to focus first on essential aspects only.
- The modeling was reduced to an assignment of translating a pre-structured textual description of a process into a graphical business process model in order to focus first on cognitive problems and the task of the modeler only.
- The analysis concerns syntax issues only in order to focus first on the type of issues that can be determined most objectively.
- The study used observational data of master students as a proxy for novice modelers in order to reduce the variability of modeling experience, domain knowledge, problem solving skills, etc. in the dataset.
In our study of the impact of confusion on making errors during modeling, an existing dataset was explored. It contains the tool operations of 146 modelers, and it was enriched with the timing of adding/removing confusing constructs and of adding/removing syntactical errors in the model. Next, the relation between confusion and errors was studied in detail. The conclusion is not surprising: confusion correlates with making more and solving fewer errors. More specifically, the study confirms the importance of using explicit events and gateways in a well-structured way.
Yet, since completing the data and constructing the proof for this conclusion proved to be challenging (even under the complexity-reducing limitations mentioned above), we believe that it is important to publish this study. In fact, this is the first paper to provide such detailed explanatory and statistical material on the relation between confusion and errors during modeling. It responds to the criticism in the literature of process modeling guidelines, some of which lack an empirical foundation. It provides input for teachers, who may now be able to teach their students how to avoid certain errors instead of detecting and correcting them. It provides useful input for tool developers, who can improve their syntax support during modeling. On the other hand, the external validity of the research is not investigated, and readers should be cautious when generalizing the results of this paper.
This paper is structured as follows. Section 2 presents related work. Section 3 describes the data collection and Sect. 4 discusses the data analysis. Section 5 provides a discussion. Section 6 provides a summarizing conclusion.
2 Background and Related Work
The research described in this article builds further on the work of De Bock and Claes [10], who studied the origin and evolution of syntax issues in simple BPMN models. They introduced a classification scheme, which is partially adopted here. The scheme makes a distinction between errors, irresolutions, and confusions (cf. Table 1). A syntax error is a clear fault against the specified syntax of the modeling language. Further, since some constructs or their meaning are not clearly or consistently defined in the BPMN syntax specification, the paper distinguishes these from errors and calls them irresolutions. Next, some constructs are clearly syntactically correct, but they are still considered as constructions to avoid. These are called confusions. For this paper, because of their ambiguity, we decided to consider irresolutions also as confusing syntactical constructs.
Second, the scheme recognizes that at certain times during modeling one cannot be sure whether a missing construct is an issue, since its placement may have been deliberately postponed or unintentionally forgotten. The proposed way of dealing with this is to distinguish between uncertain issues (certain missing constructs in incomplete parts of the model) and definite issues (missing constructs in completed parts of the model, and wrong constructs). For this paper, we consider both the making and the solving of issues, and we otherwise neglect the difference between uncertain and definite. Only when an uncertain issue evolves into its definite equivalent do we disregard the “solving of the uncertain issue” and the “making of the definite issue”, so that the making of the issue is captured at the earliest time.
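The handling of an uncertain issue that evolves into its definite equivalent can be sketched as follows. This is a minimal illustration, not the procedure used to prepare the dataset: the `IssueEvent` record, its field names, and the assumption that an evolution shows up as a "solve uncertain" and a "make definite" event with the same time, element, and issue code are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueEvent:
    time: float       # moment of the operation (unit is an assumption)
    element_id: str   # model element the issue is attached to
    code: str         # issue code, e.g. "eP"
    certainty: str    # "uncertain" or "definite"
    action: str       # "make" or "solve"


def collapse_evolutions(events):
    """Drop the solve-uncertain / make-definite pair of an evolving issue.

    What remains is the original making of the (then uncertain) issue,
    so the issue is timed at its earliest appearance.
    """
    def key(e):
        return (e.time, e.element_id, e.code)

    solved_uncertain = {key(e) for e in events
                        if e.action == "solve" and e.certainty == "uncertain"}
    made_definite = {key(e) for e in events
                     if e.action == "make" and e.certainty == "definite"}
    # Assumption: an evolution is visible as both events sharing one key.
    evolved = solved_uncertain & made_definite

    def is_bookkeeping(e):
        if key(e) not in evolved:
            return False
        return (e.action, e.certainty) in {("solve", "uncertain"),
                                           ("make", "definite")}

    return [e for e in events if not is_bookkeeping(e)]
```

With this sketch, an issue that is made as uncertain, later becomes definite, and is finally solved contributes exactly one "make" (at the earliest time) and one "solve".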
This paper contributes to the process model quality literature. Since there are ample papers about this topic, we refer to the recent, extensive literature reviews of Figl [11] and De Meyer and Claes [12]. Besides literature about what constitutes a high-quality process model, there is also a growing body of literature about how process models of high quality can be constructed. For example, Guidelines of Modeling provides high-level recommendations such as to optimize correctness, relevance, economic efficiency, clarity, comparability, and systematic design [13]. Nevertheless, it was criticized for its lack of concrete, operational support and its lack of empirical evidence. At the other extreme, the Seven Process Modeling Guidelines [14], Ten Process Modeling Guidelines [15], and Quality Indicators related to Gateway Complexity [16] take a predominantly empirical angle, seeking the optimal thresholds for certain process model metrics such as the size of the model, the number and nesting depth of gateways, etc. Further, Concrete [17] and Abstract Syntax modification Patterns [18] bundled detailed guidelines for process modeling from existing literature, software, and cases. What they all have in common is that the focus is on how to improve process models, and less on why certain guidelines have certain effects on the quality of the produced models (i.e., the cognitive aspect). The current paper aims to address this gap by studying the relations between different factors of process model quality. More specifically, it relates the amount of syntactical confusion in a model to making and solving errors during modeling.
3 Data Collection
As mentioned, this study builds on an existing dataset used in previous research [8, 10]. The dataset contains all the operations of 146 modelers in our modeling tool while constructing a graphical model from a textual process description (see Footnote 1). For each operation, the dataset contains the type (e.g. ‘create activity’, ‘move xor gateway’, ‘delete edge’), the time, the id of the model element to which it applies, and the code(s) of the issue(s) that is/are made or solved by the operation. For this study, we could use only 122 of the 146 modeling sessions, disregarding the models that contain neither confusions nor errors, and Petri-Net-style models (using event symbols for places).
From these data, we derived for each modeling session the measures presented in Table 2. When considering operations in our study, the ‘move’ operations are disregarded. They are not relevant, because they have no impact on syntax issues.
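To make the derivation of session-level measures concrete, the sketch below models one dataset entry and computes totals with the symbols used in Sect. 4.1 (CI+, CI−, CI, E+, E−, E, Op, T). It is an illustration under assumptions: the `Operation` class and its field names are hypothetical, issue codes are classified by their first letter (`c` and `i` for confusions and irresolutions, `e` for errors, following codes such as cS, iC, and eP), remaining issues are taken as made minus solved, and T is taken as the span between the first and last recorded operation. Table 2 may define these measures differently.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    kind: str                                   # e.g. "create activity"
    time: float                                 # unit is an assumption
    element_id: str
    made: list = field(default_factory=list)    # codes of issues made
    solved: list = field(default_factory=list)  # codes of issues solved


def is_confusing(code):
    # Irresolutions ("i...") are counted as confusing constructs as well.
    return code[:1] in ("c", "i")


def is_error(code):
    return code[:1] == "e"


def session_totals(operations):
    """Totals over one modeling session, ignoring 'move' operations."""
    relevant = [op for op in operations if not op.kind.startswith("move")]
    ci_made = sum(is_confusing(c) for op in relevant for c in op.made)
    ci_solved = sum(is_confusing(c) for op in relevant for c in op.solved)
    e_made = sum(is_error(c) for op in relevant for c in op.made)
    e_solved = sum(is_error(c) for op in relevant for c in op.solved)
    times = [op.time for op in operations]
    return {
        "CI+": ci_made, "CI-": ci_solved, "CI": ci_made - ci_solved,
        "E+": e_made, "E-": e_solved, "E": e_made - e_solved,
        "Op": len(relevant),
        "T": (max(times) - min(times)) if times else 0.0,
    }
```

Filtering on the operation type before counting reflects the decision above to disregard 'move' operations, which cannot create or remove a syntax issue.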
The above measures represent totals over the whole session. At each operation that created or removed a syntax issue, the measures in Table 3 were calculated as well.
4 Data Analysis
4.1 General Relations Between Confusion and Errors
First, we look at the relation between confusion (CI) and errors (E) at the model level. Correlations are calculated between the variables of Table 2. The results are displayed in Table 4.
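The pairwise computation behind such a table can be sketched as below. The text does not state which correlation coefficient or significance test was used, so the Pearson coefficient here is an assumption chosen for brevity, and the sketch does not compute significance. The per-session dictionaries and the helper names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_table(sessions, measures):
    """Correlate every pair of measures across modeling sessions.

    `sessions` is a list of dicts, one per session, keyed by measure name.
    """
    table = {}
    for i, a in enumerate(measures):
        for b in measures[i + 1:]:
            table[(a, b)] = pearson([s[a] for s in sessions],
                                    [s[b] for s in sessions])
    return table
```

Each cell of the resulting table relates two model-level measures, for example confusions made (CI+) and errors made (E+), over all sessions.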
The following conclusions can be drawn from Table 4.
Made Versus Solved.
It can be noted that the more confusions are made (CI+), the more (see Footnote 2) confusions are solved (CI−). Similarly, the more errors are made (E+), the more (see Footnote 2) errors are solved (E−). This makes sense, since issues that are not made cannot be solved. Yet, these two correlations are important for interpreting the following results.
Made/Solved Versus Total.
The more confusions made (CI+), the more (see Footnote 2) remain in the final model (CI). The more errors made (E+), the more (see Footnote 2) remain in the final model (E). This appears obvious, but given the previously discussed correlation between confusions/errors made and confusions/errors solved, it is important to verify that not all issues are solved in the end. This also explains why solving more confusions/errors (CI−/E−) does not result in fewer confusions/errors in the final model (CI/E).
Making/Solving Versus Operations and Time.
Making more confusions (CI+) does not per se cost more operations (Op) or time (T). Solving more confusions (CI−) costs more (see Footnote 3) operations (Op) and more (see Footnote 2) time (T). Interestingly, making and solving errors are both related to more (see Footnote 2) operations (Op) and more (see Footnote 3) time (T). This is consistent with the findings of Bolle and Claes [19].
Confusions Versus Errors.
The most interesting part of Table 4, however, is the relation between confusions and errors. Making more confusions (CI+) is related to making more (see Footnote 3) errors (E+), solving fewer (see Footnote 2) errors (E−), and more (see Footnote 2) errors remaining (E). Solving confusions (CI−) is not found to be related to making, solving, or remaining errors. Remaining confusions (CI) are related to making more (see Footnote 3) errors (E+), solving fewer (see Footnote 3) errors (E−), and more (see Footnote 2) errors remaining (E).
In summary, these are indications (some statistically significant, some not) of a positive relation between confusions (CI+/CI−/CI) and errors (E+/E−/E). However, the above tests neglect the timing of confusions and errors. Since we assume a causal relation between confusion and errors, it is more precise to verify whether more errors are made/solved at the time that more confusion exists in the model. This does not prove causality, but temporal ordering is a necessary condition for causality (on top of correlation). Let us now examine this in more detail.
4.2 More Errors are Made when More Confusion Exists in the Model
For the analysis of the relation between errors made (e+) during modeling and the number of confusions existing in the model (ci), only those entries in the dataset are selected where errors are made. Note that a single operation can make one error and at the same time solve another one. The net effect on e is zero, but this record is still included. The correlation results are summarized in Table 5.
As can be seen, there is no statistically significant relation in our dataset between the number of errors made (e+) and the absolute number of confusions present in the partial models at the time of the error (ci). A possible reason is that the number of errors that can be made depends on the number of operations of the modeler. Therefore, the data was normalized. The number of errors made is considered per operation made (pe+ = e+/op). It represents the chance for an operation to be an error. The number of confusions in the partial models is considered relative to the maximum number in the model ([ci] = ci/max(ci)). This gives a fairer comparison between models and between numbers of operations. The correlation analysis now shows a statistically significant relation.
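The two normalizations can be written out as a small sketch. The function names are hypothetical, and the reading of `op` as the number of operations over which the errors were counted is an assumption, since the text does not pin it down further.

```python
def error_chance(e_made, op):
    """pe+ = e+/op: errors made per operation."""
    if op <= 0:
        raise ValueError("op must be positive")
    return e_made / op


def relative_confusion(ci_series):
    """[ci] = ci/max(ci): confusion level relative to the session's peak.

    `ci_series` holds the number of confusions present in the partial
    model at successive moments of one modeling session.
    """
    peak = max(ci_series)
    if peak == 0:
        # A session without any confusion has no meaningful relative level.
        return [0.0 for _ in ci_series]
    return [ci / peak for ci in ci_series]


# Worked example with made-up numbers: 3 errors over 12 operations, and a
# partial model whose confusion count evolves as 0, 1, 2, 4, 3.
chance = error_chance(3, 12)
levels = relative_confusion([0, 1, 2, 4, 3])
```

Because [ci] is scaled per session, a model that never holds more than two confusions and one that peaks at ten are placed on the same 0 to 1 scale.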
The chart in Table 5 shows a more detailed view of the relation in our dataset. It represents the deciles of the confusion level ([ci]) and sets out the average chance to make errors (pe+), as well as a linear trendline of this relation. In general, it can be concluded that the more relative confusions were present in the partial model, the higher (see Footnote 2) the chance was to make errors.
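A chart of this kind could be produced along the following lines. This is a sketch under assumptions: the deciles are taken to be ten equal-width bins of [ci] between 0 and 1 (quantile-based deciles are another possible reading), the trendline is an ordinary least-squares fit through the bin averages, and all function names are hypothetical.

```python
def decile_index(rel_ci):
    """Map a relative confusion level in [0, 1] to a bin 0..9."""
    if not 0.0 <= rel_ci <= 1.0:
        raise ValueError("relative confusion must lie in [0, 1]")
    return min(int(rel_ci * 10), 9)   # a level of exactly 1.0 joins the last bin


def decile_means(pairs):
    """Average chance per decile; `pairs` holds (rel_ci, chance) tuples."""
    buckets = {}
    for rel_ci, chance in pairs:
        buckets.setdefault(decile_index(rel_ci), []).append(chance)
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}


def linear_trend(points):
    """Least-squares slope and intercept through (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A positive slope of the line fitted through `decile_means(...)` corresponds to the upward trend described above.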
4.3 Less Errors are Solved when More Confusion Exists in the Model
For the analysis of the relation between errors solved (e−) during modeling and the number of confusions existing in the model (ci), only those entries in the dataset are selected where errors are solved. The correlation results are summarized in Table 6.
Again, there is no statistically significant relation in the dataset between the number of errors solved (e−) and the absolute number of confusions in the partial models (ci). The number of errors that can be solved depends on the number of errors existing in the model and on the number of operations of the modeler. The data was again normalized for a fairer comparison. The number of errors solved is considered per number of errors existing ([e−] = e−/e) and per operation made (p[e−] = [e−]/op). The correlation analysis now shows statistically significant relations.
The chart in Table 6 shows a more detailed view of the relation in our dataset. It represents the deciles of the confusion level ([ci]) and sets out the average chance to solve errors (p[e−]), as well as a linear trendline of this relation. In general, it can be concluded that the more relative confusions were present in the partial model, the lower (see Footnote 2) the chance was to solve errors.
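The double normalization for solved errors can be sketched in the same way. The function names are hypothetical, and `op` is again assumed to be the number of operations over which the solved errors were counted.

```python
def relative_errors_solved(e_solved, e_existing):
    """[e-] = e-/e: share of the existing errors that were solved."""
    if e_existing <= 0:
        raise ValueError("no existing errors to solve")
    return e_solved / e_existing


def solve_chance(e_solved, e_existing, op):
    """p[e-] = [e-]/op: share of existing errors solved, per operation."""
    if op <= 0:
        raise ValueError("op must be positive")
    return relative_errors_solved(e_solved, e_existing) / op


# Worked example with made-up numbers: 2 of 4 existing errors are solved
# over 5 operations, so [e-] = 0.5 and p[e-] = 0.5 / 5.
share = relative_errors_solved(2, 4)
chance = solve_chance(2, 4, 5)
```

Dividing by the number of existing errors first keeps a session with many open errors from looking better at solving them merely because it has more to solve.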
4.4 Which Confusions Cause which Errors?
Next, the effects of individual confusions, irresolutions, and errors are studied. We limited the study to those variables that have been observed at least 50 times, which can be derived from Table 1. As such, the effects of the confusions multiple start events (cS), multiple end events (cE), multiple optional sequence flows towards non-gateway (cJx), and no label for edge departing from XOR splits (cLx), and of the irresolution one gateway combines a join and split feature (iC), on the errors not all paths are closed (eP), multiple optional sequence flows from non-gateway (eSx), and gateway with only 1 incoming and 1 outgoing sequence flow (e1e) are calculated. The results are summarized in Table 7. They are discussed below, column by column.
Not closing all paths of the model (eP+) happens more (see Footnote 2) when there are multiple end events in the model (cE) and when more edges from xor gateways are not labeled (cLx). We expect that using multiple end events causes the modeler to forget to close all paths in the model (H1), whereas we do not assume a causal relation between not labeling edges and forgetting to close all paths. On the other hand, these confusions do not appear to relate (see Footnote 3) to solving this error.
Using an implicit xor split gateway (eSx+) is not allowed by the BPMN syntax. This error was made more (see Footnote 2) when there were multiple start events (cS), and when the modeler also used more implicit xor join gateways (cJx) and gateways that combined split and join functionalities (iC). We propose that people who use implicit gateways are not always aware of when this is allowed (cS, cJx, iC) and when not (eSx) (H2). Surprisingly, these errors are solved more (see Footnote 2) when there are more start events in the model.
Having gateways in the model that are not used for splitting or joining multiple paths (e1e+) happens more (see Footnote 2) when there are multiple start events (cS) or multiple end events (cE) in the model, and when there are more edges from xor gateways without labels (cLx). Perhaps having multiple start and end events increases the structural complexity of the model, causing the modeler to forget adding the postponed paths for which a gateway was already created (H3). We do not assume a causal relation between the labels of edges and forgetting some paths at gateways, which is supported by the unexpected correlation with solving these errors.
This more detailed analysis does not bring conclusive answers. It contributes to the study by adding preliminary insights that can be derived from the statistics. They are formulated in the form of hypotheses (H1-H3), which can be studied in future work.
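To illustrate what some of the constructs named in this section look like structurally, the sketch below checks a small model graph for cS, cE, cLx, iC, and e1e. It is not the detection logic of the modeling tool used in the study: the graph representation (a dict of node types and a list of labeled edges) and the function name are hypothetical, and the remaining codes (eP, eSx, cJx) are left out because they need information, such as path completeness or flow optionality, that this simple representation does not carry.

```python
from collections import Counter

GATEWAYS = {"xor", "and"}


def find_issues(nodes, edges):
    """Return a dict mapping issue codes to the elements that trigger them.

    nodes: {node_id: kind}, kind in "start", "end", "activity", "xor", "and"
    edges: list of (source_id, target_id, label) tuples; label may be ""
    """
    incoming = Counter(dst for _, dst, _ in edges)
    outgoing = Counter(src for src, _, _ in edges)
    issues = {"cS": [], "cE": [], "cLx": [], "iC": [], "e1e": []}

    starts = [n for n, kind in nodes.items() if kind == "start"]
    ends = [n for n, kind in nodes.items() if kind == "end"]
    if len(starts) > 1:
        issues["cS"] = starts              # multiple start events
    if len(ends) > 1:
        issues["cE"] = ends                # multiple end events

    for node, kind in nodes.items():
        if kind not in GATEWAYS:
            continue
        n_in, n_out = incoming[node], outgoing[node]
        if n_in == 1 and n_out == 1:
            issues["e1e"].append(node)     # gateway that neither splits nor joins
        if n_in > 1 and n_out > 1:
            issues["iC"].append(node)      # gateway combining join and split

    for src, dst, label in edges:
        if nodes[src] == "xor" and outgoing[src] > 1 and not label:
            issues["cLx"].append((src, dst))   # unlabeled edge from an xor split
    return issues
```

Checks of this kind are purely structural, which is what allows syntax issues to be determined objectively from the state of the model after each operation.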
5 Discussion
5.1 Impact
The impact on research of this study is that it provides (additional) empirical evidence for a number of proposed process modeling guidelines of the Seven Process Modeling Guidelines (7PMG) [14], Ten Process Modeling Guidelines (10PMG) [15], Concrete Syntax Patterns (CSP) [17], Abstract Syntax Patterns (ASP) [18], and Quality Indicators (QI) [16]. The general lack of such evidence on the content, interrelations, and relevance of these guidelines is denounced in multiple critiques [20,21,22,23]. The guidelines to which supporting (additional) evidence is formulated in this paper, are listed below.
- Use 1 start and 1 end event (7PMG) and Use no more than 2 start and end events (10PMG). This relates to confusions S, E, 0s, 0e; errors 0se, 0es; and hypotheses H1, H2, H3.
- Model as structured as possible (7PMG, 10PMG, ASP) and Use design patterns to avoid mismatch (10PMG). This proposes to use explicit and paired gateways, which should avoid confusions Sa, Jx; irresolutions C, W, I, N, T, DS; and errors Sx, Ja, 1e; and it relates to hypothesis H2.
- Use explicit representation (CSP). This may refer to avoiding implicit events and gateways, which is related to confusions 0s, 0e, Sa, Jx; irresolutions C, I, T, DS; errors 0se, 0es, Sx, Ja; and hypothesis H2.
- Limit the difference in the number of input/output flows between splits and joins (QI) refers to both previous examples, because it is realized by structured modeling and/or using explicit gateways.
- Use of textual annotation (CSP) and Naming guidance (CSP). This links to the use and format of text in models, which relates to the confusion Lx, the irresolution La, the error Ls, and the discussion about Lx in Sect. 4.4.
The impact on practice is that the research gives practitioners extra insights into the relation between confusion and errors. Even in models that are made as input for computer programs, where only pure syntax errors might seem to be important, it now appears to be important to avoid confusing constructs as well, since they may cause errors during modeling. Besides the modelers, this research should also support teachers. It is always easier to train modelers to apply certain guidelines when the reason why they are important can be illustrated. This paper contributes to such an illustration of the importance of also avoiding confusion, as a potential cause of errors. Third, tool developers have spent a great deal of effort to support modelers in avoiding syntax errors by highlighting them or by providing an overview of the syntax errors after modeling. The current study provides input for adding a level of warnings to their support features (just as programming editors do).
5.2 Limitations and Future Work
As discussed in the introduction, the scope of the research is limited. The reduced language, the artificial case, and the artificial modelers limit the ecological validity of the research. The focus on syntax only, on the modeler’s contribution only, and on one case only, and the lack of focus on consequences for the readers of a model, limit the external validity of the research. Therefore, one should be cautious about generalizing the relations discussed above. The research should be considered an exploratory empirical study that provides initial insights and hypotheses for further research.
In Sect. 4.4, three hypotheses are formulated, which can be studied further. But more generally, it would be useful to study systematically the effect of all listed confusions and irresolutions, both on making errors during modeling and on the final user understandability of the model. Further, although methodologically more challenging, it is advised to include not only pragmatic (cf. user-understanding), but also semantic quality into the research. Using the same artificial setup where a modeler is instructed to create a diagram representing the knowledge described in a textual description, it is possible to use a similar methodology as the one applied here to derive the timing and type of semantic errors (missing, wrong, redundant, inconsistent, and unnecessary constructs), and to study their interrelations and links with syntax and understanding.
6 Conclusion
The data of 122 (of 146) modeling sessions was used to build a dataset containing the timing of all operations to construct the model. By adding whether each operation initiated or solved certain types of syntax issues, we were able to study the relation between confusing and wrong syntax constructs. In general, the conclusion is that confusion may lead to errors and therefore it should be avoided as much as real errors. The contribution of this paper is not in this conclusion per se, but in its detailed proof (31,588 operations were analyzed, and 2,489 syntax issues were documented) and in the explanatory knowledge that is added to this conclusion. It provides interesting knowledge about the presence of various types of confusing constructs and syntax errors and their potential (causal) relations.
Notes
1. For details, see the 2015 experiment at https://www.janclaes.info/experiments.
2. Statistically significant result.
3. Results are not statistically significant.
References
1. Moreno-Montes De Oca, I., Snoeck, M., Reijers, H.A., et al.: A systematic literature review of studies on business process modeling quality. Inf. Softw. Technol. 58, 187–205 (2015)
2. Aguilar-Savén, R.S.: Business process modelling: review and framework. Int. J. Prod. Econ. 90, 129–149 (2004)
3. Recker, J.: Opportunities and constraints: the current struggle with BPMN. Bus. Process Manag. J. 16, 181–201 (2010)
4. Grosskopf, A., Edelman, J., Weske, M.: Tangible business process modeling – methodology and experiment design. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 489–500. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12186-9_46
5. Hassan, N., Recker, J., Bernhard, E.: A study of the use of business process modelling at Suncorp, Brisbane, Australia (2011)
6. Gruhn, V., Laue, R.: What business process modelers can learn from programmers. Sci. Comput. Program. 65, 4–13 (2007)
7. Mendling, J., Verbeek, H.M.W., Van Dongen, B.F., et al.: Detection and prediction of errors in EPCs of the SAP reference model. Data Knowl. Eng. 64, 312–329 (2008)
8. Claes, J., Vanderfeesten, I., Gailly, F., et al.: The structured process modeling method (SPMM) - what is the best way for me to construct a process model? Decis. Support Syst. 100, 57–76 (2017)
9. zur Muehlen, M., Recker, J.: How much language is enough? Theoretical and practical use of the business process modeling notation. In: Bellahsène, Z., Léonard, M. (eds.) CAiSE 2008. LNCS, vol. 5074, pp. 465–479. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69534-9_35
10. De Bock, J., Claes, J.: The origin and evolution of syntax errors in simple sequence flow models in BPMN. In: Matulevičius, R., Dijkman, R. (eds.) CAiSE 2018. LNBIP, vol. 316, pp. 155–166. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92898-2_13
11. Figl, K.: Comprehension of procedural visual business process models - a literature review. Bus. Inf. Syst. Eng. 59, 41–71 (2017)
12. De Meyer, P., Claes, J.: An overview of process model quality literature - The Comprehensive Process Model Quality Framework (2018)
13. Becker, J., Rosemann, M., von Uthmann, C.: Guidelines of business process modeling. In: van der Aalst, W., Desel, J., Oberweis, A. (eds.) Business Process Management. LNCS, vol. 1806, pp. 30–49. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45594-9_3
14. Mendling, J., Reijers, H.A., Van der Aalst, W.M.P.: Seven process modeling guidelines (7PMG). Inf. Softw. Technol. 52, 127–136 (2010)
15. Mendling, J., Sánchez-González, L., García, F., et al.: Thresholds for error probability measures of business process models. J. Syst. Softw. 85, 1188–1197 (2012)
16. Sánchez-González, L., García, F., Ruiz, F., et al.: Quality indicators for business process models from a gateway complexity perspective. Inf. Softw. Technol. 54, 1159–1174 (2012)
17. La Rosa, M., Ter Hofstede, A.H.M., Wohed, P., et al.: Managing process model complexity via concrete syntax modifications. IEEE Trans. Ind. Informatics 7, 255–265 (2011)
18. La Rosa, M., Wohed, P., Mendling, J., et al.: Managing process model complexity via abstract syntax modifications. IEEE Trans. Ind. Informatics 7, 614–629 (2011)
19. Bolle, J., Claes, J.: Investigating the trade-off between the effectiveness and efficiency of process modeling. In: Daniel, F., Sheng, Quan Z., Motahari, H. (eds.) BPM 2018. LNBIP, vol. 342, pp. 121–132. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11641-5_10
20. Chen, C.: Top 10 unsolved information visualization problems. IEEE Comput. Graph. Appl. 25, 12–16 (2005)
21. Nelson, H.J., Poels, G., Genero, M., et al.: A conceptual modeling quality framework. Softw. Qual. J. 20, 201–228 (2012)
22. Rockwell, S., Bajaj, A.: COGEVAL: applying cognitive theories to evaluate conceptual models. Adv. Top. Database Res. 4, 255–282 (2005)
23. Rogers, Y., Scaife, M.: How can interactive multimedia facilitate learning? In: Lee, J. (ed.) 1st International Workshop on Intelligence and Multimodality in Multimedia Interfaces. Research and Applications, pp. 1–25. AAAI (1998)
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Claes, J., Vandecaveye, G. (2019). The Impact of Confusion on Syntax Errors in Simple Sequence Flow Models in BPMN. In: Proper, H., Stirna, J. (eds) Advanced Information Systems Engineering Workshops. CAiSE 2019. Lecture Notes in Business Information Processing, vol 349. Springer, Cham. https://doi.org/10.1007/978-3-030-20948-3_1
DOI: https://doi.org/10.1007/978-3-030-20948-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20947-6
Online ISBN: 978-3-030-20948-3
eBook Packages: Computer Science, Computer Science (R0)