Abstract
The focus in the field of process mining, and process discovery in particular, has thus far been on exploring and describing event data by means of models. Since the obtained models are often directly based on a sample of event data, the question of whether they also apply to the real process typically remains unanswered. As the underlying process is unknown in real life, there is a need for unbiased estimators to assess the system quality of a discovered model, and subsequently to make assertions about the process. In this paper, an experiment is described and discussed that analyzes whether existing fitness, precision and generalization metrics can be used as unbiased estimators of system fitness and system precision. The results show that important biases exist, which currently makes it nearly impossible to objectively measure the ability of a model to represent the system.
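To make the notion of a biased estimator concrete, the toy sketch below samples event logs from a known system and compares a log-based score with the same score computed against the full system behavior. It is not the paper's experimental setup and does not implement any existing precision metric: `toy_precision` is a hypothetical, set-based stand-in (the share of a finite model language that also occurs in a reference set of traces), and `estimate_bias`, the example system, and its trace probabilities are illustrative assumptions. Bias is approximated as the mean estimate over repeated samples minus the true system value.

```python
import random
from statistics import mean
from typing import FrozenSet, List, Sequence, Tuple

Trace = Tuple[str, ...]


def toy_precision(model: FrozenSet[Trace], reference: FrozenSet[Trace]) -> float:
    """Hypothetical set-based score: share of model traces found in `reference`."""
    return len(model & reference) / len(model)


def estimate_bias(system_traces: Sequence[Trace], system_probs: Sequence[float],
                  model: FrozenSet[Trace], log_size: int, runs: int,
                  rng: random.Random) -> float:
    # "System" value: the score measured against the full (known) system language.
    true_value = toy_precision(model, frozenset(system_traces))
    estimates: List[float] = []
    for _ in range(runs):
        # Sample an event log of `log_size` traces from the known system.
        log = rng.choices(system_traces, weights=system_probs, k=log_size)
        # Log-based estimate: the same score, but only against observed traces.
        estimates.append(toy_precision(model, frozenset(log)))
    # Bias = expected estimate minus true value (expectation approximated by the mean).
    return mean(estimates) - true_value


system = [("a", "b"), ("a", "c"), ("a", "b", "c")]
model = frozenset(system) | {("a",)}  # the model allows one extra trace
print(estimate_bias(system, [0.6, 0.3, 0.1], model, log_size=1, runs=10,
                    rng=random.Random(1)))
```

With a single-trace log, every sample yields 1/4 while the system value is 3/4, so the printed bias is -0.5. Because observed traces are always a subset of the system language, this toy score can only underestimate the system value, and the gap shrinks as the log covers more of the system's behavior.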
Notes
In practice, these costs can be configured for each activity type individually, to reflect that certain deviations should be penalized more than others.
Optimal alignments are the alignments for which the cost is minimized.
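As a minimal sketch of these two notes, the code below computes the cost of an optimal alignment between a trace and a model under configurable per-activity costs. It assumes, for simplicity, that the model's behavior is given as a finite set of activity sequences; real alignment techniques search the state space of a process model (e.g. a Petri net) instead. Following the usual convention, synchronous moves cost 0, while moves on the log only and moves on the model only cost a per-activity value, defaulting to 1. The function names and cost dictionaries are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def alignment_cost(trace: Sequence[str], model_trace: Sequence[str],
                   log_cost: Dict[str, float], model_cost: Dict[str, float]) -> float:
    """Minimal cost of aligning `trace` with one model trace (dynamic programming)."""
    n, m = len(trace), len(model_trace)
    # dp[i][j]: minimal cost of aligning trace[:i] with model_trace[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + log_cost.get(trace[i - 1], 1.0)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + model_cost.get(model_trace[j - 1], 1.0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                dp[i - 1][j] + log_cost.get(trace[i - 1], 1.0),          # move on log only
                dp[i][j - 1] + model_cost.get(model_trace[j - 1], 1.0),  # move on model only
            )
            if trace[i - 1] == model_trace[j - 1]:
                best = min(best, dp[i - 1][j - 1])  # synchronous move, cost 0
            dp[i][j] = best
    return dp[n][m]


def optimal_alignment_cost(trace: Sequence[str], model_language: Iterable[Sequence[str]],
                           log_cost: Dict[str, float], model_cost: Dict[str, float]) -> float:
    """Cost of an optimal alignment: the minimum over all allowed model traces."""
    return min(alignment_cost(trace, mt, log_cost, model_cost) for mt in model_language)
```

For example, aligning the trace `("a", "c", "b")` with a model allowing `("a", "b", "c")` and `("a", "c")` costs 1 with default costs (a single log move on `b`). Raising the log-move cost of `b` to 5 makes the alignment with `("a", "b", "c")` optimal instead, at cost 2, illustrating how per-activity costs change which deviations are preferred.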
The types of noise used have been defined based on existing literature (Maruster 2003). However, future experiments will require a more elaborate argument for what qualifies as realistic noise. For example, swapping random activities is not a particularly realistic event. A detailed discussion of what can be regarded as noise is outside the scope of this paper.
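The sketch below shows one way such noise could be injected into an event log. Only the swapping of random activities is mentioned in this note; the removal operator, the function names, and the `noise_fraction` parameter are assumptions for illustration and are not taken from Maruster (2003) or from the paper's experiment. Each trace is copied, so the original log is left unchanged.

```python
import random
from typing import Callable, List, Sequence

Trace = List[str]


def swap_random_events(trace: Sequence[str], rng: random.Random) -> Trace:
    """Swap the positions of two randomly chosen events (the operator named in the note)."""
    noisy = list(trace)
    if len(noisy) >= 2:
        i, j = rng.sample(range(len(noisy)), 2)
        noisy[i], noisy[j] = noisy[j], noisy[i]
    return noisy


def remove_random_event(trace: Sequence[str], rng: random.Random) -> Trace:
    """Delete one randomly chosen event (an assumed, illustrative operator)."""
    noisy = list(trace)
    if noisy:
        del noisy[rng.randrange(len(noisy))]
    return noisy


def add_noise(log: Sequence[Sequence[str]], noise_fraction: float,
              rng: random.Random) -> List[Trace]:
    """Apply one randomly chosen noise operator to roughly `noise_fraction` of the traces."""
    operators: List[Callable[[Sequence[str], random.Random], Trace]] = [
        swap_random_events, remove_random_event]
    noisy_log: List[Trace] = []
    for trace in log:
        if rng.random() < noise_fraction:
            noisy_log.append(rng.choice(operators)(trace, rng))
        else:
            noisy_log.append(list(trace))
    return noisy_log
```

Passing a seeded `random.Random` instance keeps the noisy logs reproducible across experiment runs.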
References
Adriansyah A, Munoz-Gama J, Carmona J, van Dongen BF, van der Aalst WMP (2015) Measuring precision of modeled behavior. Inf Syst e-Bus Manag 13(1):37–67
Agrawal R, Gunopulos D, Leymann F (1998) Mining process models from workflow logs. In: Schek HJ, Saltor F, Ramos I, Alonso G (eds) Adv Database Technol - EDBT ’98, vol 1377. Springer, Berlin, pp 467–483
Buijs JCAM (2014) Flexible evolutionary algorithms for mining structured process models. Ph.D. thesis, Technische Universiteit Eindhoven, Eindhoven
Buijs JCAM, van Dongen BF, van der Aalst WMP (2012) On the role of fitness, precision, generalization and simplicity in process discovery. In: On the move to meaningful internet systems: OTM 2012, Springer, Berlin, pp 305–322
Cook JE, Wolf AL (1995) Automating process discovery through event-data analysis. In: 17th international conference on software engineering, 1995. ICSE 1995, IEEE, pp 73–73
Datta A (1998) Automating the discovery of as-is business process models: probabilistic and algorithmic approaches. Inf Syst Res 9(3):275–301
Erickson B, Nosanchuk T (1992) Understanding data. McGraw-Hill Education, New York
Gelman A (2004) Exploratory data analysis for complex models. J Comput Gr Stat 13(4):755–779
Goedertier S, Martens D, Vanthienen J, Baesens B (2009) Robust process discovery with artificial negative events. J Mach Learn Res 10:1305–1340
Greco G, Guzzo A, Pontieri L, Saccà D (2006) Discovering expressive process models by clustering log traces. IEEE Trans Knowl Data Eng 18(8):1010–1027
Janssenswillen G, Depaire B, Jouck T (2016) Calculating the number of unique paths in a block-structured process model. In: Proceedings of the international workshop on algorithms and theories for the analysis of event data 2016
Janssenswillen G, Donders N, Jouck T, Depaire B (2017) A comparative study of existing quality measures for process discovery. Inf Syst 71:1–15
Jouck T, Depaire B (Mar 2016) Generating artificial data for empirical analysis of process discovery algorithms: a process tree and log generator. Technical report, Universiteit Hasselt, Hasselt
Kunze M, Luebbe A, Weidlich M, Weske M (2011) Towards understanding process modeling – the case of the BPM academic initiative. In: International workshop on business process modeling notation, Springer, Berlin, pp 44–58
Leemans SJJ, Fahland D, van der Aalst WMP (2013) Discovering block-structured process models from event logs – a constructive approach. In: Appl Theory Petri Nets Concurr. Springer, Berlin, pp 311–329
Maruster L (2003) A machine learning approach to understand business processes. Technische Universiteit Eindhoven
de Medeiros AKA, Weijters AJ, van der Aalst WMP (2007) Genetic process mining: an experimental evaluation. Data Min Knowl Discov 14(2):245–304
de Medeiros AKA (2006) Genetic process mining. Ph.D. thesis, Technische Universiteit Eindhoven, Eindhoven
Muñoz-Gama J, Carmona J (2010) A fresh look at precision in process conformance. In: Business process management, vol 6336. Springer, Berlin, pp 211–226
Rogge-Solti A, Senderovich A, Weidlich M, Mendling J, Gal A (2016) In log and model we trust? In: EMISA, pp 91–94
Rozinat A, de Medeiros AKA, Günther CW, Weijters AJMM, van der Aalst WMP (2007) Towards an evaluation framework for process mining algorithms, vol 123
Rozinat A, van der Aalst WMP (2008) Conformance checking of processes based on monitoring real behavior. Inf Syst 33(1):64–95
Tukey JW (1977) Exploratory data analysis, vol 2. Addison-Wesley, Reading, MA
Tukey JW, Wilk MB (1966) Data analysis and statistics: an expository overview. In: Proceedings of the November 7-10, 1966, fall joint computer conference, ACM, New York, pp 695–709
van der Aalst WMP, Adriansyah A, van Dongen B (2012) Replaying history on process models for conformance checking and performance analysis. Wiley Interdiscip Rev Data Min Knowl Discov 2(2):182–192
van der Aalst WMP (2013) Mediating between modeled and observed behavior: the quest for the “right” process. In: IEEE international conference on research challenges in information science (RCIS 2013), pp 31–43
van der Aalst WMP (2016) Process mining: data science in action. Springer, Berlin
van der Werf JME, van Dongen BF, Hurkens CA, Serebrenik A (2008) Process discovery using integer linear programming. In: International conference on applications and theory of Petri nets. Springer, Berlin, pp 368–387
van Dongen BF, Carmona J, Chatain T (2016) A unified approach for measuring precision and generalization based on anti-alignments. In: International conference on business process management. Springer, Cham
vanden Broucke SKLM, De Weerdt J, Vanthienen J, Baesens B (2014) Determining process model precision and generalization with weighted artificial negative events. IEEE Trans Knowl Data Eng 26(8):1877–1889
Weidlich M, Polyvyanyy A, Desai N, Mendling J, Weske M (2011) Process compliance analysis based on behavioural profiles. Inf Syst 36(7):1009–1025
Weijters AJMM, van der Aalst WMP, de Medeiros AKA (2006) Process mining with the heuristics miner-algorithm. Technische Universiteit Eindhoven, Tech. Rep. WP vol 166, pp 1–34
Acknowledgements
The computational resources and services used in this work for both process discovery and process conformance tasks were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government.
Additional information
Accepted after three revisions by Jan Mendling.
About this article
Cite this article
Janssenswillen, G., Depaire, B. Towards Confirmatory Process Discovery: Making Assertions About the Underlying System. Bus Inf Syst Eng 61, 713–728 (2019). https://doi.org/10.1007/s12599-018-0567-8
DOI: https://doi.org/10.1007/s12599-018-0567-8