Abstract
Traces are everywhere, from information systems that store their continuous executions to health care applications that record each patient’s history. Transforming a set of traces into a mathematical model that can be used for formal reasoning is therefore of great value. The discovery of process models from traces is a problem that has received significant attention in recent years. It is central to Process Mining, a novel area that tries to close the cycle between system design and validation by resorting to methods for the automated discovery, analysis and extension of process models. In this work, algorithms for the derivation of a Petri net from a set of traces are presented. The methods are grounded on the theory of regions, which maps a model in the state-based domain (e.g., an automaton) into a model in the event-based domain (e.g., a Petri net). When dealing with large examples, a direct application of the theory of regions suffers from two problems. The first is the state-explosion problem: the resources required by algorithms that work at the state level are sometimes prohibitive. This paper introduces decomposition and projection techniques to alleviate the complexity of the region-based algorithms for Petri net discovery, thus extending their applicability to large inputs. The second is the overfitting problem of region-based approaches: informally, in order to represent the trace set with high accuracy, the models obtained are often spaghetti-like. By focusing on a special class of processes, called conservative, for which an elegant theory and efficient algorithms can be devised, the techniques presented in this paper alleviate the overfitting problem and, moreover, incorporate structure into the generated models.
Additional information
Responsible editor: Johannes Fürnkranz.
About this article
Cite this article
Carmona, J. Projection approaches to process mining using region-based techniques. Data Min Knowl Disc 24, 218–246 (2012). https://doi.org/10.1007/s10618-011-0226-x
DOI: https://doi.org/10.1007/s10618-011-0226-x