Projection approaches to process mining using region-based techniques

Data Mining and Knowledge Discovery

Abstract

Traces are everywhere, from information systems that store their continuous executions to health care applications that record each patient’s history. The transformation of a set of traces into a mathematical model that can be used for formal reasoning is therefore of great value. The discovery of process models from traces is a problem that has received significant attention in recent years. It is a central problem in Process Mining, a novel area that tries to close the cycle between system design and validation by resorting to methods for the automated discovery, analysis and extension of process models. In this work, algorithms for the derivation of a Petri net from a set of traces are presented. The methods are grounded on the theory of regions, which maps a model in the state-based domain (e.g., an automaton) into a model in the event-based domain (e.g., a Petri net). When dealing with large examples, a direct application of the theory of regions suffers from two problems. The first is the state-explosion problem, i.e., the resources required by algorithms that work at the state level are sometimes prohibitive. This paper introduces decomposition and projection techniques to alleviate the complexity of the region-based algorithms for Petri net discovery, thus extending their applicability to large inputs. The second is known as the overfitting problem for region-based approaches, which informally means that, in order to represent the trace set with high accuracy, the models obtained are often spaghetti-like. By focusing on a special type of processes called conservative, for which an elegant theory and efficient algorithms can be devised, the techniques presented in this paper alleviate the overfitting problem and moreover incorporate structure into the models generated.
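The two ingredients named in the abstract, projection of traces and regions of a state-based model, can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names (`project`, `prefix_transition_system`, `is_region`), the choice of trace prefixes as states, and the toy log are all assumptions made for illustration. The region check uses the classical condition that every event must behave uniformly with respect to a set of states: all of its transitions enter the set, all exit it, or none cross its border.

```python
from typing import Dict, Iterable, List, Set, Tuple

Trace = Tuple[str, ...]
Transition = Tuple[Trace, str, Trace]  # (source state, event, target state)


def project(log: Iterable[Trace], activities: Set[str]) -> List[Trace]:
    """Keep only the events in `activities`, preserving their order in each trace."""
    return [tuple(e for e in trace if e in activities) for trace in log]


def prefix_transition_system(log: Iterable[Trace]) -> Set[Transition]:
    """Build a transition system whose states are the prefixes of the traces.

    Using prefixes as states is an illustrative assumption; other state
    abstractions are possible.
    """
    transitions: Set[Transition] = set()
    for trace in log:
        for i, event in enumerate(trace):
            transitions.add((trace[:i], event, trace[: i + 1]))
    return transitions


def is_region(states: Set[Trace], transitions: Set[Transition]) -> bool:
    """Check the classical region condition for a candidate set of states."""
    behaviour: Dict[str, str] = {}
    for src, event, dst in transitions:
        if src not in states and dst in states:
            kind = "enter"
        elif src in states and dst not in states:
            kind = "exit"
        else:
            kind = "no-cross"  # both endpoints inside, or both outside
        # The first transition of an event fixes its behaviour; later ones must agree.
        if behaviour.setdefault(event, kind) != kind:
            return False
    return True


# Hypothetical toy log: b and c may occur in either order after a.
log = [("a", "b", "c"), ("a", "c", "b")]
ts = prefix_transition_system(log)

projected = project(log, {"a", "b"})
good = is_region({("a",), ("a", "c")}, ts)
bad = is_region({("a",)}, ts)
```

In this toy log the set containing the states reached by `a` and by `a c` satisfies the condition (`a` always enters it, `b` always exits it, `c` never crosses it), so it plays the role of a place between `a` and `b`. The singleton set does not, because `b` exits it in one transition and does not cross it in another. Projecting the log onto `{a, b}` shrinks the traces and therefore the state space that a region-based procedure would have to explore.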

Author information

Corresponding author

Correspondence to Josep Carmona.

Additional information

Responsible editor: Johannes Fürnkranz.

About this article

Cite this article

Carmona, J. Projection approaches to process mining using region-based techniques. Data Min Knowl Disc 24, 218–246 (2012). https://doi.org/10.1007/s10618-011-0226-x

  • DOI: https://doi.org/10.1007/s10618-011-0226-x
