Abstract
Heuristic mining techniques are among the most popular methods in process discovery. These methods consist of two main steps: (1) discovering a dependency graph and (2) determining the split/join patterns of the dependency graph. Current heuristic-based dependency graph discovery techniques select an initial set of graph arcs according to dependency measures and then modify that set according to additional criteria. This can lead to a non-optimal set of arcs. The modifications can also cause rare behaviors to be modeled and, consequently, yield process models with low precision and low simplicity. This paper aims to improve heuristic mining methods by addressing these issues. Its contribution is a new integer linear programming model that determines the optimal set of graph arcs with respect to dependency measures. The proposed method also eliminates some issues that existing methods cannot handle completely; in particular, even in the presence of loops, it guarantees that all tasks are on a path from the initial to the final task. The approach further allows domain knowledge to be incorporated through appropriate constraints, which can be a practical advantage in real-world problems. To assess the results, we modified two existing methods for evaluating process models so that they can measure the quality of dependency graphs. According to these assessments, the outputs of the proposed method are superior to those of the most prominent dependency graph discovery methods in terms of fitness, precision, and especially simplicity.


Ethics declarations
Conflicts of interest
The authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.
Appendix A: Details on computing the FiM measure and the \(F^L(x,y)\) value (required for calculating the PrM measure)
Assuming an event log L and a dependency graph DG, the following definitions are used in the procedure for replaying L on DG:
- \(\varepsilon \): The set of all events that are present in L.
- \(A_L\): The set of tasks that are present in L.
- \(A_{DG}\): The set of tasks that are present in DG.
- \(T_L\): The set of traces that are present in L.
- \(E:t \in T_L \mapsto E_t \subseteq \varepsilon \): The set of events that are present in trace t.
- \(\delta :a \in A_L \mapsto A_{DG}\): The member of \(A_{DG}\) that corresponds to \(a \in A_L\).
- \(Act :e_i \in \varepsilon \mapsto a \in A_L\): The member of \(A_L\) that corresponds to event \(e_i \in \varepsilon \).
- \(inp : a \in A_{DG} \mapsto i_a \subseteq A_{DG}\): The set of tasks that, according to DG, are prerequisites of \(a \in A_{DG}\).
- \(out : a \in A_{DG} \mapsto o_a \subseteq A_{DG}\): The set of tasks that, according to DG, are post-requisites of \(a \in A_{DG}\).
- \(pre: e_i \in E(t) \mapsto a_{pr} \subseteq A_L\): The sequence of tasks mapped to the events in t that occurred before \(e_i\).
- \(suc: e_i \in E(t) \mapsto a_{po} \subseteq A_L\): The sequence of tasks mapped to the events in t that occurred after \(e_i\).
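As a rough illustration of the mappings defined above, the following Python sketch computes \(inp\), \(out\), \(pre\) and \(suc\) on a toy example. It is not the authors' Algorithm 1 or Algorithm 2. It assumes that DG is given as a list of (source, target) arcs and that a trace is a list of task labels with \(Act\) and \(\delta \) already applied; the function names, the positional index i and the example data are hypothetical.

```python
def build_inp_out(arcs):
    """Build the inp and out mappings of a dependency graph.

    arcs: iterable of (source_task, target_task) pairs (assumed input format).
    Returns two dicts mapping every task in the graph to a set of tasks.
    """
    tasks = {task for arc in arcs for task in arc}
    inp = {task: set() for task in tasks}
    out = {task: set() for task in tasks}
    for source, target in arcs:
        out[source].add(target)   # target is a post-requisite of source
        inp[target].add(source)   # source is a prerequisite of target
    return inp, out


def pre(trace, i):
    """Sequence of tasks whose events occurred before position i in the trace."""
    return trace[:i]


def suc(trace, i):
    """Sequence of tasks whose events occurred after position i in the trace."""
    return trace[i + 1:]


# Toy dependency graph: a splits to b and c, which both join in d.
arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
inp, out = build_inp_out(arcs)

# Toy trace, already expressed as task labels.
trace = ["a", "b", "c", "d"]
before_c = pre(trace, 2)
after_c = suc(trace, 2)
```

A replay procedure would typically compare sequences such as `pre(trace, i)` against `inp` of the current task to decide whether the event is explained by the graph; how that comparison is scored is specific to the paper's algorithms and is not reproduced here.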
Algorithm 1 gives the pseudo-code of the proposed procedure for replaying L on DG and computing the FiM measure, and Algorithm 2 gives the pseudo-code for calculating \(F^L(x,y)\).


About this article
Cite this article
Tavakoli-Zaniani, M., Gholamian, M.R., Golpayegani, S.A.H. et al. An integer linear programming model to improve the dependency graph discovery step of heuristics miner methods. Knowl Inf Syst 65, 2087–2121 (2023). https://doi.org/10.1007/s10115-022-01821-2
DOI: https://doi.org/10.1007/s10115-022-01821-2