An integer linear programming model to improve the dependency graph discovery step of heuristics miner methods

Tavakoli-Zaniani, Maryam; Gholamian, Mohammad Reza; Golpayegani, S. Alireza Hashemi; Ghazanfari, Mehdi

doi:10.1007/s10115-022-01821-2

Regular Paper
Published: 13 January 2023

Knowledge and Information Systems, Volume 65, pages 2087–2121 (2023)

Abstract

Heuristic mining techniques are among the most popular methods in the process discovery area. This category of methods consists of two main steps: (1) discovering a dependency graph and (2) determining the split/join patterns of the dependency graph. Current dependency graph discovery techniques in heuristic-based methods select an initial set of graph arcs according to dependency measures and then modify this set according to certain criteria. This can lead to a non-optimal set of arcs. Moreover, the modifications can cause rare behavior to be modeled, resulting in process models with low precision and unnecessary complexity. The motivation of this paper is to improve heuristic mining methods by addressing these issues. Its contribution is a new integer linear programming model that determines the optimal set of graph arcs with respect to the dependency measures. At the same time, the proposed method resolves other issues that existing methods cannot fully handle; for example, even in the presence of loops, it guarantees that all tasks lie on a path from the initial to the final tasks. The approach also allows domain knowledge to be incorporated through appropriate constraints, which can be a practical advantage in real-world problems. To assess the results, we modified two existing process model evaluation methods so that they can measure the quality of dependency graphs. According to these assessments, the proposed method’s outputs are superior to those of the most prominent dependency graph discovery methods in terms of fitness, precision, and especially simplicity.
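To make the baseline described above concrete, the following Python sketch computes the classic Heuristics Miner dependency measure, a ⇒ b = (|a > b| − |b > a|) / (|a > b| + |b > a| + 1), selects arcs with a plain threshold, and then uses Warshall's transitive-closure algorithm to test whether every task lies on a path from a start task to an end task. It is a sketch of the threshold-style selection the abstract criticizes, not the proposed integer linear programming model; the function names, the toy log, and the threshold values are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def directly_follows(log):
    """Count |a > b|: how often task a is directly followed by task b in the log."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency_measure(df, a, b):
    """Classic Heuristics Miner dependency a => b, always in the interval (-1, 1)."""
    ab, ba = df[(a, b)], df[(b, a)]
    if a == b:
        # Length-one loop variant: |a > a| / (|a > a| + 1)
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)


def threshold_arcs(log, threshold):
    """Baseline selection (hypothetical helper): keep observed arcs whose dependency reaches the threshold."""
    df = directly_follows(log)
    tasks = sorted({t for trace in log for t in trace})
    return {
        (a, b)
        for a, b in product(tasks, repeat=2)
        if df[(a, b)] > 0 and dependency_measure(df, a, b) >= threshold
    }


def all_on_start_end_path(tasks, arcs, start, end):
    """True if every task is reachable from start and can reach end (Warshall closure)."""
    idx = {t: i for i, t in enumerate(tasks)}
    n = len(tasks)
    reach = [[i == j for j in range(n)] for i in range(n)]
    for a, b in arcs:
        reach[idx[a]][idx[b]] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    s, e = idx[start], idx[end]
    return all(reach[s][idx[t]] and reach[idx[t]][e] for t in tasks)


# Hypothetical toy log: each trace is a list of task labels.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "b", "c", "d"], ["a", "e", "d"]]
tasks = sorted({t for trace in log for t in trace})
for threshold in (0.5, 0.6):
    arcs = threshold_arcs(log, threshold)
    print(threshold, sorted(arcs), all_on_start_end_path(tasks, arcs, "a", "d"))
```

With a threshold of 0.5 the selected arcs keep every task on a path from `a` to `d`; raising it to 0.6 leaves only `a→b` and `c→d`, so `c` becomes unreachable from `a` and the check fails. This illustrates how arc selection driven by a threshold alone can break the path property that the proposed model is said to guarantee.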

Notes

https://data.4tu.nl/.

Author information

Authors and Affiliations

School of Industrial Engineering, Iran University of Science and Technology, Tehran, 16844, Iran
Maryam Tavakoli-Zaniani, Mohammad Reza Gholamian & Mehdi Ghazanfari
Computer Engineering and IT Department, Amirkabir University of Technology, Tehran, 15875, Iran
S. Alireza Hashemi Golpayegani

Corresponding author

Correspondence to Mohammad Reza Gholamian.

Ethics declarations

Conflicts of interest

The authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Details on obtaining the FiM measure and the $F^L(x,y)$ value (required for calculating the PrM measure)

Given an event log L and a dependency graph DG, the following definitions are used in the procedure for replaying L on DG:

$\varepsilon$: The set of all events present in L.
$A_L$: The set of tasks present in L.
$A_{DG}$: The set of tasks present in DG.
$T_L$: The set of traces present in L.
$E: t \in T_L \mapsto E_t \subseteq \varepsilon$: The set of events present in trace t.
$\delta: a \in A_L \mapsto \delta(a) \in A_{DG}$: The member of $A_{DG}$ that corresponds to $a \in A_L$.
$Act: e_i \in \varepsilon \mapsto a \in A_L$: The member of $A_L$ that corresponds to event $e_i \in \varepsilon$.
$inp: a \in A_{DG} \mapsto i_a \subseteq A_{DG}$: The set of tasks that, according to DG, are prerequisites of $a \in A_{DG}$.
$out: a \in A_{DG} \mapsto o_a \subseteq A_{DG}$: The set of tasks that, according to DG, are post-requisites of $a \in A_{DG}$.
$pre: e_i \in E(t) \mapsto a_{pr} \subseteq A_L$: The sequence of tasks corresponding to the events of t that occurred before $e_i$.
$suc: e_i \in E(t) \mapsto a_{po} \subseteq A_L$: The sequence of tasks corresponding to the events of t that occurred after $e_i$.

Algorithm 1 gives the pseudo-code of the proposed procedure for replaying L on DG and obtaining the FiM measure, and Algorithm 2 gives the pseudo-code for calculating $F^L(x,y)$.
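The helper mappings defined above can be made concrete with a small Python sketch. It does not reproduce Algorithm 1 or Algorithm 2; it only encodes $E$, $Act$, $\delta$, $inp$, $out$, $pre$ and $suc$ under stated assumptions: an event $e_i$ is identified by a hypothetical (trace index, position) pair, each trace stores task labels directly, and $\delta$ is the identity because L and DG are assumed to share task labels. All names are illustrative.

```python
from typing import List, Set, Tuple

Arc = Tuple[str, str]    # a dependency-graph arc (source task, target task)
Event = Tuple[int, int]  # hypothetical encoding of e_i: (trace index, position in trace)
Log = List[List[str]]    # each trace is a list of task labels


def events(log: Log, t: int) -> List[Event]:
    """E(t): the events of trace t."""
    return [(t, i) for i in range(len(log[t]))]


def act(log: Log, e: Event) -> str:
    """Act(e_i): the task in A_L recorded by event e_i."""
    t, i = e
    return log[t][i]


def delta(a: str) -> str:
    """delta(a): assumes DG uses the same task labels as L, so the mapping is the identity."""
    return a


def inp(dg: Set[Arc], a: str) -> Set[str]:
    """inp(a): tasks that have an arc into a in DG."""
    return {x for (x, y) in dg if y == a}


def out(dg: Set[Arc], a: str) -> Set[str]:
    """out(a): tasks that a has an arc into in DG."""
    return {y for (x, y) in dg if x == a}


def pre(log: Log, e: Event) -> List[str]:
    """pre(e_i): tasks of the events in the same trace before e_i, in order."""
    t, i = e
    return log[t][:i]


def suc(log: Log, e: Event) -> List[str]:
    """suc(e_i): tasks of the events in the same trace after e_i, in order."""
    t, i = e
    return log[t][i + 1:]


# Hypothetical one-trace log and dependency graph.
log = [["a", "b", "c", "d"]]
dg = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}
for e in events(log, 0):
    a = delta(act(log, e))
    print(a, sorted(inp(dg, a)), sorted(out(dg, a)), pre(log, e), suc(log, e))
```

Representing events by position keeps $pre$ and $suc$ as simple slices of the trace; a real replay would likely need richer event records (timestamps, case identifiers), which this sketch deliberately omits.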

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tavakoli-Zaniani, M., Gholamian, M.R., Golpayegani, S.A.H. et al. An integer linear programming model to improve the dependency graph discovery step of heuristics miner methods. Knowl Inf Syst 65, 2087–2121 (2023). https://doi.org/10.1007/s10115-022-01821-2

Received: 03 July 2022
Revised: 09 October 2022
Accepted: 17 December 2022
Published: 13 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10115-022-01821-2
