An integer linear programming model to improve the dependency graph discovery step of heuristics miner methods

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Heuristic mining techniques are among the most popular methods in the process discovery area. This category of methods consists of two main steps: (1) discovering a dependency graph and (2) determining the split/join patterns of the dependency graph. The current dependency graph discovery techniques of heuristic-based methods select an initial set of graph arcs according to dependency measures and then modify the set according to some criteria. This can lead to selecting a non-optimal set of arcs. The modifications can also result in modeling rare behaviors and, consequently, in process models with low precision and low simplicity. The motivation of this paper is to improve heuristic mining methods by addressing these issues. The contribution of this paper is a new integer linear programming model that determines the optimal set of graph arcs with respect to dependency measures. The proposed method also eliminates some other issues that the existing methods cannot handle completely; for example, even in the presence of loops, it guarantees that all tasks are on a path from the initial task to the final task. This approach also allows domain knowledge to be used by introducing appropriate constraints, which can be a practical advantage in real-world problems. To assess the results, we modified two existing methods of evaluating process models to make them capable of measuring the quality of dependency graphs. According to the assessments, the proposed method's outputs are superior to those of the most prominent dependency graph discovery methods in terms of fitness, precision, and especially simplicity.
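The idea of choosing arcs by dependency measure while keeping every task on a path from the initial to the final task can be illustrated with a toy sketch. This is not the paper's integer linear programming model, whose formulation is not given in this section. It uses the classic Heuristics Miner dependency measure \((|a>b| - |b>a|) / (|a>b| + |b>a| + 1)\), a hypothetical objective (the sum of dependency values minus an assumed per-arc `penalty`), and brute-force enumeration in place of an ILP solver. The example log and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def dependency_measures(traces):
    """Classic Heuristics Miner measure for ordered task pairs (self-loops skipped)."""
    follows = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    tasks = sorted({t for trace in traces for t in trace})
    dep = {}
    for a in tasks:
        for b in tasks:
            ab, ba = follows[(a, b)], follows[(b, a)]
            if a != b and ab > 0:
                dep[(a, b)] = (ab - ba) / (ab + ba + 1)
    return tasks, dep

def reachable(start, arcs):
    """Set of tasks reachable from start by following arcs."""
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, set()).add(b)
    seen, stack = {start}, [start]
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def best_arc_set(tasks, dep, start, end, penalty):
    """Brute-force stand-in for an ILP solver; only viable for tiny logs.

    Assumed objective: maximize sum(dep - penalty) over the chosen arcs.
    Constraint: every task is reachable from start and can reach end.
    """
    candidates = [arc for arc, value in dep.items() if value > 0]
    best, best_score = None, float("-inf")
    for k in range(len(candidates) + 1):
        for arcs in combinations(candidates, k):
            forward = reachable(start, arcs)
            backward = reachable(end, [(b, a) for a, b in arcs])
            if set(tasks) <= forward and set(tasks) <= backward:
                score = sum(dep[arc] - penalty for arc in arcs)
                if score > best_score:
                    best, best_score = set(arcs), score
    return best, best_score

# Hypothetical log: b and c are alternatives; "b then c" is rare behavior.
log = [["a", "b", "d"]] * 4 + [["a", "c", "d"]] * 4 + [["a", "b", "c", "d"]]
tasks, dep = dependency_measures(log)
arcs, score = best_arc_set(tasks, dep, start="a", end="d", penalty=0.6)
print(sorted(arcs))
```

In this toy log the rare arc from b to c has a dependency value below the assumed penalty, so the selected set leaves it out while all four tasks stay connected between the initial and final tasks. A real instance would hand the same objective and constraints to an ILP solver rather than enumerating subsets.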

Notes

  1. https://data.4tu.nl/.

Author information

Corresponding author

Correspondence to Mohammad Reza Gholamian.

Ethics declarations

Conflicts of interest

The authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.

Appendix A: Details on achieving FiM measure and \(F^L(x,y)\) value (required for calculating PrM measure)

Assuming an event log L and a dependency graph DG, the following definitions are used in the procedure of replaying L on DG:

  • \(\varepsilon \): The set of all events that are present in L.

  • \(A_L\): The set of tasks that are present in L.

  • \(A_{DG}\): The set of tasks that are present in DG.

  • \(T_L\): The set of traces that are present in L.

  • \(E:t \in T_L \mapsto E_t \subseteq \varepsilon \): The set of events that are present in t.

  • \(\delta :a \in A_L \mapsto A_{DG}\): The member of \(A_{DG}\) that corresponds to \(a \in A_L\).

  • \(Act :e_i \in \varepsilon \mapsto a \in A_L\): The member of \(A_L\) that corresponds to event \(e_i \in \varepsilon \).

  • \(inp : a \in A_{DG} \mapsto i_a \subseteq A_{DG}\): The set of tasks that, according to DG, are prerequisites of \(a \in A_{DG}\).

  • \(out : a \in A_{DG} \mapsto o_a \subseteq A_{DG}\): The set of tasks that, according to DG, are post-requisites of \(a \in A_{DG}\).

  • \(pre: e_i \in E(t) \mapsto a_{pr} \subseteq A_L\): The sequence of tasks mapped to the events of t that occurred before \(e_i\).

  • \(suc: e_i \in E(t) \mapsto a_{po} \subseteq A_L\): The sequence of tasks mapped to the events of t that occurred after \(e_i\).
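The notation above can be made concrete with a small sketch. It is one possible encoding, not the authors' implementation: the `Event` class, the example log, the example arcs, and the identity mapping chosen for \(\delta\) are all assumptions made for illustration. Because the definitions describe \(pre\) and \(suc\) as sequences, they are returned here as ordered lists.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    trace_id: str  # the trace t that contains the event
    index: int     # position of the event inside its trace
    task: str      # Act(e): the task in A_L that the event corresponds to

def make_trace(trace_id, task_names):
    """Build the ordered event list E(t) for one trace."""
    return [Event(trace_id, i, name) for i, name in enumerate(task_names)]

# Hypothetical event log L (trace id -> ordered events) and dependency graph DG.
log = {
    "t1": make_trace("t1", ["a", "b", "d"]),
    "t2": make_trace("t2", ["a", "c", "d"]),
}
dg_arcs = {("a", "b"), ("b", "d"), ("a", "c"), ("c", "d")}

events = {e for trace in log.values() for e in trace}  # epsilon
A_L = {e.task for e in events}                         # tasks present in L
A_DG = {task for arc in dg_arcs for task in arc}       # tasks present in DG
delta = {a: a for a in A_L}                            # assumed identity mapping

def act(e):
    return e.task

def inp(a):
    """Prerequisite tasks of a according to DG."""
    return {x for x, y in dg_arcs if y == a}

def out(a):
    """Post-requisite tasks of a according to DG."""
    return {y for x, y in dg_arcs if x == a}

def pre(e):
    """Tasks of the events that occurred before e in its trace, in order."""
    return [act(x) for x in log[e.trace_id][:e.index]]

def suc(e):
    """Tasks of the events that occurred after e in its trace, in order."""
    return [act(x) for x in log[e.trace_id][e.index + 1:]]

e = log["t1"][1]  # the event for task "b" in trace t1
print(inp(delta[act(e)]), out(delta[act(e)]), pre(e), suc(e))
```

With this encoding, a replay procedure can compare \(pre(e_i)\) and \(suc(e_i)\) against \(inp\) and \(out\) of the corresponding task in DG for each event.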

The pseudo-code of the proposed procedure for replaying L on DG and computing the FiM measure is given in Algorithm 1, and the pseudo-code for calculating \(F^L(x,y)\) is given in Algorithm 2.

[Algorithm 1 and Algorithm 2 appear as figures in the original article.]

About this article

Cite this article

Tavakoli-Zaniani, M., Gholamian, M.R., Golpayegani, S.A.H. et al. An integer linear programming model to improve the dependency graph discovery step of heuristics miner methods. Knowl Inf Syst 65, 2087–2121 (2023). https://doi.org/10.1007/s10115-022-01821-2

  • DOI: https://doi.org/10.1007/s10115-022-01821-2