Data is Moody: Discovering Data Modification Rules from Process Event Logs

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14942)

Abstract

Although event logs are a powerful source of insight into the behavior of the underlying business process, existing work primarily focuses on finding patterns in the activity sequences of an event log, while ignoring event attribute data. Event attribute data has mostly been used to predict event occurrences and process outcomes, but the state of the art neglects to mine succinct and interpretable rules describing how event attribute data changes during process execution. Subgroup discovery and rule-based classification approaches cannot capture the sequential dependencies present in event logs, and thus lead to unsatisfactory results with limited insight into the process behavior.

Given an event log, we aim to find accurate yet succinct and interpretable if-then rules that describe how the process modifies data. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we choose the model with the best lossless description of the data. Additionally, we propose the greedy Moody algorithm to efficiently search for rules. Through extensive experiments on both synthetic and real-world data, we show that Moody indeed finds compact and interpretable rules, needs little data for accurate discovery, and is robust to noise.
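To make the MDL idea concrete, the following is a minimal sketch of two-part model selection with a greedy search on a toy event log. It is not the Moody encoding or algorithm: the rule form (activity implies attribute value), the prequential add-eps code, the model cost, and all names (prequential_bits, total_bits, greedy_rules, the toy log) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def prequential_bits(symbols, alphabet_size, eps=0.5):
    """Bits needed to encode a sequence with a prequential plug-in code."""
    counts, bits = Counter(), 0.0
    for i, s in enumerate(symbols):
        # Probability of s given the symbols seen so far (add-eps smoothing).
        bits -= math.log2((counts[s] + eps) / (i + eps * alphabet_size))
        counts[s] += 1
    return bits


def total_bits(log, rules, activities, values):
    """Two-part cost L(M) + L(D | M) for hypothetical rules activity -> value."""
    # L(M): the number of rules, then one (activity, value) pair per rule.
    model = math.log2(len(activities) + 1)
    model += len(rules) * (math.log2(len(activities)) + math.log2(len(values)))
    # L(D | M): a hit/miss flag per covered event; misses and uncovered
    # events are written out explicitly in a residual stream.
    flags, residual = [], []
    for activity, value in log:
        if activity in rules:
            hit = rules[activity] == value
            flags.append(hit)
            if not hit:
                residual.append(value)
        else:
            residual.append(value)
    data = prequential_bits(flags, 2) + prequential_bits(residual, len(values))
    return model + data


def greedy_rules(log, activities, values):
    """Add the rule that shortens the description most; stop when none does."""
    rules = {}
    best = total_bits(log, rules, activities, values)
    while True:
        candidates = [
            (total_bits(log, {**rules, a: v}, activities, values), a, v)
            for a in activities if a not in rules
            for v in values
        ]
        if not candidates or min(candidates)[0] >= best:
            return rules
        best, a, v = min(candidates)
        rules[a] = v


# Toy log (made up): "pay" and "cancel" set the status deterministically,
# while "check" leaves it unpredictable.
activities = ["pay", "cancel", "check"]
values = ["paid", "void", "open"]
log = [("pay", "paid"), ("cancel", "void"),
       ("check", "open"), ("check", "paid")] * 10

found = greedy_rules(log, activities, values)
print(found)
print(total_bits(log, {}, activities, values),
      total_bits(log, found, activities, values))
```

In this sketch, a rule is only kept if the bits it saves on the data outweigh the bits needed to state it, so the two deterministic rules are selected while any rule for "check" is rejected as not compressing the log.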

Notes

  1. https://eda.rg.cispa.io/prj/moody/

Author information

Corresponding author

Correspondence to Marco Bjarne Schuster.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 232 KB)

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Schuster, M.B., Wiegand, B., Vreeken, J. (2024). Data is Moody: Discovering Data Modification Rules from Process Event Logs. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14942. Springer, Cham. https://doi.org/10.1007/978-3-031-70344-7_17

  • DOI: https://doi.org/10.1007/978-3-031-70344-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70343-0

  • Online ISBN: 978-3-031-70344-7

  • eBook Packages: Computer Science, Computer Science (R0)
