Data is Moody: Discovering Data Modification Rules from Process Event Logs

Schuster, Marco Bjarne; Wiegand, Boris; Vreeken, Jilles

doi:10.1007/978-3-031-70344-7_17


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14942)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

Although event logs are a powerful source of insight into the behavior of the underlying business process, existing work primarily focuses on finding patterns in the activity sequences of an event log and ignores event attribute data. Event attribute data has mostly been used to predict event occurrences and process outcomes, but the state of the art neglects to mine succinct and interpretable rules describing how event attribute data changes during process execution. Subgroup discovery and rule-based classification approaches cannot capture the sequential dependencies present in event logs, and thus yield unsatisfactory results with limited insight into the process behavior.

Given an event log, we aim to find accurate yet succinct and interpretable if-then rules that describe how the process modifies data. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we choose the model with the best lossless description of the data. Additionally, we propose the greedy Moody algorithm to efficiently search for rules. Through extensive experiments on both synthetic and real-world data, we show that Moody indeed finds compact and interpretable rules, needs little data for accurate discovery, and is robust to noise.
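To make the MDL idea concrete, the following is a minimal toy sketch, not the encoding or search procedure used by Moody. It assumes a drastically simplified setting: each event is a hypothetical triple `(activity, old_value, new_value)` with integer values, a rule has the form "if the activity is X, add a constant delta", and integers are encoded with an Elias-gamma-style code. The names `int_bits`, `total_bits`, and `greedy_rules` are illustrative assumptions. The sketch greedily adds whichever rule most reduces the two-part description length L(M) + L(D | M) and stops when no rule reduces it further.

```python
import math
from collections import Counter


def int_bits(n: int) -> int:
    """Bits for a signed integer: Elias-gamma length of |n| + 1, plus a sign bit if n != 0."""
    magnitude = 2 * math.floor(math.log2(abs(n) + 1)) + 1
    return magnitude + (1 if n != 0 else 0)


def total_bits(events, rules, n_activities):
    """Two-part description length L(M) + L(D | M) for rules {activity: delta}."""
    # Model cost: name the activity, then encode the delta of each rule.
    model = sum(math.log2(n_activities) + int_bits(d) for d in rules.values())
    # Data cost: encode what the rules fail to explain (the residual change).
    data = sum(int_bits(new - old - rules.get(act, 0)) for act, old, new in events)
    return model + data


def greedy_rules(events):
    """Greedily add the rule that most reduces the total description length."""
    activities = sorted({act for act, _, _ in events})
    rules = {}
    best = total_bits(events, rules, len(activities))
    improved = True
    while improved:
        improved = False
        for act in activities:
            if act in rules:
                continue
            # Candidate rule: the most frequent change observed for this activity.
            deltas = Counter(new - old for a, old, new in events if a == act)
            delta = deltas.most_common(1)[0][0]
            candidate = {**rules, act: delta}
            bits = total_bits(events, candidate, len(activities))
            if bits < best:
                best, best_rules, improved = bits, candidate, True
        if improved:
            rules = best_rules
    return rules, best


# Hypothetical toy log: "pay" usually subtracts 50, "check" leaves the value unchanged,
# and the last event is a noisy "pay" that subtracts 49.
log = [("pay", 100, 50), ("pay", 80, 30), ("pay", 60, 10),
       ("check", 5, 5), ("check", 7, 7), ("pay", 40, -9)]
print(greedy_rules(log))  # ({'pay': -50}, 22.0)
```

On this toy log the empty model costs 50 bits, while the single rule for `pay` brings the total down to 22 bits. A rule for `check` is rejected because stating it costs more bits than it saves, which illustrates how MDL trades model complexity against fit and tolerates the noisy event without a dedicated rule.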


Notes

1. https://eda.rg.cispa.io/prj/moody/.


Author information

Authors and Affiliations

Airbus Operations GmbH, Bremen, Germany
Marco Bjarne Schuster
Stahl-Holding-Saar, Dillingen, Germany
Boris Wiegand
CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Jilles Vreeken


Corresponding author

Correspondence to Marco Bjarne Schuster.

Editor information

Editors and Affiliations

LTCI, Télécom Paris, Palaiseau Cedex, France
Albert Bifet
KU Leuven, Leuven, Belgium
Jesse Davis
Faculty of Informatics, Vytautas Magnus University, Akademija, Lithuania
Tomas Krilavičius
Institute of Computer Science, University of Tartu, Tartu, Estonia
Meelis Kull
Department of Computer Science, Bundeswehr University Munich, Munich, Germany
Eirini Ntoutsi
Department of Computer Science, University of Helsinki, Helsinki, Finland
Indrė Žliobaitė



About this paper

Cite this paper

Schuster, M.B., Wiegand, B., Vreeken, J. (2024). Data is Moody: Discovering Data Modification Rules from Process Event Logs. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14942. Springer, Cham. https://doi.org/10.1007/978-3-031-70344-7_17


DOI: https://doi.org/10.1007/978-3-031-70344-7_17
Published: 22 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70343-0
Online ISBN: 978-3-031-70344-7
eBook Packages: Computer Science, Computer Science (R0)
