Abstract
Although event logs are a powerful source of insight into the behavior of the underlying business process, existing work primarily focuses on finding patterns in the activity sequences of an event log, while ignoring event attribute data. Event attribute data has mostly been used to predict event occurrences and process outcomes, but the state of the art neglects to mine succinct and interpretable rules describing how event attribute data changes during process execution. Subgroup discovery and rule-based classification approaches cannot capture the sequential dependencies present in event logs, and thus lead to unsatisfactory results with limited insight into the process behavior.
Given an event log, we aim to find accurate yet succinct and interpretable if-then rules that describe how the process modifies data. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we choose the model with the best lossless description of the data. Additionally, we propose the greedy Moody algorithm to efficiently search for rules. Through extensive experiments on both synthetic and real-world data, we show that Moody indeed finds compact and interpretable rules, needs little data for accurate discovery, and is robust to noise.
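To make the MDL idea concrete, the following is a minimal sketch of a generic two-part score, L(M) + L(D | M), combined with a greedy search over candidate rules. It is not the Moody algorithm and does not reproduce the paper's encoding, neither of which is given in the abstract. The constants RAW_BITS and RULE_BITS, the rule form "if activity = a then the attribute changes by c", and the activity names in the toy log are all assumptions made for illustration.

```python
from collections import Counter

RAW_BITS = 16.0   # assumed cost of spelling out one attribute change verbatim
RULE_BITS = 24.0  # assumed flat cost of writing down one rule in the model


def data_cost(log, rules):
    """L(D | M): bits to encode every observed change given the rules."""
    bits = 0.0
    for activity, delta in log:
        if activity in rules:
            # One flag bit says whether the rule's prediction holds;
            # a miss is followed by the raw value so decoding stays lossless.
            bits += 1.0 if rules[activity] == delta else 1.0 + RAW_BITS
        else:
            bits += RAW_BITS
    return bits


def total_cost(log, rules):
    """Two-part score L(M) + L(D | M)."""
    return RULE_BITS * len(rules) + data_cost(log, rules)


def greedy_rules(log):
    """Add the single best rule per round until the total cost stops shrinking."""
    # Candidates: for each activity, predict its most frequent change.
    by_activity = {}
    for activity, delta in log:
        by_activity.setdefault(activity, Counter())[delta] += 1
    candidates = {a: c.most_common(1)[0][0] for a, c in by_activity.items()}

    rules = {}
    best = total_cost(log, rules)
    while True:
        gains = {}
        for a, d in candidates.items():
            if a not in rules:
                gains[a] = best - total_cost(log, {**rules, a: d})
        if not gains:
            break
        a = max(gains, key=gains.get)
        if gains[a] <= 0:
            break  # no remaining rule pays for its own description
        rules[a] = candidates[a]
        best -= gains[a]
    return rules, best


# Hypothetical toy log of (activity, change of a numeric attribute) pairs.
log = (
    [("add_fine", 50)] * 10
    + [("pay", -50)] * 8
    + [("pay", -20)] * 2
    + [("note", 3), ("note", 7)]
)
rules, bits = greedy_rules(log)
print(rules, bits)
```

On this toy log the search keeps the rules for the two regular activities and rejects a rule for the irregular one, because writing that rule down costs more bits than it saves. This is the trade-off between accuracy and succinctness that an MDL score is meant to capture.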
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schuster, M.B., Wiegand, B., Vreeken, J. (2024). Data is Moody: Discovering Data Modification Rules from Process Event Logs. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science(), vol 14942. Springer, Cham. https://doi.org/10.1007/978-3-031-70344-7_17
DOI: https://doi.org/10.1007/978-3-031-70344-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70343-0
Online ISBN: 978-3-031-70344-7
eBook Packages: Computer Science, Computer Science (R0)