Abstract
Various process mining techniques exist, e.g., techniques that automatically discover a descriptive model of a process's execution from event data. Although the premise of process mining is clear, as witnessed by the tremendous growth of the field, data quality issues often hamper the direct applicability of process mining techniques. Several authors have studied data quality issues in process mining, yet these works primarily propose data pre-processing techniques. An overarching study of the nature of data quality issues, the types of available techniques, and the general possibilities of (semi-)automated outlier/noise detection methods is missing. Therefore, in this paper, we make a first attempt to structure and study the field of outlier/noise detection in process mining and to understand to what degree knowledge of noise and outliers from other domains could advance the process mining field. We do so by answering three central research questions that cover various aspects of (semi-)automated outlier/noise detection.
Notes
- 1. To generate the event log, we used the Petri Net-based Event Log Generator http://processmining.be/loggenerator/.
- 2.
- 3. Available tools resolve neither synonyms nor homonyms. Therefore, we restricted our analysis to attribute noise.
- 4.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Koschmider, A., Kaczmarek, K., Krause, M., van Zelst, S.J. (2022). Demystifying Noise and Outliers in Event Logs: Review and Future Directions. In: Marrella, A., Weber, B. (eds) Business Process Management Workshops. BPM 2021. Lecture Notes in Business Information Processing, vol 436. Springer, Cham. https://doi.org/10.1007/978-3-030-94343-1_10
DOI: https://doi.org/10.1007/978-3-030-94343-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94342-4
Online ISBN: 978-3-030-94343-1
eBook Packages: Computer Science, Computer Science (R0)