Definition
Given an event log, i.e., a bag of process instances in which each process instance (or trace) consists of a sequence of events, trace clustering refers to grouping these process instances so as to maximize the similarity within each group and the dissimilarity between groups. Although trace clustering might be described as discovering process variants, the term process variant is usually reserved for a collection of process instances with exactly the same event sequence. Trace clustering therefore usually refers to grouping those variants into logically coherent groups, which are, confusingly, sometimes also referred to as variants of the process.
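To make the distinction between variants and trace clusters concrete, the following minimal Python sketch first collapses a toy event log into variants (identical event sequences) and then groups those variants by edit distance. It is an illustration under stated assumptions, not a technique prescribed by this entry: the activity names, the function names, the greedy threshold grouping, and the `max_distance` parameter are all hypothetical choices made for the example.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete x
                            curr[j - 1] + 1,            # insert y
                            prev[j - 1] + (x != y)))    # substitute x by y
        prev = curr
    return prev[-1]


def variants(log):
    """Collapse an event log (a bag of traces) into variants with frequencies."""
    return Counter(tuple(trace) for trace in log)


def cluster_variants(variant_list, max_distance):
    """Greedy grouping (an illustrative assumption, not a published method):
    put each variant into the first cluster whose representative, i.e. its
    first member, lies within max_distance; otherwise open a new cluster."""
    clusters = []
    for v in variant_list:
        for c in clusters:
            if edit_distance(v, c[0]) <= max_distance:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters


# Toy event log: five process instances, four distinct variants.
log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "recheck", "approve", "pay"],
    ["register", "check", "reject", "appeal"],
]

var_counts = variants(log)
# Visit the most frequent variants first so they act as cluster representatives.
ordered = [v for v, _ in var_counts.most_common()]
clusters = cluster_variants(ordered, max_distance=1)

for idx, members in enumerate(clusters):
    print(idx, [" > ".join(v) for v in members])
```

In this toy log, the five process instances yield four variants, which the sketch groups into an "approve and pay" cluster and a "reject" cluster. Practical trace clustering techniques typically rely on richer similarity notions or on the quality of the process models discovered per cluster, so the distance threshold here should be read purely as a stand-in.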
Overview
This entry discusses trace clustering in the context of process mining. It first explains the technique in general and then presents a dense yet informative typology of techniques. The entry concludes with an overview of related topics.
Introduction
Despite the...
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this entry
De Weerdt, J. (2018). Trace Clustering. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_91-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_91-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering