Trace Clustering

De Weerdt, Jochen

doi:10.1007/978-3-319-63962-8_91-1

Synonyms

Clustering of process instances; Process variant discovery

Definition

Given an event log, i.e., a bag (multiset) of process instances in which each process instance, or trace, consists of a sequence of events, trace clustering refers to grouping these process instances so as to maximize the similarity between instances within the same group and the dissimilarity between different groups. Although trace clustering could be described as discovering process variants, the term process variant usually already denotes a collection of process instances with exactly the same event sequence. Trace clustering therefore usually refers to grouping those variants into logically coherent groups, which, confusingly, may themselves be referred to as variants of the process.
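To make the definition concrete, the following Python sketch first collapses a toy event log into its variants (distinct activity sequences with their frequencies) and then groups those variants. It is a minimal sketch under stated assumptions, not the method of this entry or of any cited technique: it assumes that the similarity between two variants is measured with the Levenshtein edit distance (Levenshtein 1966) over activity labels, and that groups are formed by single-linkage clustering cut at a fixed distance threshold. The function names (variants, levenshtein, cluster_variants), the threshold parameter and the example log are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Assumption: a trace (or variant) is modelled as a tuple of activity labels.
Trace = Tuple[str, ...]


def variants(log: Sequence[Sequence[str]]) -> Counter:
    """Collapse an event log (a bag of traces) into its variants:
    each distinct activity sequence is mapped to its frequency."""
    return Counter(tuple(trace) for trace in log)


def levenshtein(a: Trace, b: Trace) -> int:
    """Edit distance between two activity sequences, counting each
    insertion, deletion and substitution of an activity as cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cluster_variants(vs: List[Trace], threshold: int) -> List[List[Trace]]:
    """Hypothetical single-linkage clustering cut at `threshold`: two variants
    share a group if a chain of variants links them with every pairwise edit
    distance <= threshold (union-find over the threshold graph)."""
    parent = list(range(len(vs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vs)):
        for j in range(i + 1, len(vs)):
            if levenshtein(vs[i], vs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Trace]] = {}
    for i, v in enumerate(vs):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical toy log: five process instances, four distinct variants.
    log = [
        ("register", "check", "approve", "pay"),
        ("register", "check", "approve", "pay"),
        ("register", "check", "reject"),
        ("register", "check", "check", "reject"),
        ("register", "approve", "pay"),
    ]
    vs = variants(log)
    for k, cluster in enumerate(cluster_variants(list(vs), threshold=1)):
        print(f"cluster {k}:", [(" > ".join(v), vs[v]) for v in cluster])
```

On this toy log, with threshold=1, the two approval variants end up in one group and the two rejection variants in another, even though all four variants are distinct. Single linkage is used here only because it is short to write; it can chain dissimilar variants together, and the literature listed below relies on many other trace representations and algorithms (for example, conserved patterns or alignments).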

Overview

This entry discusses trace clustering in the context of process mining. The technique is first explained in general terms, followed by a dense yet informative typology of existing techniques. The entry concludes with an overview of related topics.

Introduction

Despite the...

References

Appice A, Malerba D (2015) A co-training strategy for multiple view clustering in process mining. IEEE Trans Serv Comput PP(99):1–1. https://doi.org/10.1109/TSC.2015.2430327
Bose RPJC, van der Aalst WMP (2010) Trace clustering based on conserved patterns: towards achieving better process models. In: BPM workshops. Lecture notes in business information processing, vol 43, pp 170–181. https://doi.org/10.1007/978-3-642-12186-9_16
Cadez I, Heckerman D, Meek C, Smyth P, White S (2003) Model-based clustering and visualization of navigation patterns on a web site. Data Min Knowl Discov 7(4):399–424. https://doi.org/10.1023/A:1024992613384
Chatain T, Carmona J, Van Dongen B (2017) Alignment-based trace clustering. In: International conference on conceptual modeling. Springer, pp 295–308
De Koninck P, De Weerdt J (2016a) Determining the number of trace clusters: a stability-based approach. In: van der Aalst WMP, Bergenthum R, Carmona J (eds) Proceedings of the international workshop on algorithms & theories for the analysis of event data 2016 satellite event of the conferences: 37th international conference on application and theory of Petri nets and concurrency Petri Nets 2016 and 16th international conference on application of concurrency to system design ACSD 2016, Torun, 20–21 June 2016. CEUR-WS.org, CEUR Workshop Proceedings, vol 1592, pp 1–15. http://ceur-ws.org/Vol-1592/paper01.pdf
De Koninck P, De Weerdt J (2016b) Multi-objective trace clustering: finding more balanced solutions. In: Dumas M, Fantinato M (eds) Business process management workshops – BPM 2016 international workshops, Rio de Janeiro, 19 Sept 2016, Revised papers. Lecture notes in business information processing, vol 281, pp 49–60. https://doi.org/10.1007/978-3-319-58457-7_4
De Koninck P, De Weerdt J (2017) Similarity-based approaches for determining the number of trace clusters in process discovery. T Petri Nets Other Models Concurr 12:19–42. https://doi.org/10.1007/978-3-662-55862-1_2
De Koninck P, De Weerdt J, vanden Broucke SKLM (2017a) Explaining clusterings of process instances. Data Min Knowl Discov 31(3):774–808. https://doi.org/10.1007/s10618-016-0488-4
De Koninck P, Nelissen K, Baesens B, vanden Broucke S, Snoeck M, De Weerdt J (2017b) An approach for incorporating expert knowledge in trace clustering. In: Dubois E, Pohl K (eds) Proceedings of the 29th international conference on Advanced information systems engineering, CAiSE 2017, Essen, 12–16 June 2017. Lecture notes in computer science, vol 10253. Springer, pp 561–576. https://doi.org/10.1007/978-3-319-59536-8_35
Delias P, Doumpos M, Grigoroudis E, Manolitzas P, Matsatsinis N (2015) Supporting healthcare management decisions via robust clustering of event logs. Knowl-Based Syst 84:203–213. https://doi.org/10.1016/j.knosys.2015.04.012
De Weerdt J, Vanden Broucke S (2014) SECPI: searching for explanations for clustered process instances. Lecture notes in computer science, vol 8659, pp 408–415. https://doi.org/10.1007/978-3-319-10172-9_29
De Weerdt J, Vanden Broucke S, Vanthienen J, Baesens B (2013) Active trace clustering for improved process discovery. IEEE Trans Knowl Data Eng 25(12):2708–2720. https://doi.org/10.1109/TKDE.2013.64
Evermann J, Thaler T, Fettke P (2016) Clustering traces using sequence alignment. In: Reichert M, Reijers HA (eds) Business process management workshops: BPM 2015, 13th international workshops, Innsbruck, 31 Aug–3 Sept 2015, Revised papers. Springer International Publishing, Cham, pp 179–190. https://doi.org/10.1007/978-3-319-42887-1_15
Ferreira DR, Zacarias M, Malheiros M, Ferreira P (2007) Approaching process mining with sequence clustering: experiments and findings. In: BPM, pp 360–374. https://doi.org/10.1007/978-3-540-75183-0_26
Folino F, Greco G, Guzzo A, Pontieri L (2011) Mining usage scenarios in business processes: outlier-aware discovery and run-time prediction. Data Knowl Eng 70(12):1005–1029. https://doi.org/10.1016/j.datak.2011.07.002
García-Bañuelos L, Dumas M, La Rosa M, De Weerdt J, Ekanayake CC (2014) Controlled automated discovery of collections of business process models. Inf Syst 46:85–101
Goedertier S, De Weerdt J, Martens D, Vanthienen J, Baesens B (2011) Process discovery in event logs: an application in the telecom industry. Appl Soft Comput 11(2):1697–1710
Greco G, Guzzo A, Pontieri L, Saccà D (2006) Discovering expressive process models by clustering log traces. IEEE Trans Knowl Data Eng 18(8):1010–1027. https://doi.org/10.1109/TKDE.2006.123
Günther CW (2009) Process mining in flexible environments. PhD thesis, TU Eindhoven
Günther CW, van der Aalst WMP (2007) Fuzzy mining – adaptive process simplification based on multi-perspective metrics. In: ter Hofstede AHM, Benatallah B, Paik HY (eds) BPM. Lecture notes in computer science, vol 4928. Springer, pp 328–343
Hompes BFA, Buijs JCAM, van der Aalst WMP, Dixit P, Buurman J (2015) Detecting changes in process behavior using comparative case clustering. In: Ceravolo P, Rinderle-Ma S (eds) Data-driven process discovery and analysis – 5th IFIP WG 2.6 international symposium, SIMPDA 2015, Vienna, 9–11 Dec 2015, Revised selected papers. Lecture notes in business information processing, vol 244. Springer, pp 54–75. https://doi.org/10.1007/978-3-319-53435-0_3
Jagadeesh Chandra Bose RP, van der Aalst WMP (2009a) Abstractions in process mining: a taxonomy of patterns. In: Dayal U, Eder J, Koehler J, Reijers HA (eds) BPM. Lecture notes in computer science, vol 5701. Springer, pp 159–175
Jagadeesh Chandra Bose RP, van der Aalst WMP (2009b) Context aware trace clustering: towards improving process mining results. In: SDM, pp 401–412. https://doi.org/10.1137/1.9781611972795.35
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Inc., Englewood Cliffs
Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions and reversals. Sov Phys Dokl 10:707–710
Song M, Günther CW, van der Aalst WMP (2008) Trace clustering in process mining. In: BPM workshops, pp 109–120. https://doi.org/10.1007/978-3-642-00328-8_11
Song M, Yang H, Siadat SH, Pechenizkiy M (2013) A comparative study of dimensionality reduction techniques to enhance trace clustering performances. Expert Syst Appl 40:3722–3737. https://doi.org/10.1016/j.eswa.2012.12.078
Thaler T, Ternis SF, Fettke P, Loos P (2015) A comparative analysis of process instance cluster techniques. Wirtschaftsinformatik 2015:423–437
Veiga GM, Ferreira DR (2010) Understanding spaghetti models with sequence clustering for ProM. In: Rinderle-Ma S et al (eds) BPM workshops. Lecture notes in business information processing, vol 43. Springer, pp 92–103. https://doi.org/10.1007/978-3-642-12186-9

Author information

Authors and Affiliations

KU Leuven, Leuven, Belgium
Jochen De Weerdt

Corresponding author

Correspondence to Jochen De Weerdt.

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr
School of Information Technologies, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

Institute of Computer Science, University of Tartu, Juhan Liivi 2, 50409, Tartu, Estonia
Marlon Dumas
Department of Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Matthias Weidlich

About this entry

Cite this entry

De Weerdt, J. (2018). Trace Clustering. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_91-1

DOI: https://doi.org/10.1007/978-3-319-63962-8_91-1
Published: 24 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering