Trace Clustering

  • Living reference work entry
Encyclopedia of Big Data Technologies

Synonyms

Clustering of process instances; Process variant discovery

Definition

Given an event log, that is, a bag of process instances in which each process instance (or trace) consists of a sequence of events, trace clustering refers to grouping these process instances so as to maximize the similarity of the instances within each group and the dissimilarity between the groups. Although trace clustering might be described as discovering process variants, the term process variant is usually already reserved for a collection of process instances with exactly the same event sequence. Trace clustering therefore usually refers to grouping those variants into logically coherent groups, which, confusingly, can themselves be referred to as variants of the process.
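The two notions in this definition can be illustrated with a minimal sketch. It first collapses a toy event log into variants (identical event sequences) and then groups those variants by similarity. The activity names, the use of edit distance as the similarity measure, and the greedy threshold ("leader") grouping are all assumptions made for illustration. They are not the specific techniques covered by this entry, and the helper names `variants`, `edit_distance`, and `cluster_variants` are hypothetical.

```python
from collections import Counter


def variants(log):
    """Collapse a log (a list of traces) into variants with their frequencies."""
    return Counter(tuple(trace) for trace in log)


def edit_distance(a, b):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


def cluster_variants(variant_list, max_distance):
    """Greedy grouping (an illustrative assumption, not a method from the entry).

    Each variant joins the first cluster whose representative (its first
    member) lies within max_distance; otherwise it starts a new cluster.
    """
    clusters = []
    for v in variant_list:
        for cluster in clusters:
            if edit_distance(v, cluster[0]) <= max_distance:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return clusters


# Hypothetical toy log: five process instances, four distinct variants.
log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "recheck", "approve", "pay"],
    ["register", "check", "reject", "appeal"],
]

counts = variants(log)
clusters = cluster_variants(list(counts), max_distance=1)

for idx, cluster in enumerate(clusters):
    print(f"cluster {idx}:")
    for v in cluster:
        print(f"  {counts[v]}x {' -> '.join(v)}")
```

In this sketch, the "approve" and "reject" flows end up in separate groups, each containing more than one variant. This mirrors the distinction drawn above between variants in the strict sense and the coarser groups produced by clustering. The result of a greedy grouping depends on the order of the variants and on the chosen threshold.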

Overview

This entry discusses trace clustering in the context of process mining. It first explains the technique in general, then presents a compact typology of techniques, and concludes with an overview of related topics.

Introduction

Despite the...

Correspondence to Jochen De Weerdt.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Cite this entry

De Weerdt, J. (2018). Trace Clustering. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_91-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_91-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
