ABSTRACT
Increasing attention has been paid to the problem of explaining and analyzing "deviant cases" generated by a business process, i.e. instances of the process that diverged from prescribed or expected behavior (e.g. frauds, faults, SLA violations). In many real settings, such cases are labelled with a numerical deviance measure, and the analyst wants to obtain a fine-grained unsupervised classification of them, which will help her recognize and explain different deviance scenarios. Unfortunately, current approaches rely on preliminarily labelling all the cases stored in an execution log as either deviant or non-deviant, and then inducing a rule-based classifier to discriminate between the two classes. By contrast, we propose here a method that discovers accurate and readable deviance-aware clusters of cases, defined in terms of descriptive rules over both properties and behavioral aspects of the cases. Each cluster is also equipped with summary information from which effective distribution charts and a high-level process map can be derived, both emphasizing the distinctive features of the cluster. Tests on a real-life log confirmed the ability of the approach to find easily interpretable clustering models that capture relevant deviance scenarios.
Index Terms
- A descriptive clustering approach to the analysis of quantitative business-process deviances