Abstract
Business Process Intelligence (BPI) and Process Mining, two very active research areas, share a strong interest in discovering an effective Deviance Detection Model (DDM) from log data. A DDM makes it possible to determine whether new instances of the target business process are deviant, which makes it very useful in modern application scenarios such as cybersecurity and fraud detection. In this chapter, we significantly extend our previous line of work, which over the years has produced an innovative ensemble-learning framework for mining business process deviances whose main benefit is the introduction of a form of multi-view learning scheme. One of the most relevant achievements of this extended work is an alternative meta-learning method for probabilistically combining the predictions of different base DDMs, integrated into a conceptual system architecture designed to support common Business Process Management (BPM) scenarios. In addition, we envisage combining this approach with a deviance explanation methodology that leverages and extends a method we proposed in previous research. Basically, the latter method discovers accurate and readable deviance-aware trace clusters defined in terms of descriptive rules over both properties and behavioral aspects of the traces. We complement our analytical contributions with a comprehensive experimental assessment and analysis, including a comparison with a state-of-the-art DDM discovery approach. The experimental results confirm the flexibility, reliability and effectiveness of the proposed business process deviance mining framework.
Notes
- 1.
With regard to the labelling scheme of Fig. 1 (and assuming that the base DDMs appear in CL in the same left-to-right order as in the figure), we have \(CL[q]=c_{i,j}\) iff \(q = (i-1) \times k + j\), and \(PL[q] = P_i\) iff \(i = \lfloor (q-1)/k \rfloor + 1\).
- 2.
- 3.
With a slight abuse of notation, we denote both the pattern and the corresponding selector by the same symbol.
- 4.
For better readability, we will omit \(L, \delta, \gamma, \sigma\) and \(maxLen\) when they are clear from the context.
- 5.
Available at https://fluxicon.com/disco/.
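The index mapping stated in Note 1 can be checked with a small sketch. It assumes 1-based positions \(q\), reads \(i\) as the index of \(P_i\), and takes \(j\) to range over \(1..k\) within each group of \(k\) base DDMs. The function names are hypothetical, and the recovery of \(j\) through a modulo is derived from the formula rather than stated in the note.

```python
def flat_index(i: int, j: int, k: int) -> int:
    """1-based position q such that CL[q] = c_{i,j}, i.e. q = (i-1)*k + j."""
    if i < 1 or not (1 <= j <= k):
        raise ValueError("expected i >= 1 and 1 <= j <= k")
    return (i - 1) * k + j


def pattern_index(q: int, k: int) -> int:
    """Index i such that PL[q] = P_i, i.e. i = floor((q-1)/k) + 1."""
    if q < 1 or k < 1:
        raise ValueError("expected q >= 1 and k >= 1")
    return (q - 1) // k + 1


def position_in_group(q: int, k: int) -> int:
    """Index j recovered from q (derived here: j = ((q-1) mod k) + 1)."""
    if q < 1 or k < 1:
        raise ValueError("expected q >= 1 and k >= 1")
    return (q - 1) % k + 1


if __name__ == "__main__":
    k = 3  # illustrative group size, not taken from the chapter
    for i in range(1, 4):
        for j in range(1, k + 1):
            q = flat_index(i, j, k)
            # The two inverse formulas must recover the original (i, j).
            assert (pattern_index(q, k), position_in_group(q, k)) == (i, j)
            print(f"c_{{{i},{j}}} -> q = {q}")
```

Running the script enumerates the positions for a toy setting and asserts that the two directions of the mapping are mutually consistent.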
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L. (2018). Experimenting and Assessing a Probabilistic Business Process Deviance Mining Framework Based on Ensemble Learning. In: Hammoudi, S., Śmiałek, M., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2017. Lecture Notes in Business Information Processing, vol 321. Springer, Cham. https://doi.org/10.1007/978-3-319-93375-7_6
DOI: https://doi.org/10.1007/978-3-319-93375-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93374-0
Online ISBN: 978-3-319-93375-7
eBook Packages: Computer Science, Computer Science (R0)