Experimenting and Assessing a Probabilistic Business Process Deviance Mining Framework Based on Ensemble Learning

Cuzzocrea, Alfredo; Folino, Francesco; Guarascio, Massimo; Pontieri, Luigi

doi:10.1007/978-3-319-93375-7_6

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 321)

Included in the following conference series:

International Conference on Enterprise Information Systems

Abstract

Business Process Intelligence (BPI) and Process Mining, two very active research areas, share a strong interest in discovering an effective Deviance Detection Model (DDM) from log data. A DDM makes it possible to determine whether new instances of the target business process are deviant, which makes it extremely useful in modern application scenarios such as cybersecurity and fraud detection. In this chapter, we significantly extend our previous line of work, which has produced, over the years, an innovative ensemble-learning framework for mining business process deviances whose main benefit is the introduction of a multi-view learning scheme. One of the most relevant achievements of this extended work is an alternative meta-learning method for probabilistically combining the predictions of different base DDMs, together with a conceptual system architecture that integrates all components and supports common Business Process Management (BPM) scenarios. In addition, we envisage combining this approach with a deviance explanation methodology that leverages and extends a method we proposed in previous research. The latter method discovers accurate and readable deviance-aware trace clusters, defined in terms of descriptive rules over both properties and behavioral aspects of the traces. We complement our analytical contributions with a comprehensive experimental assessment and analysis, including a comparison with a state-of-the-art DDM discovery approach. The experimental results confirm the flexibility, reliability and effectiveness of the proposed business process deviance mining framework.

Notes

1. With regard to the labelling scheme of Fig. 1 (and assuming that the base DDMs appear in CL in the same left-to-right order as in the figure), we have \(CL[q]=c_{i,j}\) iff \(q= (i-1) \times k + j\), and \(PL[q] = P_i\) iff \(i = {\lfloor (q-1) /k \rfloor +1}\).
2. Similar encodings are used by recent process mining approaches [11, 14, 35] to obtain an abstract representation for a given log.
3. With a slight abuse of notation, we take the liberty of denoting both the pattern and the corresponding selector by the same symbol.
4. For better readability, we will omit \(L,\delta ,\gamma ,\sigma \) and maxLen when they are clear from the context.
5. Available at https://fluxicon.com/disco/.
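
The index arithmetic in Note 1 can be illustrated with a short Python sketch. It is only a reading of the two formulas stated there, assuming 1-based positions q, a first index i that selects the pattern set P_i, and a second index j ranging over 1..k. The function names are hypothetical, and the recovery of j from q is not stated in the note; it is derived here by inverting q = (i-1) × k + j.

```python
def position(i: int, j: int, k: int) -> int:
    """1-based position q of c_{i,j} in CL, following q = (i-1)*k + j."""
    if i < 1 or k < 1 or not 1 <= j <= k:
        raise ValueError("expected i >= 1, k >= 1 and 1 <= j <= k")
    return (i - 1) * k + j


def pattern_index(q: int, k: int) -> int:
    """Index i such that PL[q] = P_i, following i = floor((q-1)/k) + 1."""
    if q < 1 or k < 1:
        raise ValueError("expected q >= 1 and k >= 1")
    return (q - 1) // k + 1


def inner_index(q: int, k: int) -> int:
    """Index j of c_{i,j} at position q (derived, not stated in Note 1)."""
    if q < 1 or k < 1:
        raise ValueError("expected q >= 1 and k >= 1")
    return (q - 1) % k + 1


if __name__ == "__main__":
    k = 3  # illustrative value only
    for q in range(1, 7):
        print(q, pattern_index(q, k), inner_index(q, k))
```

With k = 3, positions 1 to 3 map to P_1 and positions 4 to 6 map to P_2, so each consecutive block of k entries in CL shares one entry of PL.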

References

1. Angiulli, F., Fassetti, F., Palopoli, L.: Discovering characterizations of the behavior of anomalous subpopulations. IEEE Trans. Knowl. Data Eng. 25(6), 1280–1292 (2013)
2. Atzmueller, M.: Subgroup discovery - advanced review. Wiley Int. Rev. Data Min. Knowl. Disc. 5(1), 35–49 (2015)
3. Bose, R.P.J.C., van der Aalst, W.M.P.: Trace clustering based on conserved patterns: towards achieving better process models. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 170–181. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12186-9_16
4. Blockeel, H., Raedt, L.D., Ramon, J.: Top-down induction of clustering trees. In: Proceedings of the 15th International Conference on Machine Learning (ICML 1998), pp. 55–63 (1998)
5. Bose, R.P.J.C., van der Aalst, W.M.P.: Discovering signature patterns from event logs. In: IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013), pp. 111–118 (2013)
6. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30(7), 1145–1159 (1997)
7. Buckland, M., Gey, F.: The relationship between recall and precision. J. Am. Soc. Inf. Sci. 45(1), 12–19 (1994)
8. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
9. Cuzzocrea, A.: Providing probabilistically-bounded approximate answers to non-holistic aggregate range queries in OLAP. In: Proceedings of ACM DOLAP 2005, pp. 97–106 (2005)
10. Cuzzocrea, A.: Accuracy control in compressed multidimensional data cubes for quality of answer-based OLAP tools. In: Proceedings of IEEE SSDBM 2006, pp. 301–310 (2006)
11. Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L.: A multi-view learning approach to the discovery of deviant process instances. In: Debruyne, C., Panetto, H., Meersman, R., Dillon, T., Weichhart, G., An, Y., Ardagna, C.A. (eds.) OTM 2015. LNCS, vol. 9415, pp. 146–165. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26148-5_9
12. Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L.: A robust and versatile multi-view learning framework for the detection of deviant business process instances. Int. J. Coop. Inf. Syst. 25(4), 1–56 (2016)
13. Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L.: Extensions, analysis and experimental assessment of a probabilistic ensemble-learning framework for detecting deviances in business process instances. In: Proceedings of ICEIS 2017, pp. 162–173 (2017)
14. Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L.: A multi-view multi-dimensional ensemble learning approach to mining business process deviances. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2016), pp. 3809–3816 (2016)
15. Cuzzocrea, A., Furfaro, F., Saccà, D.: Enabling OLAP in mobile environments via intelligent data cube compression techniques. J. Intell. Inf. Syst. 33(2), 95–143 (2009)
16. Cuzzocrea, A., Matrangolo, U.: Analytical synopses for approximate query answering in OLAP environments. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds.) DEXA 2004. LNCS, vol. 3180, pp. 359–370. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30075-5_35
17. Das, K., Schneider, J., Neill, D.B.: Anomaly pattern detection in categorical datasets. In: Proceedings of 14th International Conference on Knowledge Discovery and Data Mining (KDD 2008), pp. 169–176 (2008)
18. Domingos, P., Pazzani, M.J.: Beyond independence: conditions for the optimality of the simple Bayesian classifier. In: Proceedings of the 13th International Conference on Machine Learning (ICML 1996), pp. 105–112 (1996)
19. Domingos, P., Pazzani, M.J.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130 (1997)
20. van Dongen, B.F., de Medeiros, A.K.A., Verbeek, H.M.W., Weijters, A.J.M.M., van der Aalst, W.M.P.: The ProM framework: a new era in process mining tool support. In: Ciardo, G., Darondeau, P. (eds.) ICATPN 2005. LNCS, vol. 3536, pp. 444–454. Springer, Heidelberg (2005). https://doi.org/10.1007/11494744_25
21. Folino, F., Guarascio, M., Pontieri, L.: Mining predictive process models out of low-level multidimensional logs. In: Jarke, M., Mylopoulos, J., Quix, C., Rolland, C., Manolopoulos, Y., Mouratidis, H., Horkoff, J. (eds.) CAiSE 2014. LNCS, vol. 8484, pp. 533–547. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07881-6_36
22. Folino, F., Guarascio, M., Pontieri, L.: A descriptive clustering approach to the analysis of quantitative business-process deviances. In: Proceedings of 2017 Symposium on Applied Computing (SAC 2017), pp. 765–770 (2017)
23. Frank, E., Hall, M.A., Holmes, G., Kirkby, R., Pfahringer, B.: Weka - a machine learning workbench for data mining. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 1305–1314. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-09823-4_66
24. Gamberger, D., Lavrac, N.: Expert-guided subgroup discovery: methodology and application. J. Artif. Int. Res. 17(1), 501–527 (2002)
25. Großkreutz, H., Paurat, D., Rüping, S.: An enhanced relevance criterion for more concise supervised pattern discovery. In: Proceedings of 18th International Conference on Knowledge Discovery and Data Mining (KDD 2012), pp. 1442–1450 (2012)
26. Günther, C.W., Rozinat, A.: Disco: discover your processes. In: Proceedings of 10th International Conference on Business Process Management (BPM 2012), pp. 40–44 (2012)
27. Hornix, P.T.: Performance analysis of business processes through process mining. Master’s thesis, Eindhoven University of Technology, The Netherlands (2007)
28. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)
29. Keogh, E.J., Pazzani, M.J.: Learning the structure of augmented Bayesian classifiers. Int. J. Artif. Intell. Tools 11(4), 587–601 (2002)
30. Kubat, M., Holte, R., Matwin, S.: Learning when negative examples abound. In: van Someren, M., Widmer, G. (eds.) ECML 1997. LNCS, vol. 1224, pp. 146–153. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-62858-4_79
31. Langley, P., Iba, W., Thompson, K.: An analysis of Bayesian classifiers. In: Proceedings of 10th National Conference on Artificial Intelligence (AAAI 1992), pp. 223–228 (1992)
32. Lavrač, N., Kavšek, B., Flach, P., Todorovski, L.: Subgroup discovery with CN2-SD. J. Mach. Learn. Res. 5, 153–188 (2004)
33. Leeuwen, M., Knobbe, A.: Diverse subgroup set discovery. Data Min. Knowl. Disc. 25(2), 208–242 (2012)
34. Lemmerich, F., Atzmueller, M., Puppe, F.: Fast exhaustive subgroup discovery with numerical target concepts. Data Min. Knowl. Disc. 1–52 (2015)
35. Leontjeva, A., Conforti, R., Di Francescomarino, C., Dumas, M., Maggi, F.M.: Complex symbolic sequence encodings for predictive monitoring of business processes. In: Motahari-Nezhad, H.R., Recker, J., Weidlich, M. (eds.) BPM 2015. LNCS, vol. 9253, pp. 297–313. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23063-4_21
36. Lo, D., Cheng, H., Han, J., Khoo, S.C., Sun, C.: Classification of software behaviors for failure detection: a discriminative pattern mining approach. In: Proceedings of 15th International Conference on Knowledge Discovery and Data Mining (KDD 2009), pp. 557–566 (2009)
37. McFowland, E., Speakman, S., Neill, D.B.: Fast generalized subset scan for anomalous pattern detection. J. Mach. Learn. Res. 14(1), 1533–1561 (2013)
38. Nguyen, H., Dumas, M., La Rosa, M., Maggi, F.M., Suriadi, S.: Mining business process deviance: a quest for accuracy. In: Meersman, R., Panetto, H., Dillon, T., Missikoff, M., Liu, L., Pastor, O., Cuzzocrea, A., Sellis, T. (eds.) OTM 2014. LNCS, vol. 8841, pp. 436–445. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45563-0_25
39. Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. In: Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM 2000), pp. 86–93 (2000)
40. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
41. Sahami, M.: Learning limited dependence Bayesian classifiers. In: Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 1996), pp. 334–338 (1996)
42. Suriadi, S., Ouyang, C., van der Aalst, W.M.P., ter Hofstede, A.H.M.: Root cause analysis with enriched process logs. In: La Rosa, M., Soffer, P. (eds.) BPM 2012. LNBIP, vol. 132, pp. 174–186. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36285-9_18
43. Swinnen, J., Depaire, B., Jans, M.J., Vanhoof, K.: A process deviation analysis - a case study. In: Proceedings of 2011 Business Process Management Workshops, pp. 87–98 (2011)
44. van der Aalst, W., Adriansyah, A., van Dongen, B.: Replaying history on process models for conformance checking and performance analysis. Wiley Int. Rev. Data Min. Knowl. Disc. 2(2), 182–192 (2012)
45. van Dongen, B.F.: Real-life event logs - hospital log (2011)
46. Verbeek, H.M.W., Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: XES, XESame, and ProM 6. In: Soffer, P., Proper, E. (eds.) CAiSE Forum 2010. LNBIP, vol. 72, pp. 60–75. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-17722-4_5
47. Wang, W., Zhou, Z.H.: A new analysis of co-training. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 1135–1142 (2010)
48. Webb, G.I., Boughton, J., Wang, Z.: Not so Naive Bayes: aggregating one-dependence estimators. Mach. Learn. 58(1), 5–24 (2005)
49. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers Inc., San Francisco (2005)
50. Ying, Y., et al.: To select or to weigh: a comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Trans. Knowl. Data Eng. 19(12), 1652–1665 (2007)
51. Zhang, G.P.: Neural networks for classification: a survey. IEEE Trans. Syst. Man Cybernet. Part C Appl. Rev. 30(4), 451–462 (2000)
52. Zhang, H., Jiang, L., Su, J.: Hidden Naive Bayes. In: Proceedings of AAAI, pp. 919–924 (2005)

Author information

Authors and Affiliations

DIA Department, University of Trieste, Trieste, Italy
Alfredo Cuzzocrea
ICAR-CNR, Rende, CS, Italy
Alfredo Cuzzocrea, Francesco Folino, Massimo Guarascio & Luigi Pontieri

Corresponding author

Correspondence to Alfredo Cuzzocrea.

Editor information

Editors and Affiliations

MODESTE/ESEO, Angers, France
Slimane Hammoudi
Warsaw University of Technology, Warsaw, Poland
Michał Śmiałek
MODESTE/ESEO, Angers, France
Olivier Camp
INSTICC, Polytechnic Institute of Setúbal, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L. (2018). Experimenting and Assessing a Probabilistic Business Process Deviance Mining Framework Based on Ensemble Learning. In: Hammoudi, S., Śmiałek, M., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2017. Lecture Notes in Business Information Processing, vol 321. Springer, Cham. https://doi.org/10.1007/978-3-319-93375-7_6

DOI: https://doi.org/10.1007/978-3-319-93375-7_6
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93374-0
Online ISBN: 978-3-319-93375-7
eBook Packages: Computer Science, Computer Science (R0)
