Experimenting and Assessing a Probabilistic Business Process Deviance Mining Framework Based on Ensemble Learning

  • Conference paper
Enterprise Information Systems (ICEIS 2017)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 321)

Abstract

Business Process Intelligence (BPI) and Process Mining, two very active areas of research, share a strong interest in discovering an effective Deviance Detection Model (DDM) computed from log data. A DDM makes it possible to determine whether new instances of the target business process are deviant or not, which makes it extremely useful in modern application scenarios such as cybersecurity and fraud detection. In this chapter, we significantly extend our previous line of work, which has produced over the years an innovative ensemble-learning framework for mining business process deviances, whose main benefit is the introduction of a kind of multi-view learning scheme. One of the most relevant achievements of this extended work is an alternative meta-learning method for probabilistically combining the predictions of different base DDMs, together with a conceptual system architecture that integrates all components and is oriented to supporting common Business Process Management (BPM) scenarios. In addition, we envisage combining this approach with a deviance explanation methodology that leverages and extends a method we proposed in previous research. Basically, the latter method makes it possible to discover accurate and readable deviance-aware trace clusters, defined in terms of descriptive rules over both properties and behavioral aspects of the traces. We complement our analytical contributions with a comprehensive experimental assessment and analysis, including a comparison with a state-of-the-art DDM discovery approach. The experimental results confirm the flexibility, reliability and effectiveness of the proposed business process deviance mining framework.

Notes

  1. With regard to the labelling scheme of Fig. 1 (and assuming that the base DDMs appear in CL in the same left-to-right order as in the figure), we have \(CL[q] = c_{i,j}\) iff \(q = (i-1) \times k + j\), and \(PL[q] = P_i\) iff \(i = \lfloor (q-1)/k \rfloor + 1\).

  2. Similar encodings are used by recent process mining approaches [11, 14, 35] to obtain an abstract representation for a given log.

  3. With a slight abuse of notation, we denote both the pattern and the corresponding selector by the same symbol.

  4. For better readability, we will omit \(L, \delta, \gamma, \sigma\) and maxLen when they are clear from the context.

  5. Available at https://fluxicon.com/disco/.
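
The index arithmetic in Note 1 can be illustrated with a minimal Python sketch. It assumes, as the formulas suggest, that \(k\) is the number of base DDMs associated with each \(P_i\) and that all indices are 1-based. The function names are hypothetical and do not come from the chapter, and the inverse for \(j\) is derived here from the formula for \(q\) rather than stated in the note.

```python
def flat_index(i: int, j: int, k: int) -> int:
    """Position q (1-based) of base model c_{i,j} in CL: q = (i-1)*k + j."""
    if i < 1 or not 1 <= j <= k:
        raise ValueError("expected i >= 1 and 1 <= j <= k")
    return (i - 1) * k + j


def view_index(q: int, k: int) -> int:
    """Index i such that PL[q] = P_i: i = floor((q-1)/k) + 1."""
    if q < 1 or k < 1:
        raise ValueError("expected q >= 1 and k >= 1")
    return (q - 1) // k + 1


def learner_index(q: int, k: int) -> int:
    """Index j recovered from q (derived inverse, not stated in Note 1)."""
    if q < 1 or k < 1:
        raise ValueError("expected q >= 1 and k >= 1")
    return (q - 1) % k + 1


if __name__ == "__main__":
    k = 4  # assumed number of base DDMs per view, for illustration only
    q = flat_index(2, 3, k)
    print(q, view_index(q, k), learner_index(q, k))
```

With \(k = 4\), the model \(c_{2,3}\) sits at position \(q = (2-1) \times 4 + 3 = 7\), and both indices are recovered from \(q\) alone.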

References

  1. Angiulli, F., Fassetti, F., Palopoli, L.: Discovering characterizations of the behavior of anomalous subpopulations. IEEE Trans. Knowl. Data Eng. 25(6), 1280–1292 (2013)

  2. Atzmueller, M.: Subgroup discovery - advanced review. Wiley Int. Rev. Data Min. Knowl. Disc. 5(1), 35–49 (2015)

  3. Bose, R.P.J.C., van der Aalst, W.M.P.: Trace clustering based on conserved patterns: towards achieving better process models. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 170–181. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12186-9_16

  4. Blockeel, H., Raedt, L.D., Ramon, J.: Top-down induction of clustering trees. In: Proceedings of the 15th International Conference on Machine Learning (ICML 98), pp. 55–63 (1998)

  5. Bose, R.P.J.C., van der Aalst, W.M.P.: Discovering signature patterns from event logs. In: IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013), pp. 111–118 (2013)

  6. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30(7), 1145–1159 (1997)

  7. Buckland, M., Gey, F.: The relationship between recall and precision. J. Am. Soc. Inf. Sci. 45(1), 12–19 (1994)

  8. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  9. Cuzzocrea, A.: Providing probabilistically-bounded approximate answers to non-holistic aggregate range queries in OLAP. In: Proceedings of ACM DOLAP 2005, pp. 97–106 (2005)

  10. Cuzzocrea, A.: Accuracy control in compressed multidimensional data cubes for quality of answer-based OLAP tools. In: Proceedings of IEEE SSDBM 2006, pp. 301–310 (2006)

  11. Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L.: A multi-view learning approach to the discovery of deviant process instances. In: Debruyne, C., Panetto, H., Meersman, R., Dillon, T., Weichhart, G., An, Y., Ardagna, C.A. (eds.) OTM 2015. LNCS, vol. 9415, pp. 146–165. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26148-5_9

  12. Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L.: A robust and versatile multi-view learning framework for the detection of deviant business process instances. Int. J. Coop. Inf. Syst. 25(4), 1–56 (2016)

  13. Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L.: Extensions, analysis and experimental assessment of a probabilistic ensemble-learning framework for detecting deviances in business process instances. In: Proceedings of ICEIS 2017, pp. 162–173 (2017)

  14. Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L.: A multi-view multi-dimensional ensemble learning approach to mining business process deviances. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2016), pp. 3809–3816 (2016)

  15. Cuzzocrea, A., Furfaro, F., Saccà, D.: Enabling OLAP in mobile environments via intelligent data cube compression techniques. J. Intell. Inf. Syst. 33(2), 95–143 (2009)

  16. Cuzzocrea, A., Matrangolo, U.: Analytical synopses for approximate query answering in OLAP environments. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds.) DEXA 2004. LNCS, vol. 3180, pp. 359–370. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30075-5_35

  17. Das, K., Schneider, J., Neill, D.B.: Anomaly pattern detection in categorical datasets. In: Proceedings of 14th International Conference on Knowledge Discovery and Data Mining (KDD 2008), pp. 169–176 (2008)

  18. Domingos, P., Pazzani, M.J.: Beyond independence: conditions for the optimality of the simple Bayesian classifier. In: Proceedings of the 13th International Conference on Machine Learning (ICML 1996), pp. 105–112 (1996)

  19. Domingos, P., Pazzani, M.J.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130 (1997)

  20. van Dongen, B.F., de Medeiros, A.K.A., Verbeek, H.M.W., Weijters, A.J.M.M., van der Aalst, W.M.P.: The ProM framework: a new era in process mining tool support. In: Ciardo, G., Darondeau, P. (eds.) ICATPN 2005. LNCS, vol. 3536, pp. 444–454. Springer, Heidelberg (2005). https://doi.org/10.1007/11494744_25

  21. Folino, F., Guarascio, M., Pontieri, L.: Mining predictive process models out of low-level multidimensional logs. In: Jarke, M., Mylopoulos, J., Quix, C., Rolland, C., Manolopoulos, Y., Mouratidis, H., Horkoff, J. (eds.) CAiSE 2014. LNCS, vol. 8484, pp. 533–547. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07881-6_36

  22. Folino, F., Guarascio, M., Pontieri, L.: A descriptive clustering approach to the analysis of quantitative business-process deviances. In: Proceedings of 2017 Symposium on Applied Computing (SAC 2017), pp. 765–770 (2017)

  23. Frank, E., Hall, M.A., Holmes, G., Kirkby, R., Pfahringer, B.: Weka - a machine learning workbench for data mining. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 1305–1314. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-09823-4_66

  24. Gamberger, D., Lavrac, N.: Expert-guided subgroup discovery: methodology and application. J. Artif. Int. Res. 17(1), 501–527 (2002)

  25. Großkreutz, H., Paurat, D., Rüping, S.: An enhanced relevance criterion for more concise supervised pattern discovery. In: Proceedings of 18th International Conference on Knowledge Discovery and Data Mining (KDD 2012), pp. 1442–1450 (2012)

  26. Günther, C.W., Rozinat, A.: Disco: discover your processes. In: Proceedings of 10th International Conference on Business Process Management (BPM 2012), pp. 40–44 (2012)

  27. Hornix, P.T.: Performance analysis of business processes through process mining. Master’s thesis, Eindhoven University of Technology, The Netherlands (2007)

  28. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)

  29. Keogh, E.J., Pazzani, M.J.: Learning the structure of augmented Bayesian classifiers. Int. J. Artif. Intell. Tools 11(40), 587–601 (2002)

  30. Kubat, M., Holte, R., Matwin, S.: Learning when negative examples abound. In: van Someren, M., Widmer, G. (eds.) ECML 1997. LNCS, vol. 1224, pp. 146–153. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-62858-4_79

  31. Langley, P., Iba, W., Thompson, K.: An analysis of Bayesian classifiers. In: Proceedings of 10th National Conference on Artificial Intelligence (AAAI 1992), pp. 223–228 (1992)

  32. Lavrač, N., Kavšek, B., Flach, P., Todorovski, L.: Subgroup discovery with CN2-SD. J. Mach. Learn. Res. 5, 153–188 (2004)

  33. Leeuwen, M., Knobbe, A.: Diverse subgroup set discovery. Data Min. Knowl. Discov. 25(2), 208–242 (2012)

  34. Lemmerich, F., Atzmueller, M., Puppe, F.: Fast exhaustive subgroup discovery with numerical target concepts. Data Min. Knowl. Disc. 1–52 (2015)

  35. Leontjeva, A., Conforti, R., Di Francescomarino, C., Dumas, M., Maggi, F.M.: Complex symbolic sequence encodings for predictive monitoring of business processes. In: Motahari-Nezhad, H.R., Recker, J., Weidlich, M. (eds.) BPM 2015. LNCS, vol. 9253, pp. 297–313. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23063-4_21

  36. Lo, D., Cheng, H., Han, J., Khoo, S.C., Sun, C.: Classification of software behaviors for failure detection: a discriminative pattern mining approach. In: Proceedings of 15th International Conference on Knowledge Discovery and Data Mining (KDD 2009), pp. 557–566 (2009)

  37. McFowland, E., Speakman, S., Neill, D.B.: Fast generalized subset scan for anomalous pattern detection. J. Mach. Learn. Res. 14(1), 1533–1561 (2013)

  38. Nguyen, H., Dumas, M., La Rosa, M., Maggi, F.M., Suriadi, S.: Mining business process deviance: a quest for accuracy. In: Meersman, R., Panetto, H., Dillon, T., Missikoff, M., Liu, L., Pastor, O., Cuzzocrea, A., Sellis, T. (eds.) OTM 2014. LNCS, vol. 8841, pp. 436–445. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45563-0_25

  39. Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. In: Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM 2000), pp. 86–93 (2000)

  40. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)

  41. Sahami, M.: Learning limited dependence Bayesian classifiers. In: Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 1996), pp. 334–338 (1996)

  42. Suriadi, S., Ouyang, C., van der Aalst, W.M.P., ter Hofstede, A.H.M.: Root cause analysis with enriched process logs. In: La Rosa, M., Soffer, P. (eds.) BPM 2012. LNBIP, vol. 132, pp. 174–186. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36285-9_18

  43. Swinnen, J., Depaire, B., Jans, M.J., Vanhoof, K.: A process deviation analysis - a case study. In: Proceedings of 2011 Business Process Management Workshops, pp. 87–98 (2011)

  44. van der Aalst, W., Adriansyah, A., van Dongen, B.: Replaying history on process models for conformance checking and performance analysis. Wiley Int. Rev. Data Min. Knowl. Disc. 2(2), 182–192 (2012)

  45. van Dongen, B.F.: Real-life event logs - hospital log (2011)

  46. Verbeek, H.M.W., Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: XES, XESame, and ProM 6. In: Soffer, P., Proper, E. (eds.) CAiSE Forum 2010. LNBIP, vol. 72, pp. 60–75. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-17722-4_5

  47. Wang, W., Zhou, Z.H.: A new analysis of co-training. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 1135–1142 (2010)

  48. Webb, G.I., Boughton, J., Wang, Z.: Not so Naive Bayes: aggregating one-dependence estimators. Mach. Learn. 58(1), 5–24 (2005)

  49. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers Inc., San Francisco (2005)

  50. Ying, Y., et al.: To select or to weigh: a comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Trans. Knowl. Data Eng. 19(12), 1652–1665 (2007)

  51. Zhang, G.P.: Neural networks for classification: a survey. IEEE Trans. Syst. Man Cybernet. Part C Appl. Rev. 30(4), 451–462 (2000)

  52. Zhang, H., Jiang, L., Su, J.: Hidden Naive Bayes. In: Proceedings of AAAI, pp. 919–924 (2005)

Corresponding author

Correspondence to Alfredo Cuzzocrea.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L. (2018). Experimenting and Assessing a Probabilistic Business Process Deviance Mining Framework Based on Ensemble Learning. In: Hammoudi, S., Śmiałek, M., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2017. Lecture Notes in Business Information Processing, vol 321. Springer, Cham. https://doi.org/10.1007/978-3-319-93375-7_6

  • DOI: https://doi.org/10.1007/978-3-319-93375-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93374-0

  • Online ISBN: 978-3-319-93375-7

  • eBook Packages: Computer Science (R0)
