Scalable Process Discovery with Guarantees

  • Conference paper
Enterprise, Business-Process and Information Systems Modeling (BPMDS 2015, EMMSAD 2015)

Abstract

Organisations nowadays collect and store considerable amounts of data, including process event data. Process discovery algorithms aim to discover a process model from such recorded event data. Many techniques have been proposed, but none combines scalability with quality guarantees: handling billions of events or thousands of activities, producing sound models (without deadlocks and other anomalies), and guaranteeing to rediscover the underlying process in some cases. In this paper, we introduce a framework for process discovery that computes a directly-follows graph in a single pass over the log and applies a divide-and-conquer strategy. Moreover, we introduce three algorithms using the framework. We experimentally show that the framework sacrifices little compared to algorithms that use the full event log, while it gains the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities.
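
To illustrate the single-pass idea mentioned in the abstract, the following is a minimal sketch of how a directly-follows graph could be counted while reading an event log once. It is not the authors' implementation: the function name `directly_follows_graph`, the representation of a log as an iterable of traces (each a sequence of activity names), and the separate start/end counters are assumptions made for this example. The divide-and-conquer step that the framework applies to the graph is not shown.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Activity = Hashable
Edge = Tuple[Activity, Activity]


def directly_follows_graph(
    traces: Iterable[Iterable[Activity]],
) -> Tuple[Counter, Counter, Counter]:
    """Count directly-follows pairs in one pass over the log.

    Hypothetical helper: each trace is consumed exactly once, and only
    the counters are kept in memory, never the traces themselves.
    """
    edges: Counter = Counter()   # (a, b) -> how often b directly follows a
    starts: Counter = Counter()  # activity -> how often it starts a trace
    ends: Counter = Counter()    # activity -> how often it ends a trace

    for trace in traces:
        previous = None
        seen_any = False
        for activity in trace:
            if not seen_any:
                starts[activity] += 1
                seen_any = True
            else:
                edges[(previous, activity)] += 1
            previous = activity
        if seen_any:
            ends[previous] += 1

    return edges, starts, ends


if __name__ == "__main__":
    # Toy log with three traces; real logs would be streamed from disk.
    log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
    edges, starts, ends = directly_follows_graph(log)
    print(dict(edges))
    print(dict(starts), dict(ends))
```

Because only the counters are retained, memory use in this sketch grows with the number of distinct activity pairs rather than with the number of traces, which is the property that makes a single pass attractive for very large logs.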

Author information

Corresponding author

Correspondence to Sander J. J. Leemans.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P. (2015). Scalable Process Discovery with Guarantees. In: Gaaloul, K., Schmidt, R., Nurcan, S., Guerreiro, S., Ma, Q. (eds) Enterprise, Business-Process and Information Systems Modeling. BPMDS 2015, EMMSAD 2015. Lecture Notes in Business Information Processing, vol 214. Springer, Cham. https://doi.org/10.1007/978-3-319-19237-6_6

  • DOI: https://doi.org/10.1007/978-3-319-19237-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19236-9

  • Online ISBN: 978-3-319-19237-6

  • eBook Packages: Computer Science, Computer Science (R0)
