Abstract
Modern information systems can collect considerable amounts of business process event logs. Process discovery aims to uncover a process model from an event log. Many process discovery approaches have been proposed; however, most of them have difficulty handling large-scale event logs. Motivated by PageRank, in this paper we propose LogRank, a graph-based ranking model for event log sampling. Using LogRank, a large-scale event log can be sampled down to a smaller size that existing discovery approaches can handle efficiently. Moreover, we introduce an approach to measure the quality of a sample log with respect to the original one from a discovery perspective. The proposed sampling approach has been implemented in the open-source process mining toolkit ProM. Experimental analyses with both synthetic and real-life event logs demonstrate that the proposed sampling approach effectively improves process discovery efficiency while preserving high quality of the discovered model.
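The abstract does not spell out the ranking model, so the following is only a minimal sketch of the general idea of PageRank-style trace ranking for log sampling. It assumes that each trace is a node, that edges are weighted by the Jaccard similarity of the traces' directly-follows pairs, and that the sample keeps the k highest-scoring traces. The similarity measure, the damping factor of 0.85, and the function names `rank_traces` and `sample_log` are illustrative assumptions, not the authors' LogRank formulation or the ProM plug-in API.

```python
from itertools import combinations


def directly_follows(trace):
    """Return the set of directly-follows pairs (a, b) in a trace."""
    return set(zip(trace, trace[1:]))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_traces(traces, damping=0.85, iterations=50):
    """Score traces with a PageRank-style iteration on a similarity graph.

    Assumption: edge weight = Jaccard similarity of directly-follows sets.
    """
    n = len(traces)
    if n == 0:
        return []
    relations = [directly_follows(t) for t in traces]
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = jaccard(relations[i], relations[j])
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each trace receives score from its neighbours in proportion to
        # the normalised edge weight; isolated traces pass nothing on.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def sample_log(traces, k):
    """Keep the k highest-ranked traces, preserving their original order."""
    scores = rank_traces(traces)
    top = sorted(range(len(traces)), key=lambda i: scores[i], reverse=True)[:k]
    return [traces[i] for i in sorted(top)]


if __name__ == "__main__":
    log = [("a", "b", "c", "d"), ("a", "b", "c", "d"),
           ("a", "b", "c", "e"), ("x", "y")]
    print(sample_log(log, 2))
```

In this sketch, traces that share behaviour with many other traces accumulate a higher score, so a trace such as `("x", "y")` that resembles nothing else in the log is the first to be left out of the sample. The sampled log could then be passed to any existing discovery algorithm in place of the full log.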
Acknowledgement
This work was supported in part by the NSFC under Grant 61472229, Grant 61602279, Grant 71704096, and Grant 31671588, in part by the Science and Technology Development Fund of Shandong Province of China under Grant 2016ZDJS02A11, Grant 2014GGX101035, and Grant ZR2017MF027, in part by the Taishan Scholar Climbing Program of Shandong Province, and in part by the SDUST Research Fund under Grant 2015TDJH102.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, C., Pei, Y., Zeng, Q., Duan, H. (2018). LogRank: An Approach to Sample Business Process Event Log for Efficient Discovery. In: Liu, W., Giunchiglia, F., Yang, B. (eds) Knowledge Science, Engineering and Management. KSEM 2018. Lecture Notes in Computer Science, vol 11061. Springer, Cham. https://doi.org/10.1007/978-3-319-99365-2_36
DOI: https://doi.org/10.1007/978-3-319-99365-2_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99364-5
Online ISBN: 978-3-319-99365-2
eBook Packages: Computer Science, Computer Science (R0)