Structural Feature Selection for Event Logs

Hinkka, Markku; Lehto, Teemu; Heljanko, Keijo; Jung, Alexander

doi:10.1007/978-3-319-74030-0_2

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 308)

Included in the following conference series:

International Conference on Business Process Management

Abstract

We consider the problem of classifying business process instances based on structural features derived from event logs. The main motivation is to provide machine-learning-based techniques with response times short enough for interactive, computer-assisted root cause analysis. In particular, we use process mining to create structural features, such as activity and transition occurrence counts and the ordering of activities, and evaluate them as potential features for classification. We show that adding such structural features increases the amount of available information and thus potentially increases classification accuracy. However, there is an inherent trade-off: using too many features makes the run-times of machine learning classification models too long. One way to reduce these run-times is to use a feature selection algorithm to select only a small number of features. The run-time of the feature selection algorithm itself must then also be taken into account, and classification accuracy should not suffer too much from the selection. The main contributions of this paper are as follows. First, we propose and compare six different feature selection algorithms in an experimental setup that measures their classification accuracy and achievable response times. Second, we discuss the potential use of feature selection results for computer-assisted root cause analysis, as well as the properties of different types of structural features in the context of feature selection.
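
To make the three kinds of structural features named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each process instance is given as a list of activity names in execution order. The function names, the feature-name prefixes ("act:", "trans:", "order:"), and the precise definition of the ordering feature (comparing first occurrences of two activities) are illustrative assumptions that do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def structural_features(trace):
    """Build structural features for one trace (a list of activity names).

    Assumed, illustrative feature definitions:
      act:X       number of times activity X occurs
      trans:X->Y  number of times Y directly follows X
      order:X<Y   1 if the first X occurs before the first Y
    """
    features = Counter()

    # Activity occurrence counts.
    for activity in trace:
        features[f"act:{activity}"] += 1

    # Transition (directly-follows) occurrence counts.
    for a, b in zip(trace, trace[1:]):
        features[f"trans:{a}->{b}"] += 1

    # Ordering of activities, based on first occurrences only.
    first_index = {}
    for i, activity in enumerate(trace):
        first_index.setdefault(activity, i)
    for a, b in combinations(sorted(first_index), 2):
        if first_index[a] < first_index[b]:
            features[f"order:{a}<{b}"] = 1
        else:
            features[f"order:{b}<{a}"] = 1

    return dict(features)


def feature_matrix(traces):
    """Turn a collection of traces into (column names, dense rows)."""
    rows = [structural_features(t) for t in traces]
    columns = sorted({name for row in rows for name in row})
    matrix = [[row.get(c, 0) for c in columns] for row in rows]
    return columns, matrix


# Hypothetical two-case event log used only for illustration.
example_log = {
    "case1": ["A", "B", "C"],
    "case2": ["A", "C", "B", "C"],
}
columns, matrix = feature_matrix(example_log.values())
```

Even in this toy log the number of columns grows quickly with the number of distinct activities, since transition and ordering features are defined over activity pairs. This is the growth in feature count that motivates selecting a small subset before training a classifier.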

Acknowledgements

We want to thank QPR Software Plc for funding our research. Financial support from the Academy of Finland, projects 139402 and 277522, is acknowledged.

Author information

Authors and Affiliations

Department of Computer Science, School of Science, Aalto University, Espoo, Finland
Markku Hinkka, Teemu Lehto, Keijo Heljanko & Alexander Jung
QPR Software Plc, Helsinki, Finland
Markku Hinkka & Teemu Lehto
HIIT Helsinki Institute for Information Technology, Espoo, Finland
Keijo Heljanko

Corresponding author

Correspondence to Markku Hinkka.

Editor information

Editors and Affiliations

Department of Service and Information System Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain
Ernest Teniente
Humboldt-Universität zu Berlin, Berlin, Berlin, Germany
Matthias Weidlich

About this paper

Cite this paper

Hinkka, M., Lehto, T., Heljanko, K., Jung, A. (2018). Structural Feature Selection for Event Logs. In: Teniente, E., Weidlich, M. (eds) Business Process Management Workshops. BPM 2017. Lecture Notes in Business Information Processing, vol 308. Springer, Cham. https://doi.org/10.1007/978-3-319-74030-0_2

DOI: https://doi.org/10.1007/978-3-319-74030-0_2
Published: 17 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74029-4
Online ISBN: 978-3-319-74030-0
eBook Packages: Computer Science, Computer Science (R0)
