A Framework for Application of Tree-Structured Data Mining to Process Log Analysis

Bui, Dang Bach; Hadzic, Fedja; Potdar, Vidyasagar

doi:10.1007/978-3-642-32639-4_52


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7435)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning


Abstract

Many data mining and simulation-based algorithms have been applied in the process mining field; nevertheless, they focus mainly on the process discovery and conformance checking tasks. Even though event logs are increasingly represented in a semi-structured format using XML-based templates, commonly used XML mining techniques have not yet been explored for process log analysis. In this paper, we investigate the application of tree mining techniques and propose a general framework within which a wider range of structure-aware data mining techniques can be applied. Decision tree learning and frequent pattern mining serve as case studies in experiments on a publicly available real-world dataset. The results indicate that the proposed framework is a promising addition to the available set of tools for process log analysis, enabling (i) direct data mining of tree-structured process logs, (ii) extraction of informative knowledge patterns, and (iii) frequent pattern mining at lower minimum support thresholds.
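The abstract does not spell out how a tree-structured log is turned into input for a pattern miner. The sketch below is only a minimal illustration of the general idea, under our own assumptions: a small hypothetical log in a simplified XES-like layout is parsed, each trace subtree is flattened into position-qualified attribute items (a simple stand-in, not the structure-preserving flat representation the authors build on), and frequent itemsets are then found with a naive Apriori-style search at an absolute minimum support. The toy log, the item encoding, and the names `trace_to_items` and `frequent_itemsets` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical toy log in a simplified XES-like layout (not the paper's dataset):
# each <trace> holds <event> elements whose attributes are <string key=... value=.../>.
LOG_XML = """
<log>
  <trace>
    <event><string key="concept:name" value="register"/><string key="org:resource" value="A"/></event>
    <event><string key="concept:name" value="check"/><string key="org:resource" value="B"/></event>
  </trace>
  <trace>
    <event><string key="concept:name" value="register"/><string key="org:resource" value="A"/></event>
    <event><string key="concept:name" value="reject"/><string key="org:resource" value="C"/></event>
  </trace>
  <trace>
    <event><string key="concept:name" value="register"/><string key="org:resource" value="B"/></event>
    <event><string key="concept:name" value="check"/><string key="org:resource" value="B"/></event>
  </trace>
</log>
"""


def trace_to_items(trace):
    """Flatten one trace subtree into items of the form 'event[i]/key=value'.

    Assumption: prefixing the event position i keeps part of the tree
    structure (which event an attribute belongs to). This is a simple
    illustrative encoding, not the authors' flat representation.
    """
    items = set()
    for i, event in enumerate(trace.findall("event"), start=1):
        for attr in event:
            items.add(f"event[{i}]/{attr.get('key')}={attr.get('value')}")
    return frozenset(items)


def frequent_itemsets(transactions, min_count):
    """Naive Apriori-style search; min_count is an absolute support (number of traces)."""
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many traces contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets and prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


root = ET.fromstring(LOG_XML.strip())
transactions = [trace_to_items(t) for t in root.findall("trace")]
patterns = frequent_itemsets(transactions, min_count=2)

# Print larger and more frequent patterns first.
for itemset, support in sorted(
    patterns.items(), key=lambda kv: (-len(kv[0]), -kv[1], sorted(kv[0]))
):
    print(support, sorted(itemset))
```

With `min_count=2`, the largest pattern found in this toy log says that two of the three traces start with `register` and continue with `check` performed by resource `B`. The naive search scans every trace for every candidate, so it only suits toy inputs; dedicated itemset or tree miners are what make low support thresholds practical on real logs.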



Author information

Authors and Affiliations

School of Information Systems, Curtin University, Perth, Australia
Dang Bach Bui & Vidyasagar Potdar
Department of Computing, Curtin University, Perth, Australia
Dang Bach Bui & Fedja Hadzic


Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, The University of Manchester, M13 9PL, Manchester, UK
Hujun Yin
Department of Electrical Engineering, Federal University of Rio Grande do Norte, Lagoa Nova, 59072-970, Natal, RN, Brazil
José A. F. Costa
Department of Teleinformatics Engineering, Federal University of Ceará, Campus of Pici, CP 6005, 60455-760, Fortaleza, CE, Brazil
Guilherme Barreto

About this paper

Cite this paper

Bui, D.B., Hadzic, F., Potdar, V. (2012). A Framework for Application of Tree-Structured Data Mining to Process Log Analysis. In: Yin, H., Costa, J.A.F., Barreto, G. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2012. IDEAL 2012. Lecture Notes in Computer Science, vol 7435. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32639-4_52


DOI: https://doi.org/10.1007/978-3-642-32639-4_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32638-7
Online ISBN: 978-3-642-32639-4
eBook Packages: Computer Science, Computer Science (R0)
