Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining

Bui, Dang Bach; Hadzic, Fedja; Tagarelli, Andrea; Hecker, Michael

doi:10.1007/s10844-014-0312-9


Published: 09 April 2014

Volume 45, pages 397–421 (2015)

Journal of Intelligent Information Systems

Dang Bach Bui¹, Fedja Hadzic¹, Andrea Tagarelli² & Michael Hecker¹


Abstract

Tree-structured data are popular in many domains, making structural classification an important task. In this paper, an associative classification method is introduced based on a structure-preserving flat representation of trees. A major difference from traditional tree mining techniques is that subtrees are constrained by their position in the original trees, leading to a drastic reduction in the number of rules generated, especially for data with great structural variation among tree instances. This characteristic is desirable given the current state of frequent pattern mining, where an excessive number of patterns hinders the practical use of results. However, the question remains whether this reduction comes at a high cost in accuracy and coverage rate. We explore this aspect and compare the approach with a state-of-the-art structural classifier based on the same subtree type but not positionally constrained in any way. We investigate the effect of using different types of frequent patterns (frequent or closed) or subtree types (induced, embedded or embedded-plus-disconnected subtrees) on the performance of the two classifiers. Different rule strength measures, such as confidence, weighted confidence and likelihood, are also examined in our study. The experiments on three real-world data sets reveal important similarities and differences between the methods.
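The abstract names a positional constraint on subtrees and three rule strength measures but does not define them here. The following Python sketch illustrates both ideas under stated assumptions; it is not the authors' implementation. It assumes each tree has already been flattened so that every node occupies a fixed position index shared by all instances (a hypothetical `{position: label}` encoding), and it uses common textbook forms of the measures: confidence as the fraction of covered instances in the target class, weighted confidence as confidence computed on class-size-normalized supports, and likelihood as the ratio of the pattern's relative frequency in the target class to its relative frequency in all other classes pooled. The names `matches`, `rule_measures` and `TOY_DATA`, and the toy data itself, are hypothetical.

```python
from collections import Counter

# Hypothetical encoding (an assumption, not the paper's data format): each tree
# is already flattened so that every node sits at a fixed position index shared
# by all instances; a tree is then a {position: label} dict with a class label.
TOY_DATA = [
    ({0: "a", 1: "b", 2: "c"}, "pos"),
    ({0: "a", 1: "b", 2: "d"}, "pos"),
    ({0: "a", 1: "c", 2: "b"}, "neg"),  # contains "b", but at position 2
    ({0: "a", 1: "b"}, "neg"),
    ({0: "x", 1: "b"}, "neg"),
]


def matches(pattern, tree):
    """Position-constrained match: each (position, label) pair in the pattern
    must appear at exactly that position in the tree."""
    return all(tree.get(pos) == label for pos, label in pattern.items())


def rule_measures(pattern, target, data):
    """Return (confidence, weighted confidence, likelihood ratio) for the rule
    `pattern -> target`, using the assumed definitions from the lead-in."""
    class_sizes = Counter(label for _, label in data)
    covered = Counter(label for tree, label in data if matches(pattern, tree))

    # Confidence: fraction of covered instances that belong to the target class.
    total_covered = sum(covered.values())
    confidence = covered[target] / total_covered if total_covered else 0.0

    # Relative frequency of the pattern inside each class (class-size normalized).
    rel = {c: covered[c] / n for c, n in class_sizes.items()}
    target_rel = rel.get(target, 0.0)
    rel_sum = sum(rel.values())
    weighted_confidence = target_rel / rel_sum if rel_sum else 0.0

    # Likelihood ratio: relative frequency in the target class divided by the
    # relative frequency in all remaining classes pooled together.
    other_size = sum(n for c, n in class_sizes.items() if c != target)
    other_covered = sum(k for c, k in covered.items() if c != target)
    other_rel = other_covered / other_size if other_size else 0.0
    if other_rel:
        likelihood = target_rel / other_rel
    else:
        likelihood = float("inf") if target_rel else 0.0
    return confidence, weighted_confidence, likelihood


if __name__ == "__main__":
    print(rule_measures({0: "a", 1: "b"}, "pos", TOY_DATA))
```

For the pattern `{0: "a", 1: "b"}` and target class `"pos"`, the sketch yields approximately (0.67, 0.75, 3.0). The third instance contains the label `"b"` but not at position 1, so the position-constrained pattern does not cover it; this is the kind of restriction the abstract credits with reducing the number of generated rules. Weighted confidence exceeds plain confidence here because the smaller `"pos"` class is not penalized for its size.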



Acknowledgments

We would like to thank Professor Mohammed J. Zaki for making available the XRules classifier and the CSLOG data set. We would also like to acknowledge the constructive comments and advice from Professor George Karypis, which have improved the

Author information

Authors and Affiliations

Department of Computing, Curtin University, Perth, Australia
Dang Bach Bui, Fedja Hadzic & Michael Hecker
Department of Computer Engineering, Modeling, Electronics, and Systems Science, University of Calabria, Cosenza, Italy
Andrea Tagarelli


Corresponding author

Correspondence to Dang Bach Bui.


About this article

Cite this article

Bui, D.B., Hadzic, F., Tagarelli, A. et al. Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining. J Intell Inf Syst 45, 397–421 (2015). https://doi.org/10.1007/s10844-014-0312-9


Received: 02 September 2013
Revised: 02 December 2013
Accepted: 13 February 2014
Published: 09 April 2014
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10844-014-0312-9
