Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining

Journal of Intelligent Information Systems

Abstract

Tree-structured data are common in many domains, making structural classification an important task. In this paper, an associative classification method is introduced based on a structure-preserving flat representation of trees. A major difference from traditional tree mining techniques is that subtrees are constrained by their position in the original trees, which drastically reduces the number of rules generated, especially for data with great structural variation among tree instances. This characteristic is desirable given the current state of frequent pattern mining, where an excessive number of patterns hinders the practical use of results. However, the question remains whether this reduction comes at a high cost in accuracy and coverage rate. We explore this aspect and compare the approach with a state-of-the-art structural classifier based on the same subtree type, but not positionally constrained in any way. We investigate the effect of different types of frequent pattern (frequent or closed) and subtree types (induced, embedded or embedded-plus-disconnected subtrees) on the performance of the two classifiers. Different rule strength measures such as confidence, weighted confidence and likelihood are also examined in our study. Experiments on three real-world data sets reveal important similarities and differences between the methods.
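
The rule strength measures named above are not defined in this section, so the following is only a minimal sketch under stated assumptions. It treats each tree as a flat set of (position, label) items, a hypothetical stand-in for the position-constrained representation, and scores a rule "pattern implies class" in two ways: confidence in its standard sense (the fraction of matching instances that belong to the target class), and a weighted confidence that is assumed here to mean the same ratio computed from per-class relative supports, which compensates for class imbalance. The function name `rule_strengths` and the toy data are illustrative and do not come from the paper.

```python
from collections import Counter

def rule_strengths(instances, pattern, target):
    """Score the rule `pattern -> target` on labelled instances.

    instances: list of (set of (position, label) items, class label).
    Returns (confidence, weighted_confidence), or None if nothing matches.
    The weighted variant is an assumed, class-size-normalised definition.
    """
    pattern = frozenset(pattern)
    class_sizes = Counter(label for _, label in instances)
    # Count, per class, the instances that contain every item of the pattern.
    matches = Counter(label for items, label in instances if pattern <= items)
    total = sum(matches.values())
    if total == 0:
        return None
    confidence = matches[target] / total
    # Relative support: share of each class covered by the pattern.
    rel = {c: matches[c] / class_sizes[c] for c in class_sizes}
    weighted = rel[target] / sum(rel.values())
    return confidence, weighted

# Toy data: two instances of class "x", four of class "y".
toy = [
    ({(0, "a"), (1, "b"), (2, "c")}, "x"),
    ({(0, "a"), (1, "b")}, "x"),
    ({(0, "a"), (1, "c")}, "y"),
    ({(0, "a"), (1, "b"), (2, "d")}, "y"),
    ({(0, "a"), (2, "c")}, "y"),
    ({(0, "a"), (1, "b")}, "y"),
]
conf, wconf = rule_strengths(toy, {(0, "a"), (1, "b")}, "x")
print(conf, wconf)
```

On this toy data the pattern matches both "x" instances and two of the four "y" instances, so plain confidence cannot separate the classes, while the class-normalised variant favours the smaller class. This is one reason a study might compare several strength measures.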

Acknowledgments

We would like to thank Professor Mohammed J. Zaki for making available the XRules classifier and the CSLOG data set. We would also like to acknowledge the constructive comments and advice from Professor George Karypis, which has improved the

Author information

Corresponding author

Correspondence to Dang Bach Bui.

About this article

Cite this article

Bui, D.B., Hadzic, F., Tagarelli, A. et al. Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining. J Intell Inf Syst 45, 397–421 (2015). https://doi.org/10.1007/s10844-014-0312-9

  • DOI: https://doi.org/10.1007/s10844-014-0312-9
