CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees

Chi, Yun; Yang, Yirong; Xia, Yi; Muntz, Richard R.

doi:10.1007/978-3-540-24775-3_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3056)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. One important problem in mining databases of trees is finding frequently occurring subtrees. However, because of combinatorial explosion, the number of frequent subtrees usually grows exponentially with the size of the subtrees. In this paper, we present CMTreeMiner, a computationally efficient algorithm that discovers all closed and maximal frequent subtrees in a database of rooted unordered trees. The algorithm mines both closed and maximal frequent subtrees by traversing an enumeration tree that systematically enumerates all subtrees, while using an enumeration DAG to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. The enumeration tree and the enumeration DAG are defined based on a canonical form for rooted unordered trees: the depth-first canonical form (DFCF). We compare the performance of our algorithm with that of PathJoin, a recently published algorithm that mines maximal frequent subtrees.
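
To illustrate why a canonical form matters for rooted unordered trees, the following minimal Python sketch assigns the same string to any two trees that differ only in the order of siblings. It is not the DFCF defined in the paper; it is a simple sorted-parenthesis encoding chosen for illustration. The names `canonical_string`, `t1`, `t2`, and `t3` are hypothetical, and the sketch assumes that node labels contain no parentheses.

```python
from typing import List, Tuple

# Hypothetical representation: a rooted labeled tree is (label, [children]).
# Sibling order carries no meaning, since the trees are unordered.
Tree = Tuple[str, List["Tree"]]


def canonical_string(tree: Tree) -> str:
    """Return an encoding that is identical for isomorphic unordered trees.

    Illustrative sorted-parenthesis encoding, NOT the paper's DFCF.
    Assumes labels contain no '(' or ')' characters.
    """
    label, children = tree
    # Encode every child subtree, then sort the encodings so that the
    # result does not depend on the order in which siblings were listed.
    encoded_children = sorted(canonical_string(child) for child in children)
    return "(" + label + "".join(encoded_children) + ")"


# t1 and t2 are the same unordered tree with siblings listed differently.
t1: Tree = ("A", [("B", [("D", [])]), ("C", [])])
t2: Tree = ("A", [("C", []), ("B", [("D", [])])])
# t3 attaches D under C instead of B, so it is a different tree.
t3: Tree = ("A", [("B", []), ("C", [("D", [])])])

print(canonical_string(t1))  # (A(B(D))(C))
print(canonical_string(t2))  # (A(B(D))(C))
print(canonical_string(t3))  # (A(B)(C(D)))
```

Because equivalent trees map to one string, a miner can detect that two candidate subtrees generated along different paths are the same pattern by comparing strings, which is the kind of bookkeeping a canonical form makes possible.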

This work was supported by NSF under Grant Nos. 0086116, 0085773, and 9817773.

Author information

Authors and Affiliations

University of California, Los Angeles, CA, 90095, USA
Yun Chi, Yirong Yang, Yi Xia & Richard R. Muntz

Editor information

Editors and Affiliations

School of Engineering and Information Technology, Deakin University, VIC 3125, Australia
Honghua Dai
University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Ramakrishnan Srikant
Faculty of Engineering and Information Technology, Centre for Quantum Computation and Intelligent Systems, and Australian ACS National Committee for Artificial Intelligence, University of Technology, Sydney, Australia
Chengqi Zhang

About this paper

Cite this paper

Chi, Y., Yang, Y., Xia, Y., Muntz, R.R. (2004). CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_9

DOI: https://doi.org/10.1007/978-3-540-24775-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive
