TRIPS and TIDES: new algorithms for tree mining

Published: 06 November 2006
DOI: 10.1145/1183614.1183680

Abstract

Recent research in data mining has progressed from mining frequent itemsets to more general and structured patterns such as trees and graphs. In this paper, we address the problem of frequent subtree mining, which has proven viable in a wide range of applications such as bioinformatics, XML processing, computational linguistics, and web usage mining. We propose novel algorithms to mine frequent subtrees from a database of rooted trees. We evaluate the use of two popular sequential encodings of trees to systematically generate and evaluate the candidate patterns. The proposed approach is generic and can be used to mine embedded or induced subtrees that can be labeled, unlabeled, ordered, unordered, or edge-labeled. Our algorithms are highly cache-conscious because of the compact and simple array-based data structures we use: L1 and L2 cache hit rates above 99% are typically observed. Experimental evaluation shows that our algorithms can achieve speedups of up to several orders of magnitude on real datasets when compared to state-of-the-art tree mining algorithms.
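
To make the idea of a sequential tree encoding concrete, the sketch below flattens a small rooted, ordered, labeled tree into two array forms: a depth-first label sequence with a backtrack marker, and a Prüfer-style sequence over postorder node numbers. The abstract does not say which two encodings the paper uses; the author tags (Prufer sequences, depth first order codes) only suggest these families, so this is an illustration under that assumption and not the paper's exact encodings. The names Node, dfs_code, postorder_prufer, and the "$" marker are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node of a rooted, ordered, labeled tree (hypothetical helper type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def dfs_code(root: Node) -> List[str]:
    """Depth-first label sequence; "$" marks a step back up to the parent."""
    out: List[str] = []

    def visit(node: Node) -> None:
        out.append(node.label)
        for child in node.children:
            visit(child)
            out.append("$")  # assumed backtrack marker, emitted after each child subtree

    visit(root)
    return out


def postorder_prufer(root: Node) -> List[int]:
    """Prüfer-style sequence over postorder numbers 1..n.

    With postorder numbering, the lowest-numbered remaining node is always a
    leaf, so repeatedly deleting it and recording its parent yields
    [parent(1), parent(2), ..., parent(n-1)].
    """
    parent_of: Dict[int, int] = {}
    counter = 0

    def visit(node: Node) -> int:
        nonlocal counter
        child_ids = [visit(child) for child in node.children]
        counter += 1  # this node's postorder number
        for cid in child_ids:
            parent_of[cid] = counter
        return counter

    total = visit(root)
    return [parent_of[i] for i in range(1, total)]


# Example tree: A has children B and C; C has child D.
tree = Node("A", [Node("B"), Node("C", [Node("D")])])
print(dfs_code(tree))          # ['A', 'B', '$', 'C', 'D', '$', '$']
print(postorder_prufer(tree))  # [4, 3, 4]
```

Both results are flat arrays that can be scanned sequentially, which is the kind of compact layout the abstract credits for cache-friendly behaviour.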

    Published In

    CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
    November 2006
    916 pages
    ISBN: 1595934332
    DOI: 10.1145/1183614

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Prufer sequences
    2. depth first order codes
    3. embedding lists
    4. frequent patterns
    5. tree mining

    Qualifiers

    • Article

    Conference

    CIKM06: Conference on Information and Knowledge Management
    November 6 - 11, 2006
    Arlington, Virginia, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2025) Mining transactional tree databases under homeomorphism. The Journal of Supercomputing 81:4. DOI: 10.1007/s11227-025-06997-2. Online publication date: 22-Feb-2025
    • (2023) Mining Frequent Infix Patterns from Concurrency-Aware Process Execution Variants. Proceedings of the VLDB Endowment 16:10 (2666-2678). DOI: 10.14778/3603581.3603603. Online publication date: 1-Jun-2023
    • (2020) Reversible Circuit Synthesis Time Reduction Based on Subtree-Circuit Mapping. Applied Sciences 10:12 (4147). DOI: 10.3390/app10124147. Online publication date: 16-Jun-2020
    • (2019) Short and Long-term Pattern Discovery Over Large-Scale Geo-Spatiotemporal Data. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2905-2913). DOI: 10.1145/3292500.3330755. Online publication date: 25-Jul-2019
    • (2017) Frequent subtree mining on the automata processor. Proceedings of the International Conference on Supercomputing (1-11). DOI: 10.1145/3079079.3079084. Online publication date: 14-Jun-2017
    • (2017) Homomorphic Pattern Mining from a Single Large Data Tree. Data Science and Engineering 1:4 (203-218). DOI: 10.1007/s41019-016-0028-7. Online publication date: 10-Jan-2017
    • (2017) Efficiently Discovering Most-Specific Mixed Patterns from Large Data Trees. Database Systems for Advanced Applications (279-294). DOI: 10.1007/978-3-319-55753-3_18. Online publication date: 22-Mar-2017
    • (2016) Mining rooted ordered trees under subtree homeomorphism. Data Mining and Knowledge Discovery 30:5 (1249-1272). DOI: 10.1007/s10618-015-0439-5. Online publication date: 1-Sep-2016
    • (2016) Transactional Tree Mining. European Conference on Machine Learning and Knowledge Discovery in Databases - Volume 9851 (182-198). DOI: 10.1007/978-3-319-46128-1_12. Online publication date: 19-Sep-2016
    • (2015) Ordered subtree mining via transactional mapping using a structure-preserving tree database schema. Information Sciences: an International Journal 310:C (97-117). DOI: 10.1016/j.ins.2015.03.015. Online publication date: 20-Jul-2015
