Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees

Yongqiao Xiao, J. F. Yao
Copyright: © 2005 | Volume: 1 | Issue: 2 | Pages: 23
ISSN: 1548-3924 | EISSN: 1548-3932 | EISBN13: 9781615202164 | DOI: 10.4018/jdwm.2005040104

MLA

Xiao, Yongqiao, and J. F. Yao. "Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees." IJDWM vol.1, no.2 2005: pp.70-92. http://doi.org/10.4018/jdwm.2005040104

APA

Xiao, Y. & Yao, J. F. (2005). Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees. International Journal of Data Warehousing and Mining (IJDWM), 1(2), 70-92. http://doi.org/10.4018/jdwm.2005040104

Chicago

Xiao, Yongqiao, and J. F. Yao. "Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees," International Journal of Data Warehousing and Mining (IJDWM) 1, no.2: 70-92. http://doi.org/10.4018/jdwm.2005040104

Abstract

Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this article, we address the problem of mining maximal frequent embedded subtrees, which is motivated by such important applications as mining “hot” spots of Web sites from Web usage logs and discovering significant “deep” structures from tree-like bioinformatic data. One major challenge is that embedded subtrees are no longer ordinary subtrees, but preserve only part of the ancestor-descendant relationships in the original trees. To solve the embedded subtree mining problem, we propose a novel algorithm, called TreeGrow, which is optimized in two important respects. First, it obtains frequency counts of root-to-leaf paths through efficient compression of trees, and can therefore quickly grow an embedded subtree pattern path by path instead of node by node. Second, candidate subtree generation is highly localized so as to avoid unnecessary computational overhead. Experimental results on benchmark synthetic data sets show that our algorithm can outperform unoptimized methods by up to 20 times.
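
To make the notion of an embedded pattern concrete, the following minimal sketch counts the support of a single label path under the embedded (ancestor-descendant) relation in a small database of unordered labeled trees. It is not the TreeGrow algorithm and uses no tree compression; the tree encoding as `(label, children)` tuples and the function names `contains_embedded_path` and `support` are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: a tree is (label, [child trees]); child order is ignored.
Tree = Tuple[str, list]


def contains_embedded_path(tree: Tree, path: Sequence[str], i: int = 0) -> bool:
    """Return True if `path` occurs along some root-to-leaf path of `tree`
    as a subsequence, i.e. consecutive pattern nodes only need to be in an
    ancestor-descendant relation, not a parent-child one."""
    label, children = tree
    # Greedily match the next pattern label as early as possible on the way down.
    if i < len(path) and label == path[i]:
        i += 1
    if i == len(path):
        return True
    return any(contains_embedded_path(child, path, i) for child in children)


def support(db: List[Tree], path: Sequence[str]) -> int:
    """Number of trees in `db` that contain `path` as an embedded path."""
    return sum(1 for tree in db if contains_embedded_path(tree, path))


# Toy database (illustrative data only).
t1 = ("a", [("b", [("c", [])]), ("d", [])])
t2 = ("a", [("x", [("c", [])])])
t3 = ("b", [("a", [])])
db = [t1, t2, t3]

print(support(db, ["a", "c"]))  # "c" lies below "a" in t1 and t2, with a node in between
print(support(db, ["a", "b"]))  # only t1 has "b" below "a"
```

In `t1` and `t2` the label `c` is a grandchild of `a`, so the path `a, c` is not an ordinary (parent-child) subtree of either tree, yet it is counted as embedded in both. This is the relaxation the abstract refers to when it says embedded subtrees preserve only part of the ancestor-descendant relationships.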
