Abstract
Non-linear data structures are increasingly common in data mining problems. Trees, in particular, are amenable to efficient mining techniques. In this paper, we introduce a scalable and parallelizable algorithm for mining partially-ordered trees. Our algorithm, POTMiner, identifies both induced and embedded subtrees in such trees. As special cases, it also handles completely ordered and completely unordered trees.
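The distinction between induced and embedded subtrees can be illustrated on the smallest possible patterns, those with two nodes. The sketch below is not POTMiner and makes no claim about how the algorithm works. It assumes the usual reading of the two terms: an induced pattern preserves parent-child edges, while an embedded pattern only requires an ancestor-descendant relationship. The tree encoding as `(label, children)` pairs, the function names, and the example tree are all hypothetical.

```python
# Illustrative only: not POTMiner, just the induced/embedded distinction
# on two-node patterns. A tree is a (label, children) pair (assumed encoding).

def induced_pairs(tree):
    """Return label pairs (parent, child) joined by a direct edge."""
    label, children = tree
    pairs = set()
    for child in children:
        pairs.add((label, child[0]))
        pairs |= induced_pairs(child)
    return pairs


def descendant_labels(tree):
    """Return the labels of all proper descendants of the root of `tree`."""
    labels = set()
    for child in tree[1]:
        labels.add(child[0])
        labels |= descendant_labels(child)
    return labels


def embedded_pairs(tree):
    """Return label pairs (ancestor, descendant) at any depth."""
    label, children = tree
    pairs = {(label, d) for d in descendant_labels(tree)}
    for child in children:
        pairs |= embedded_pairs(child)
    return pairs


# Hypothetical example tree: A has children B and C; B has child D.
example = ("A", [("B", [("D", [])]), ("C", [])])
induced = induced_pairs(example)
embedded = embedded_pairs(example)

# Pairs that occur only as embedded patterns skip at least one level.
only_embedded = embedded - induced
```

In this example every induced pair is also an embedded pair, and the pair (A, D) is embedded only, because D is a grandchild of A rather than a child. Sibling order plays no role for two-node patterns, so the sketch does not touch the ordered, unordered, or partially-ordered cases.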
Cite this article
Jiménez, A., Berzal, F. & Cubero, JC. POTMiner: mining ordered, unordered, and partially-ordered trees. Knowl Inf Syst 23, 199–224 (2010). https://doi.org/10.1007/s10115-009-0213-3
DOI: https://doi.org/10.1007/s10115-009-0213-3