
POTMiner: mining ordered, unordered, and partially-ordered trees

  • Regular Paper
Knowledge and Information Systems

Abstract

Non-linear data structures are becoming increasingly common in data mining problems. Trees, in particular, are amenable to efficient mining techniques. In this paper, we introduce a scalable and parallelizable algorithm for mining partially-ordered trees. Our algorithm, POTMiner, can identify both induced and embedded subtrees in such trees. As special cases, it also handles completely ordered and completely unordered trees.
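The abstract relies on two distinctions: induced versus embedded subtrees, and ordered versus unordered children. The sketch below only illustrates those notions with a brute-force occurrence test. It is not POTMiner, which the abstract describes as scalable, and every name in it (`Node`, `make`, `matches`, `occurs`, the `ordered` flag) is hypothetical. It assumes the usual tree-mining definitions: an induced occurrence maps each pattern edge to a parent-child edge of the data tree, while an embedded occurrence maps it to an ancestor-descendant pair. For the partially-ordered case it makes a further assumption that the paper's precise definition may not share: each data node carries an `ordered` flag, and the left-to-right order of two matched pattern siblings is enforced only when their lowest common ancestor in the data tree has ordered children. Setting every flag to `True` or to `False` gives the fully ordered and fully unordered special cases.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import permutations


@dataclass(eq=False)  # compare nodes by identity, not by structure
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    ordered: bool = False  # assumed flag: does this node's child order matter?
    # Filled in by index_tree(): preorder position, parent link, depth.
    pre: int = -1
    parent: Node | None = field(default=None, repr=False)
    depth: int = 0


def make(label: str, *children: Node, ordered: bool = False) -> Node:
    return Node(label, list(children), ordered)


def index_tree(root: Node) -> list[Node]:
    """Number nodes in preorder and record parents/depths; return them in preorder."""
    order: list[Node] = []
    stack = [(root, None, 0)]
    while stack:
        n, parent, depth = stack.pop()
        n.pre, n.parent, n.depth = len(order), parent, depth
        order.append(n)
        # Push children right-to-left so they are popped left-to-right.
        for child in reversed(n.children):
            stack.append((child, n, depth + 1))
    return order


def descendants(n: Node) -> list[Node]:
    """All proper descendants of n."""
    out, stack = [], list(n.children)
    while stack:
        d = stack.pop()
        out.append(d)
        stack.extend(d.children)
    return out


def lca(a: Node, b: Node) -> Node:
    """Lowest common ancestor, using the depths set by index_tree()."""
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a


def compatible(images: tuple[Node, ...]) -> bool:
    """Sibling images must not be ancestor-related (so subtrees stay disjoint) and,
    when their lowest common ancestor is ordered, must keep the pattern's order."""
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            x, y = images[i], images[j]
            w = lca(x, y)
            if w is x or w is y:
                return False
            if w.ordered and x.pre > y.pre:
                return False
    return True


def matches(p: Node, t: Node, embedded: bool) -> bool:
    """Can the pattern rooted at p be mapped into the data tree with p -> t?"""
    if p.label != t.label:
        return False
    # Induced: pattern edges map to parent-child edges of the data tree.
    # Embedded: pattern edges map to ancestor-descendant pairs.
    pool = descendants(t) if embedded else t.children
    for images in permutations(pool, len(p.children)):  # brute force
        if compatible(images) and all(
            matches(pc, tc, embedded) for pc, tc in zip(p.children, images)
        ):
            return True
    return False


def occurs(pattern: Node, tree: Node, embedded: bool) -> bool:
    """Does the pattern occur anywhere in the data tree?"""
    return any(matches(pattern, t, embedded) for t in index_tree(tree))


if __name__ == "__main__":
    data = make("A", make("B", make("D")), make("C"), ordered=True)
    print(occurs(make("A", make("D")), data, embedded=False))  # False
    print(occurs(make("A", make("D")), data, embedded=True))   # True
    print(occurs(make("A", make("C"), make("B")), data, embedded=False))  # False
```

In the usage example, `A(D)` is embedded but not induced in the data tree because `D` is a grandchild of `A`, and `A(C, B)` fails only because `A` is marked as ordered; building the same data tree with `ordered=False` makes it match. The search enumerates permutations and is exponential, which is exactly the cost that a dedicated algorithm such as the one described in the paper is meant to avoid.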



Author information


Correspondence to Aída Jiménez.


About this article

Cite this article

Jiménez, A., Berzal, F. & Cubero, JC. POTMiner: mining ordered, unordered, and partially-ordered trees. Knowl Inf Syst 23, 199–224 (2010). https://doi.org/10.1007/s10115-009-0213-3


  • DOI: https://doi.org/10.1007/s10115-009-0213-3
