Abstract
Many knowledge representation mechanisms consist of link-based structures, which may be studied formally by means of unordered trees. We consider the case where node labels are nonexistent or unreliable, and propose data mining processes that focus on the link structure alone. We propose a representation of ordered trees, describe a combinatorial characterization and some of its properties, and use them to obtain an efficient algorithm for mining frequent closed subtrees from a set of input trees. We then turn to unordered trees: intrinsic characterizations of our representation provide a way of avoiding the repeated exploration of unordered trees, which yields an efficient algorithm for mining frequent closed unordered trees.
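To make the idea of avoiding repeated exploration concrete, here is a minimal sketch. The abstract does not define the representation, so the sketch assumes it is the sequence of node depths in preorder, and it assumes that the canonical ordered form of an unordered tree is obtained by sorting child subtrees in decreasing order of that sequence. The names `depth_sequence` and `canonical` are hypothetical and are not taken from the paper.

```python
# Unlabeled trees as nested lists: a node is the list of its child subtrees.
# Assumption: the "natural representation" is the preorder depth sequence.

def depth_sequence(tree, depth=0):
    """Return the depths of all nodes of an ordered tree in preorder."""
    seq = [depth]
    for child in tree:
        seq.extend(depth_sequence(child, depth + 1))
    return seq

def canonical(tree):
    """Return one fixed ordered tree for the unordered tree `tree`.

    Children are canonicalized recursively, then sorted in decreasing
    order of their depth sequences (an assumed convention), so every
    ordering of the same unordered tree maps to the same result.
    """
    children = [canonical(child) for child in tree]
    children.sort(key=depth_sequence, reverse=True)
    return children

# Two orderings of the same unordered tree: a root with a leaf child
# and a child that itself has one leaf.
t1 = [[], [[]]]
t2 = [[[]], []]
same = depth_sequence(canonical(t1)) == depth_sequence(canonical(t2))
```

Under these assumptions, a miner that only extends trees already in canonical form visits each unordered tree once, rather than once for every ordering of its siblings.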
Partially supported by the 6th Framework Program of the EU through the integrated project DELIS (#001907), by the EU PASCAL Network of Excellence, IST-2002-506778, and by the MEC TIN2005-08832-C03-03 (MOISES-BAR), MCYT TIN2004-07925-C03-02 (TRANGRAM), and CICYT TIN2004-04343 (iDEAS) projects.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balcázar, J.L., Bifet, A., Lozano, A. (2007). Mining Frequent Closed Unordered Trees Through Natural Representations. In: Priss, U., Polovina, S., Hill, R. (eds) Conceptual Structures: Knowledge Architectures for Smart Applications. ICCS 2007. Lecture Notes in Computer Science, vol 4604. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73681-3_26
DOI: https://doi.org/10.1007/978-3-540-73681-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73680-6
Online ISBN: 978-3-540-73681-3
eBook Packages: Computer Science (R0)