Abstract
In this paper, we study a frequent substructure discovery problem in semi-structured data. We present an efficient algorithm, Unot, that computes all frequent labeled unordered trees appearing in a large collection of data trees with frequency above a user-specified threshold. The keys to the algorithm are efficient enumeration of all unordered trees in canonical form and incremental computation of their occurrences. We then show that Unot discovers each frequent pattern T in O(kb^2 m) time per pattern, where k is the size of T, b is the branching factor of the data trees, and m is the total number of occurrences of T in the data trees.
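To illustrate why a canonical form matters when enumerating unordered trees, here is a minimal sketch in Python. It is not the procedure used by Unot; it only shows the general idea that two trees differing in sibling order should map to one representative, so that each pattern is generated once. The `(label, children)` tuple representation and the function names `canonical` and `same_unordered_tree` are assumptions made for this example.

```python
from typing import Tuple

# Hypothetical representation: a tree is (label, [child trees]).
Tree = Tuple[str, list]


def canonical(tree: Tree) -> Tuple:
    """Return an encoding of a labeled tree that ignores sibling order."""
    label, children = tree
    # Encode every child recursively, then sort the encodings so that
    # any permutation of the same children yields the same result.
    encoded = sorted((canonical(child) for child in children), reverse=True)
    return (label, tuple(encoded))


def same_unordered_tree(a: Tree, b: Tree) -> bool:
    """Two trees are equal as unordered trees iff their encodings match."""
    return canonical(a) == canonical(b)


# t1 and t2 differ only in the order of the root's children.
t1 = ("A", [("B", []), ("C", [("D", [])])])
t2 = ("A", [("C", [("D", [])]), ("B", [])])
# t3 attaches D under B instead of C, so it is a different tree.
t3 = ("A", [("B", [("D", [])]), ("C", [])])

print(same_unordered_tree(t1, t2))
print(same_unordered_tree(t1, t3))
```

An enumeration procedure that only emits trees already in such a canonical form avoids counting the same unordered pattern several times under different sibling orderings.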
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Asai, T., Arimura, H., Uno, T., Nakano, Si. (2003). Discovering Frequent Substructures in Large Unordered Trees. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds) Discovery Science. DS 2003. Lecture Notes in Computer Science, vol 2843. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39644-4_6
DOI: https://doi.org/10.1007/978-3-540-39644-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20293-6
Online ISBN: 978-3-540-39644-4
eBook Packages: Springer Book Archive