Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams

Asai, Tatsuya; Abe, Kenji; Kawasoe, Shinji; Arimura, Hiroki; Arikawa, Setsuo

doi:10.1007/978-3-540-71009-7_3

Tatsuya Asai¹,
Kenji Abe¹,
Shinji Kawasoe¹,
Hiroki Arimura¹ &
…
Setsuo Arikawa¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3609))

Included in the following conference series:

577 Accesses
1 Citations

Abstract

In this paper, we study an online data mining problem from streams of semi-structured data such as XML data. Modeling-semistructured data and patterns as labeled ordered trees, we present an online algorithm StreamT that receives fragments of an unseen possibly infinite semi-structured data in the document order through a data stream, and can return the current set of frequent patterns immediately on request at any time. We give modifications of the algorithm to other online mining models. Furthermore we implement our algorithms in different online models and candidate management strategies, then show empirical analyses to evaluate the algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized Substructure Discovery for Semi-structured Data. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 1–14. Springer, Heidelberg (2002)
Chapter Google Scholar
Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web. Morgan Kaufmann, San Francisco (2000)
Google Scholar
Aho, A.V., Hopcroft, J.E., Ullman, J.D.: Data Structures and Algorithms. Addison-Wesley, Reading (1983)
MATH Google Scholar
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient Substructure Discovery from Large Semi-structured Data. In: Proc. SIAM SDM’02, pp. 158–174 (2002)
Google Scholar
Asai, T., Arimura, H., Abe, K., Kawasoe, S., Arikawa, S.: Online Algorithms for Mining Semi-structured Data Stream. In: Proc. IEEE ICDM’02, pp. 27–34. IEEE Computer Society Press, Los Alamitos (2002)
Google Scholar
Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering Frequent Substructures in Large Unordred Trees. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 47–61. Springer, Heidelberg (2003)
Chapter Google Scholar
Bayardo Jr., R.J.: Efficiently Mining Long Patterns from Databases. In: Proc. SIGMOD98, pp. 85–93 (1998)
Google Scholar
de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational Geometry, Algorithms and Applications. Springer, Heidelberg (2000)
Book Google Scholar
Dehaspe, L., Toivonen, H., King, R.D.: Finding Frequent Substructures in Chemical Compounds. In: Proc. KDD-98, pp. 30–36 (1998)
Google Scholar
Gibbons, P.B., Matias, Y.: Synopsis Data Structures for Massive Data Sets. In: External Memory Algorithms. DIMACS Series in Discr. Math. and Theor. Compt. Sci, vol. 50, pp. 39–70. AMS, New York (2000)
Chapter Google Scholar
Hidber, C.: Online Association Rule Mining. In: Proc. SIGMOD’99, pp. 145–156 (1999)
Google Scholar
Kilpelainen, P., Mannila, H.: Ordered and Unordered Tree Inclusion. SIAM J. Comput. 24(2), 340–356 (1995)
Article MathSciNet Google Scholar
Laird, P., Saul, R.: Discrete Sequence Prediction and Its Applications. Machine Learning 15(1), 43–68 (1994)
MATH Google Scholar
Mannila, H., Toivonen, H., Verkamo, A.I.: Discovering Frequent Episode in Sequences. In: Proc. KDD-95, pp. 210–215. AAAI, Menlo Park (1995)
Google Scholar
Matsuda, T., Horiuchi, T., Motoda, H., Washio, T., Kumazawa, K., Arai, N.: Graph-Based Induction for General Graph Structured Data. In: Arikawa, S., Furukawa, K. (eds.) DS 1999. LNCS (LNAI), vol. 1721, pp. 340–342. Springer, Heidelberg (1999)
Chapter Google Scholar
Miyahara, T., Suzuki, Y., Shoudai, T., Uchida, T., Takahashi, K., Ueda, H.: Discovery of Frequent Tag Tree Patterns in Semistructured Web Documents. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 341–355. Springer, Heidelberg (2002)
Chapter Google Scholar
Nakano, S.: Efficient Generation of Plane Trees. Information Processing Letters 84, 167–172 (2002)
Article MathSciNet Google Scholar
Parthasarathy, S., Zaki, M.J., Ogihara, M., Dwarkadas, S.: Incremental and Interactive Sequence Mining. In: CIKM’99, pp. 251–258 (1999)
Google Scholar
Rastogi, R.: Single-Path Algorithms for Querying and Mining Data Streams. In: Proc. SDM’02 Workshop HDM’02, pp. 43–48 (2002)
Google Scholar
W3C, Extensive Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 06 October (2000), http://www.w3.org/TR/REC-xml
Wang, K., Liu, H.: Discovering Structural Association of Semistructured Data. TKDE 12(2), 353–371 (2000)
Google Scholar
Yamanishi, K., Takeuchi, J.: A Unifying Framework for Detecting Outliers and Change Points from Non-Stationary Time Series Data. In: Proc. SIGKDD-2002, ACM Press, New York (2002)
Google Scholar
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proc. SIGKDD-2002, ACM Press, New York (2002)
Google Scholar

Download references

Author information

Authors and Affiliations

Kyushu University, 6–10–1 Hakozaki Higashi-ku, Fukuoka 812–8581, Japan
Tatsuya Asai, Kenji Abe, Shinji Kawasoe, Hiroki Arimura & Setsuo Arikawa

Authors

Tatsuya Asai
View author publications
You can also search for this author in PubMed Google Scholar
Kenji Abe
View author publications
You can also search for this author in PubMed Google Scholar
Shinji Kawasoe
View author publications
You can also search for this author in PubMed Google Scholar
Hiroki Arimura
View author publications
You can also search for this author in PubMed Google Scholar
Setsuo Arikawa
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Akito Sakurai Kôiti Hasida Katsumi Nitta

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Asai, T., Abe, K., Kawasoe, S., Arimura, H., Arikawa, S. (2007). Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams. In: Sakurai, A., Hasida, K., Nitta, K. (eds) New Frontiers in Artificial Intelligence. JSAI JSAI 2003 2004. Lecture Notes in Computer Science(), vol 3609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71009-7_3

Download citation

DOI: https://doi.org/10.1007/978-3-540-71009-7_3
Published: 20 July 2007
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71008-0
Online ISBN: 978-3-540-71009-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics