Abstract
In data mining and knowledge discovery, much semistructured data, such as HTML/XML files, is represented by rooted trees in which the children of each internal vertex are ordered and the edges are labeled. To represent structural features common to such data, we propose the regular ordered term tree, a rooted tree pattern consisting of ordered tree structures and internal structured variables. For a regular ordered term tree t, the term tree language of t, denoted by L(t), is the set of all trees obtained from t by substituting arbitrary trees for all variables in t.
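The substitution that generates L(t) can be illustrated with a minimal sketch. It is not the paper's formalism: it assumes each variable is an edge whose label starts with "?", that each variable label occurs at most once (the "regular" condition), and that a replacement tree has exactly one leaf marked as a port, which is glued to the variable's child while the replacement's root is glued to the variable's parent. The names `Node`, `substitute`, `graft`, and `to_tuple` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Ordered outgoing edges as (label, child) pairs.
    # Assumption: a label starting with "?" denotes a variable.
    edges: list = field(default_factory=list)
    # Assumption: marks the single leaf of a replacement tree that is
    # identified with the child vertex of the variable being replaced.
    port: bool = False


def graft(g, below):
    """Copy replacement tree g, identifying its port leaf with `below`."""
    if g.port:
        return below
    return Node([(label, graft(child, below)) for label, child in g.edges])


def substitute(t, binding):
    """Copy term tree t, replacing each bound variable edge by its tree.

    The root of the replacement is merged with the variable's parent, so
    its edges are spliced into the parent's ordered edge list at the
    position of the variable. Unbound variables are kept as they are.
    """
    out = Node()
    for label, child in t.edges:
        new_child = substitute(child, binding)
        if label in binding:
            out.edges.extend(graft(binding[label], new_child).edges)
        else:
            out.edges.append((label, new_child))
    return out


def to_tuple(n):
    """Nested-tuple view of a tree, convenient for comparing results."""
    return tuple((label, to_tuple(child)) for label, child in n.edges)


# Term tree: root -a-> v1 -?x-> v2 -b-> leaf
t = Node([("a", Node([("?x", Node([("b", Node())]))]))])
# Replacement for ?x: two ordered edges c and d; the c-child is the port.
g = Node([("c", Node(port=True)), ("d", Node())])
result = substitute(t, {"?x": g})
```

Under these assumptions, `result` is one member of L(t): the variable edge is replaced by the edges c and d, and the subtree that hung below the variable now hangs below c.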
In this paper, we consider the polynomial time learnability of the class OTTL = {L(t) ∣ t ∈ OTT} from positive data, where OTT denotes the set of all regular ordered term trees. First, we present a polynomial time algorithm for the minimal language problem for OTT: given a set of labeled trees S, find a term tree t in OTT such that L(t) is minimal among all term tree languages that contain all trees in S. Then, by combining this algorithm with the polynomial time algorithm for the membership problem for OTT from our previous work [15], we show that OTTL is polynomial time inductively inferable from positive data. This result extends our previous results in [14].
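The two problems named above can be written out symbolically. This is a restatement of the abstract's wording, not notation taken from the paper; the calligraphic symbol for the class OTT is an assumption.

```latex
\textbf{Membership problem for } \mathcal{OTT}:\quad
\text{given a tree } T \text{ and } t \in \mathcal{OTT},
\text{ decide whether } T \in L(t).

\textbf{Minimal language problem for } \mathcal{OTT}:\quad
\text{given a nonempty finite set of trees } S,
\text{ find } t \in \mathcal{OTT} \text{ such that }
S \subseteq L(t) \text{ and there is no } t' \in \mathcal{OTT}
\text{ with } S \subseteq L(t') \subsetneq L(t).
```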
References
[1] S. Abiteboul, P. Buneman, and D. Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, 2000.
[2] T. R. Amoth, P. Cull, and P. Tadepalli. Exact learning of unordered tree patterns from queries. Proc. COLT-99, ACM Press, pages 323–332, 1999.
[3] D. Angluin. Finding patterns common to a set of strings. Journal of Computer and System Sciences, 21:46–62, 1980.
[4] H. Arimura, H. Sakamoto, and S. Arikawa. Efficient learning of semi-structured data from queries. Proc. ALT-2001, Springer-Verlag, LNAI 2225, pages 315–331, 2001.
[5] H. Arimura, T. Shinohara, and S. Otsuki. Finding minimal generalizations for unions of pattern languages and its application to inductive inference from positive data. Proc. STACS-94, Springer-Verlag, LNCS 775, pages 649–660, 1994.
[6] S. Matsumoto, Y. Hayashi, and T. Shoudai. Polynomial time inductive inference of regular term tree languages from positive data. Proc. ALT-97, Springer-Verlag, LNAI 1316, pages 212–227, 1997.
[7] T. Miyahara, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Polynomial time matching algorithms for tree-like structured patterns in knowledge discovery. Proc. PAKDD-2000, Springer-Verlag, LNAI 1805, pages 5–16, 2000.
[8] T. Miyahara, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Discovery of frequent tree structured patterns in semistructured web documents. Proc. PAKDD-2001, Springer-Verlag, LNAI 2035, pages 47–52, 2001.
[9] T. Miyahara, Y. Suzuki, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Discovery of frequent tag tree patterns in semistructured web documents. Proc. PAKDD-2002, Springer-Verlag, LNAI 2336, pages 341–355, 2002.
[10] T. Shinohara. Polynomial time inference of extended regular pattern languages. In Springer-Verlag, LNCS 147, pages 115–127, 1982.
[11] T. Shinohara and S. Arikawa. Pattern inference. GOSLER Final Report, Springer-Verlag, LNAI 961, pages 259–291, 1995.
[12] T. Shoudai, T. Miyahara, T. Uchida, and S. Matsumoto. Inductive inference of regular term tree languages and its application to knowledge discovery. Information Modeling and Knowledge Bases XI, IOS Press, pages 85–102, 2000.
[13] T. Shoudai, T. Uchida, and T. Miyahara. Polynomial time algorithms for finding unordered tree patterns with internal variables. Proc. FCT-2001, Springer-Verlag, LNCS 2138, pages 335–346, 2001.
[14] Y. Suzuki, R. Akanuma, T. Shoudai, T. Miyahara, and T. Uchida. Polynomial time inductive inference of ordered tree patterns with internal structured variables from positive data. Proc. COLT-2002, Springer-Verlag, LNAI 2375, pages 169–184, 2002.
[15] Y. Suzuki, K. Inomae, T. Shoudai, T. Miyahara, and T. Uchida. A polynomial time matching algorithm of structured ordered tree patterns for data mining from semistructured data. Proc. ILP-2002, Springer-Verlag, LNAI (to appear), 2002.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Suzuki, Y., Shoudai, T., Uchida, T., Miyahara, T. (2002). Ordered Term Tree Languages which Are Polynomial Time Inductively Inferable from Positive Data. In: Cesa-Bianchi, N., Numao, M., Reischuk, R. (eds) Algorithmic Learning Theory. ALT 2002. Lecture Notes in Computer Science, vol 2533. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36169-3_17
DOI: https://doi.org/10.1007/3-540-36169-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00170-6
Online ISBN: 978-3-540-36169-5
eBook Packages: Springer Book Archive