Abstract
We present the first algorithm for learning n-ary node selection queries in trees from completely annotated examples using methods of grammatical inference. We propose to represent n-ary queries by deterministic n-ary node selecting tree transducers (n-NSTTs). These are tree automata that capture the class of monadic second-order definable n-ary queries. We show that polynomially bounded n-ary queries defined by n-NSTTs can be learned in polynomial time and from polynomial data. An application in Web information extraction yields encouraging results.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lemay, A., Niehren, J., Gilleron, R. (2006). Learning n-Ary Node Selecting Tree Transducers from Completely Annotated Examples. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_21
DOI: https://doi.org/10.1007/11872436_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45264-5
Online ISBN: 978-3-540-45265-2
eBook Packages: Computer Science, Computer Science (R0)