Abstract
In this work we propose to use a more powerful teacher to effectively apply query learning algorithms to identify regular languages in practical, real-world problems. More specifically, we define a more powerful set of replies to the membership queries posed by the L* algorithm that reduces the number of such queries by several orders of magnitude in a practical application. The basic idea is to avoid the needless repetition of membership queries in cases where the reply will be negative as long as a particular condition is met by the string in the membership query. We present an example of the application of this method to a real problem, that of inferring a grammar for the structure of technical articles.
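The idea of skipping membership queries whose answer is already implied can be illustrated with a minimal sketch. The abstract does not specify the form of the richer replies, so the following is an assumption made for illustration only: the teacher, when rejecting a string, may also name a substring that makes every string containing it invalid. The names `QueryFilter`, `toy_teacher`, and the `Reply` type are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical reply type (an assumption, not the paper's definition):
# (accepted, bad_substring). On rejection the teacher may name a substring
# that makes ANY string containing it a non-member.
Reply = Tuple[bool, Optional[str]]


class QueryFilter:
    """Wraps a teacher and answers implied negative queries locally."""

    def __init__(self, teacher: Callable[[str], Reply]) -> None:
        self.teacher = teacher
        self.forbidden: List[str] = []  # substrings known to force rejection
        self.teacher_calls = 0          # queries actually sent to the teacher

    def member(self, s: str) -> bool:
        # If a known forbidden substring occurs in s, the reply must be
        # negative, so the teacher is not asked again.
        if any(f in s for f in self.forbidden):
            return False
        self.teacher_calls += 1
        accepted, bad = self.teacher(s)
        # Record the condition only when it is a non-empty substring.
        if not accepted and bad:
            self.forbidden.append(bad)
        return accepted


def toy_teacher(s: str) -> Reply:
    # Toy regular language: strings over {a, b} with no two consecutive b's.
    if "bb" in s:
        return (False, "bb")
    return (True, None)


oracle = QueryFilter(toy_teacher)
queries = ["ab", "abb", "bba", "abba", "aba"]
answers = [oracle.member(s) for s in queries]
```

In this toy run only three of the five queries reach the teacher: once `"bb"` has been reported as a rejecting condition, `"bba"` and `"abba"` are answered locally. A learner such as L* would call `oracle.member` wherever it would otherwise issue a plain membership query.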
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martins, A.L., Pinto, H.S., Oliveira, A.L. (2005). Using a More Powerful Teacher to Reduce the Number of Queries of the L* Algorithm in Practical Applications. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_33
DOI: https://doi.org/10.1007/11595014_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30737-2
Online ISBN: 978-3-540-31646-6
eBook Packages: Computer Science, Computer Science (R0)