Abstract
Learning context-free grammars is generally considered a very hard task, all the more so when learning must be done from positive examples only. In this setting, one possibility is to learn stochastic context-free grammars, under the implicit assumption that the examples are drawn from a distribution defined by such a grammar. Nevertheless, this remains a hard task for which no algorithm is known. We use recent results to introduce a proper subclass of linear grammars, called deterministic linear grammars, for which we prove that a small canonical form can be found. Such a canonical form has previously proved to be a favourable condition for designing a learning algorithm. We propose an algorithm for this class of grammars and prove that it runs in polynomial time and structurally converges to the target in the paradigm of identification in the limit with probability 1. Although this does not ensure that a polynomial-size sample suffices for learning, we argue that the criterion means that no added (hidden) bias is present.
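To make the object of study concrete, the following is a minimal sketch of a stochastic deterministic linear grammar. It is an illustration only, not the algorithm of the paper. It assumes the usual form of such grammars, in which every rule is either T → a T' u or T → λ and the first terminal a determines the rule applied at T. The grammar S → a S b (probability 0.6), S → λ (probability 0.4) and the names `RULES`, `LAMBDA` and `string_probability` are hypothetical choices made for this example.

```python
# Illustrative sketch only: the encoding and the example grammar are assumptions.
# RULES[T][a] = (probability, next_nonterminal, suffix) encodes the rule T -> a T' u.
# LAMBDA[T] is the probability of the rule T -> lambda (the empty string).
RULES = {"S": {"a": (0.6, "S", "b")}}
LAMBDA = {"S": 0.4}


def string_probability(w, start="S", rules=RULES, lam=LAMBDA):
    """Return the probability that the grammar generates the string w.

    Because the first terminal selects at most one rule per nonterminal,
    there is at most one derivation of w, so a single pass is enough.
    """
    prob = 1.0
    nonterminal = start
    while w:
        rule = rules.get(nonterminal, {}).get(w[0])
        if rule is None:
            return 0.0  # no rule at this nonterminal starts with w[0]
        p, next_nonterminal, suffix = rule
        rest = w[1:]
        if not rest.endswith(suffix):
            return 0.0  # the required right-hand suffix u is missing
        prob *= p
        # Strip the suffix u and continue with the middle part of the string.
        w = rest[:len(rest) - len(suffix)]
        nonterminal = next_nonterminal
    # The derivation must finish with a lambda rule.
    return prob * lam.get(nonterminal, 0.0)
```

Under these assumptions the grammar generates the strings aⁿbⁿ with probability 0.6ⁿ × 0.4, and any string outside that language receives probability 0. A learner in the setting of the abstract would see only strings sampled from such a distribution and would have to recover both the rules and their probabilities.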
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de la Higuera, C., Oncina, J. (2003). Identification with Probability One of Stochastic Deterministic Linear Languages. In: Gavaldá, R., Jantke, K.P., Takimoto, E. (eds) Algorithmic Learning Theory. ALT 2003. Lecture Notes in Computer Science, vol 2842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39624-6_20
DOI: https://doi.org/10.1007/978-3-540-39624-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20291-2
Online ISBN: 978-3-540-39624-6
eBook Packages: Springer Book Archive