Identification with Probability One of Stochastic Deterministic Linear Languages

de la Higuera, Colin; Oncina, Jose

doi:10.1007/978-3-540-39624-6_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2842)

Included in the following conference series:

International Conference on Algorithmic Learning Theory

Abstract

Learning context-free grammars is generally considered a very hard task. This is even more the case when learning has to be done from positive examples only. In this context, one possibility is to learn stochastic context-free grammars, making the implicit assumption that the distribution of the examples is given by such a grammar. Nevertheless, this is still a hard task for which no algorithm is known. We use recent results to introduce a proper subclass of linear grammars, called deterministic linear grammars, for which we prove that a small canonical form can be found. This has been a successful condition for a learning algorithm to be possible. We propose an algorithm for this class of grammars and prove that it works in polynomial time and structurally converges to the target in the paradigm of identification in the limit with probability 1. Although this does not ensure that only a polynomial-size sample is necessary for learning to be possible, we argue that the criterion means that no added (hidden) bias is present.

Author information

Authors and Affiliations

EURISE, Université de Saint-Etienne, 23 rue du Docteur Paul Michelon, 42023, Saint-Etienne, France
Colin de la Higuera
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Ap.99., E-03080, Alicante, Spain
Jose Oncina

Editor information

Editors and Affiliations

Universitat Politècnica de Catalunya, Barcelona, Spain
Ricard Gavaldá
Meme Media Laboratory, Hokkaido University Sapporo, Kita 13, Nishi 8, Kita-ku, 060-8628, Sapporo, Japan
Klaus P. Jantke, Eiji Takimoto

About this paper

Cite this paper

de la Higuera, C., Oncina, J. (2003). Identification with Probability One of Stochastic Deterministic Linear Languages. In: Gavaldá, R., Jantke, K.P., Takimoto, E. (eds) Algorithmic Learning Theory. ALT 2003. Lecture Notes in Computer Science, vol 2842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39624-6_20

DOI: https://doi.org/10.1007/978-3-540-39624-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20291-2
Online ISBN: 978-3-540-39624-6
eBook Packages: Springer Book Archive
