Abstract
This article proposes a new practical algorithm for inferring μ-distinguishable stochastic deterministic regular languages. We prove that, given a polynomial number of examples, the algorithm infers with high probability an automaton isomorphic to the target. We also discuss the links between the error function used to evaluate the inferred model and the learnability of the model class in a PAC-like framework.
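The abstract does not define μ-distinguishability. As a rough illustration only, the sketch below assumes the definition commonly used in the probabilistic automata literature: two states are μ-distinguishable if some string has suffix probabilities from the two states that differ by at least μ, and an automaton is μ-distinguishable if every pair of its states is. The toy automaton, the names `TRANSITIONS`, `suffix_probability`, `max_suffix_gap`, and `is_mu_distinguishable`, and the bounded-length search are all hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical toy automaton: each state maps a symbol to (next_state, probability).
# The probability mass not used by outgoing transitions is the stopping probability.
TRANSITIONS = {
    "q0": {"a": ("q1", 0.5), "b": ("q0", 0.25)},
    "q1": {"a": ("q1", 0.25)},
}


def stop_probability(state):
    """Probability of ending the string in `state`."""
    return 1.0 - sum(p for _, p in TRANSITIONS[state].values())


def suffix_probability(state, string):
    """Probability that the automaton, started in `state`, emits exactly `string`."""
    prob = 1.0
    for symbol in string:
        if symbol not in TRANSITIONS[state]:
            return 0.0  # no transition on this symbol: the string is impossible
        state, p = TRANSITIONS[state][symbol]
        prob *= p
    return prob * stop_probability(state)


def max_suffix_gap(q1, q2, alphabet="ab", max_len=4):
    """Largest |P(s | q1) - P(s | q2)| over all strings s of at most max_len symbols.

    This is only a lower bound on the gap over all strings (an assumption made
    to keep the search finite).
    """
    gap = 0.0
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            diff = abs(suffix_probability(q1, s) - suffix_probability(q2, s))
            gap = max(gap, diff)
    return gap


def is_mu_distinguishable(mu, **kwargs):
    """True if every pair of distinct states has a bounded-length gap of at least mu."""
    states = list(TRANSITIONS)
    return all(
        max_suffix_gap(p, q, **kwargs) >= mu
        for i, p in enumerate(states)
        for q in states[i + 1:]
    )
```

In this toy automaton the two states already differ on the empty string (stopping probabilities 0.25 and 0.75), so the check succeeds for any μ up to 0.5. Because the search is cut off at `max_len`, a positive answer is reliable under the assumed definition, while a negative answer only says that no short distinguishing string was found.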
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thollard, F., Clark, A. (2004). Learning Stochastic Deterministic Regular Languages. In: Paliouras, G., Sakakibara, Y. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2004. Lecture Notes in Computer Science, vol 3264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30195-0_22
DOI: https://doi.org/10.1007/978-3-540-30195-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23410-4
Online ISBN: 978-3-540-30195-0
eBook Packages: Springer Book Archive