Abstract
In language engineering, language models are employed to improve system performance. These are usually N-gram models estimated from large text corpora using the occurrence frequencies of the N-grams. An alternative to conventional frequency-based estimation of N-gram probabilities is to use neural networks for this purpose. In this paper, an approach to language modeling with a hybrid language model is presented: a linear combination of a connectionist N-gram model, which represents the global relations between certain linguistic categories, and a stochastic model of the distribution of words into those categories. The hybrid language model is evaluated on the Wall Street Journal corpus as processed in the Penn Treebank project.
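The decomposition underlying such category-based hybrid models can be sketched with a minimal, count-based example. This is an illustration, not the paper's method: the toy corpus, tag set, and the standard factorization P(w | history) = P(c(w) | category history) · P(w | c(w)) are assumptions, and a simple bigram table stands in for the connectionist (MLP) category N-gram component.

```python
from collections import Counter

# Toy corpus of (word, category) pairs; the POS-like tags are hypothetical.
tagged = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
          ("the", "DET"), ("dog", "NOUN"), ("ran", "VERB")]

# Stochastic word-distribution component: P(word | category).
cat_counts = Counter(c for _, c in tagged)
word_cat = Counter(tagged)

def p_word_given_cat(w, c):
    return word_cat[(w, c)] / cat_counts[c]

# Category N-gram component: P(category | previous category).
# A count-based bigram table stands in for the connectionist model here.
cats = [c for _, c in tagged]
bigrams = Counter(zip(cats, cats[1:]))
prev_counts = Counter(cats[:-1])

def p_cat_given_prev(c, prev):
    return bigrams[(prev, c)] / prev_counts[prev]

# Hybrid estimate: P(w | prev) = P(c(w) | prev) * P(w | c(w)).
def p_word(w, c, prev):
    return p_cat_given_prev(c, prev) * p_word_given_cat(w, c)

# After "the" (DET), both nouns are possible, each with probability 0.5.
print(p_word("cat", "NOUN", "DET"))  # -> 0.5
```

In the paper's setting, the category-level distribution would come from a multilayer perceptron rather than raw counts, which is what lets the model generalize over category histories unseen in training.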
This work has been supported by the Spanish CICYT under contract TIC2003-07158-C04-03.
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Blat, F., Castro, M.J., Tortajada, S., Sánchez, J.A. (2005). A Hybrid Approach to Statistical Language Modeling with Multilayer Perceptrons and Unigrams. In: Matoušek, V., Mautner, P., Pavelka, T. (eds.) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol. 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0