Abstract
In formal language theory, finite-state transducers are well-known models for “input-output” rational mappings between two languages. Although more powerful recursive models can be used to account for more complex mappings, it has been argued that the input-output relations underlying most usual natural language pairs are essentially rational. Moreover, the relative simplicity of these mappings has recently led to the development of techniques for learning finite-state transducers from a training set of input-output sentence pairs of the languages considered. Following these arguments, a number of machine translation systems based on stochastic finite-state transducers have been developed in the last few years. Here we review the statistical statement of Machine Translation and how the corresponding modelling, learning and search problems can be solved by using stochastic finite-state transducers. We also review the results achieved by the systems developed under this paradigm. After presenting the traditional approach, where transducer learning is mainly solved under the grammatical inference framework, we propose a new approach where learning is explicitly considered as a statistical estimation problem and the whole stochastic finite-state transducer learning problem is solved by expectation maximisation.
This work was partially supported by the European Union project TT2 (IST-2001-32091) and by the Spanish project TEFATE (TIC 2003-08681-C02-02).
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vidal, E., Casacuberta, F. (2004). Learning Finite-State Models for Machine Translation. In: Paliouras, G., Sakakibara, Y. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2004. Lecture Notes in Computer Science, vol 3264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30195-0_2
DOI: https://doi.org/10.1007/978-3-540-30195-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23410-4
Online ISBN: 978-3-540-30195-0
eBook Packages: Springer Book Archive