
Learning Finite-State Models for Machine Translation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3264)

Abstract

In formal language theory, finite-state transducers are well-known models for “input-output” rational mappings between two languages. Even though more powerful recursive models can be used to account for more complex mappings, it has been argued that the input-output relations underlying most usual natural language pairs are essentially rational. Moreover, the relative simplicity of these mappings has recently led to the development of techniques for learning finite-state transducers from a training set of input-output sentence pairs of the languages considered. Following these arguments, a number of machine translation systems based on stochastic finite-state transducers have been developed in the last few years. Here we review the statistical statement of Machine Translation and how the corresponding modelling, learning and search problems can be solved by using stochastic finite-state transducers. We also review the results achieved by the systems developed under this paradigm. After presenting the traditional approach, where transducer learning is mainly solved under the grammatical inference framework, we propose a new approach where learning is explicitly considered as a statistical estimation problem and the whole stochastic finite-state transducer learning problem is solved by expectation maximisation.
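As a rough illustration of the search problem mentioned in the abstract, the sketch below runs a Viterbi-style best-path search over a toy stochastic finite-state transducer. The transducer, its states, its probabilities and the function name `viterbi_translate` are all hypothetical and are not taken from the paper. The search returns the output of the single most probable path, which is only an approximation to the most probable translation, since several paths may produce the same output.

```python
# Hypothetical toy transducer (not from the paper):
# (state, source_word) -> list of (next_state, emitted_target_words, probability)
TRANSITIONS = {
    (0, "una"): [(1, ("a",), 1.0)],
    (1, "habitación"): [(2, (), 0.6), (3, ("room",), 0.4)],
    (2, "doble"): [(4, ("double", "room"), 1.0)],
    (3, "doble"): [(4, ("with", "two", "beds"), 1.0)],
}
# Final-state probabilities (assumed values).
FINAL = {4: 1.0}


def viterbi_translate(source, start=0):
    """Return (translation, probability) of the best path, or None if no path accepts."""
    # best[state] = (probability, output words) of the best path reaching
    # that state after consuming the source prefix read so far.
    best = {start: (1.0, ())}
    for word in source:
        nxt = {}
        for state, (prob, out) in best.items():
            for to, emitted, q in TRANSITIONS.get((state, word), []):
                cand = (prob * q, out + emitted)
                # Keep only the most probable path into each state.
                if to not in nxt or cand[0] > nxt[to][0]:
                    nxt[to] = cand
        best = nxt
    # Only paths that end in a final state count as complete translations.
    finals = [(p * FINAL[s], out) for s, (p, out) in best.items() if s in FINAL]
    if not finals:
        return None
    prob, out = max(finals, key=lambda item: item[0])
    return " ".join(out), prob


if __name__ == "__main__":
    print(viterbi_translate(["una", "habitación", "doble"]))
```

Keeping one best hypothesis per state makes the cost linear in the sentence length for a fixed transducer. In a learned model the transition probabilities would be estimated from training pairs rather than written by hand.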

This work was partially supported by the European Union project TT2 (IST-2001-32091) and by the Spanish project TEFATE (TIC 2003-08681-C02-02).




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vidal, E., Casacuberta, F. (2004). Learning Finite-State Models for Machine Translation. In: Paliouras, G., Sakakibara, Y. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2004. Lecture Notes in Computer Science, vol 3264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30195-0_2


  • DOI: https://doi.org/10.1007/978-3-540-30195-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23410-4

  • Online ISBN: 978-3-540-30195-0

  • eBook Packages: Springer Book Archive
