Using target-language information to train part-of-speech taggers for machine translation

Machine Translation

Abstract

Although corpus-based approaches to machine translation (MT) are attracting growing interest, they are not applicable when the translation involves less-resourced language pairs for which no parallel corpora are available; in those cases, the rule-based approach is the only applicable solution. Most rule-based MT systems use part-of-speech (PoS) taggers to resolve the PoS ambiguities in the source-language texts to be translated; those MT systems require accurate PoS taggers to produce reliable translations in the target language (TL). The standard statistical approach to PoS ambiguity resolution (or tagging) uses hidden Markov models (HMMs) trained either in a supervised way from hand-tagged corpora, an expensive resource that is not always available, or in an unsupervised way through the Baum-Welch expectation-maximization algorithm; both methods use information only from the language being tagged. However, when tagging is an intermediate task in the translation procedure, that is, when the PoS tagger is to be embedded as a module within an MT system, information from the TL can be used in the training phase, without supervision, to increase the translation quality of the whole MT system. This paper presents a method to train HMM-based PoS taggers to be used in MT; the new method uses not only information from the source language (SL), as general-purpose methods do, but also information from the TL and from the remaining modules of the MT system in which the PoS tagger is to be embedded. We find that the translation quality of the MT system embedding a PoS tagger trained in an unsupervised manner through this new method is clearly better than that of the same MT system embedding a PoS tagger trained through the Baum-Welch algorithm, and comparable to that obtained by embedding a PoS tagger trained in a supervised way from hand-tagged corpora.
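The abstract does not spell out the training procedure, so the following is only a minimal sketch of one way TL information could drive HMM training, under stated assumptions: every possible tag sequence of an SL segment is translated by the rest of the MT system, each translation is scored by a TL model, and the normalized scores serve as fractional counts for the HMM parameters. The names `fractional_counts`, `normalize`, `toy_translate` and `toy_tl_score`, as well as the toy segment and scores, are hypothetical and stand in for real MT modules and a real TL language model.

```python
from collections import defaultdict
from itertools import product


def fractional_counts(segment, translate, tl_score):
    """Accumulate fractional HMM counts for one SL segment.

    segment   -- list of (word, [candidate tags]) pairs
    translate -- hypothetical callable: (words, tag_path) -> TL string
    tl_score  -- hypothetical callable: TL string -> non-negative score
    """
    words = [word for word, _ in segment]
    # Enumerate every disambiguation (tag path) of the segment.
    paths = list(product(*(tags for _, tags in segment)))
    scores = [tl_score(translate(words, path)) for path in paths]
    total = sum(scores)

    trans = defaultdict(float)  # (previous tag, tag) -> fractional count
    emit = defaultdict(float)   # (tag, word) -> fractional count
    for path, score in zip(paths, scores):
        # Fall back to a uniform weight if the TL model scores everything 0.
        weight = score / total if total > 0 else 1.0 / len(paths)
        for prev, cur in zip(path, path[1:]):
            trans[(prev, cur)] += weight
        for word, tag in zip(words, path):
            emit[(tag, word)] += weight
    return trans, emit


def normalize(counts):
    """Turn (context, outcome) counts into conditional probabilities."""
    totals = defaultdict(float)
    for (context, _), count in counts.items():
        totals[context] += count
    return {key: count / totals[key[0]]
            for key, count in counts.items() if totals[key[0]] > 0}


# Toy stand-ins for the MT modules and the TL model (assumptions only).
def toy_translate(words, path):
    return "los libros" if path[-1] == "NOUN" else "los reserva"


def toy_tl_score(translation):
    return {"los libros": 0.9, "los reserva": 0.1}[translation]


segment = [("the", ["DET"]), ("books", ["NOUN", "VERB"])]
trans_counts, emit_counts = fractional_counts(segment, toy_translate, toy_tl_score)
trans_probs = normalize(trans_counts)
emit_probs = normalize(emit_counts)
```

In this toy run the TL model prefers the translation produced by the noun reading, so the transition from `DET` to `NOUN` receives most of the probability mass. Enumerating all tag paths grows exponentially with the number of ambiguous words, so a practical implementation would need short segments or some form of pruning.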

Author information

Corresponding author

Correspondence to Felipe Sánchez-Martínez.

About this article

Cite this article

Sánchez-Martínez, F., Pérez-Ortiz, J.A. & Forcada, M.L. Using target-language information to train part-of-speech taggers for machine translation. Machine Translation 22, 29–66 (2008). https://doi.org/10.1007/s10590-008-9044-3

  • DOI: https://doi.org/10.1007/s10590-008-9044-3
