Classification and Selection of Translation Candidates for Parallel Corpora Alignment

  • Conference paper
Progress in Artificial Intelligence (EPIA 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9273))

Abstract

Incorporating human feedback into parallel corpora alignment and term translation extraction, and using all human-validated term translation pairs marked as correct, improves alignment precision, term translation extraction quality, and several closely related tasks. Moreover, such a labelled lexicon, with entries tagged for correctness, enables bilingual learning. From this perspective, we present experiments on the automatic classification of translation candidates extracted from aligned parallel corpora. For this purpose, we train SVM-based classifiers for three language pairs: English-Portuguese (EN-PT), English-French (EN-FR) and French-Portuguese (FR-PT). The approach achieved micro F-measure classification rates of 95.96%, 75.04% and 65.87% for the EN-PT, EN-FR and FR-PT language pairs, respectively.
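As an illustration of the micro F-measure reported in the abstract, the sketch below pools true positives, false positives and false negatives over all classes before computing precision, recall and their harmonic mean. The abstract does not say how the authors averaged their scores in detail, so this is only one common reading of "micro"; the function name, the `accept`/`reject` labels and the toy data are assumptions made for the example and are not taken from the paper.

```python
def micro_f_measure(gold, predicted, labels):
    """Return (precision, recall, F-measure), micro-averaged over `labels`.

    Counts are pooled across all labels first, then the scores are computed
    once from the pooled totals. Names and data here are hypothetical.
    """
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1      # predicted this label, and it was correct
            elif p == label and g != label:
                fp += 1      # predicted this label, but gold says otherwise
            elif p != label and g == label:
                fn += 1      # gold has this label, but it was not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Toy data: human validation labels versus hypothetical classifier output.
gold = ["accept", "accept", "reject", "accept", "reject", "reject"]
pred = ["accept", "reject", "reject", "accept", "accept", "reject"]

p, r, f = micro_f_measure(gold, pred, labels=["accept", "reject"])
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

When every candidate receives exactly one label and all labels are included, the pooled false positives equal the pooled false negatives, so micro precision, recall and F-measure all coincide with plain accuracy (4 of 6 correct in the toy data).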


Author information

Corresponding author

Correspondence to K. M. Kavitha.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kavitha, K.M., Gomes, L., Aires, J., Lopes, J.G.P. (2015). Classification and Selection of Translation Candidates for Parallel Corpora Alignment. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds) Progress in Artificial Intelligence. EPIA 2015. Lecture Notes in Computer Science, vol 9273. Springer, Cham. https://doi.org/10.1007/978-3-319-23485-4_73

  • DOI: https://doi.org/10.1007/978-3-319-23485-4_73

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23484-7

  • Online ISBN: 978-3-319-23485-4

  • eBook Packages: Computer Science, Computer Science (R0)
