Forest to String Based Statistical Machine Translation with Hybrid Word Alignments

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Abstract

Forest to String Based Statistical Machine Translation (FSBSMT) is a forest-based tree-sequence-to-string translation model for syntax-based statistical machine translation. The model automatically learns tree-sequence-to-string translation rules from word alignments estimated on a bilingual parallel corpus whose source side has been parsed. This paper presents a hybrid method that combines several word alignment methods and integrates the combined alignment into an FSBSMT system, with the aim of supplying the system with the most informative alignment links. We show that, across various experimental settings of FSBSMT, hybrid word alignment yields considerable improvement over state-of-the-art Hierarchical Phrase-Based SMT (HPBSMT). We also show that additionally integrating Named Entities (NEs) with their translations and Example-Based Machine Translation (EBMT) phrases, all extracted from the bilingual parallel training data, brings further considerable improvement over the hybrid FSBSMT system. We apply the hybrid model to a distant language pair, English–Bengali. The proposed system achieves a 78.5% relative improvement (9.84 BLEU points absolute) over the baseline HPBSMT system.
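The relative and absolute figures quoted in the abstract together imply approximate BLEU scores for the two systems. The sketch below only works through that arithmetic; the function name `implied_scores` is hypothetical, and because the reported percentage is rounded, the recovered scores are approximations rather than values taken from the paper.

```python
def implied_scores(absolute_gain, relative_gain):
    """Recover approximate baseline and system scores from reported gains.

    relative_gain = absolute_gain / baseline, so
    baseline = absolute_gain / relative_gain.
    """
    baseline = absolute_gain / relative_gain
    system = baseline + absolute_gain
    return baseline, system


# Figures from the abstract: 9.84 BLEU points absolute, 78.5% relative.
baseline, system = implied_scores(absolute_gain=9.84, relative_gain=0.785)
print(f"implied baseline (HPBSMT) BLEU: {baseline:.1f}")
print(f"implied proposed-system BLEU:   {system:.1f}")
```

Under these assumptions the baseline comes out at roughly 12.5 BLEU and the proposed system at roughly 22.4 BLEU.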

Notes

  1. http://www.statmt.org/wmt15/

  2. http://www.nist.gov/itl/iad/mig/openmt15.cfm

  3. http://ntcir.nii.ac.jp/PatentMTList/

  4. http://phontron.com/travatar/

  5. http://nlp.stanford.edu/software/CRF-NER.shtml

  6. This corpus is produced in the EILMT project funded by DEITY, MCIT, Govt. of India.

  7. http://code.google.com/p/egret-parser/

Acknowledgments

This work is supported by the People Programme (Marie Curie Actions) of the European Union’s Framework Programme (FP7/2007-2013) under REA grant agreement no. 317471.

Author information

Corresponding author

Correspondence to Santanu Pal.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Pal, S., Naskar, S.K., van Genabith, J. (2018). Forest to String Based Statistical Machine Translation with Hybrid Word Alignments. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_4

  • DOI: https://doi.org/10.1007/978-3-319-75487-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75486-4

  • Online ISBN: 978-3-319-75487-1

  • eBook Packages: Computer Science, Computer Science (R0)
