Abstract
Forest to String Based Statistical Machine Translation (FSBSMT) is a forest-based tree-sequence-to-string translation model for syntax-based statistical machine translation. The model automatically learns tree-sequence-to-string translation rules from a given word alignment estimated on a source-side-parsed bilingual parallel corpus. This paper presents a hybrid method that combines different word alignment methods and integrates them into an FSBSMT system, so that the system receives the most informative alignment links. We show that hybrid word alignment, integrated into various experimental settings of FSBSMT, yields considerable improvement over state-of-the-art Hierarchical Phrase-Based SMT (HPBSMT). We also demonstrate that additionally integrating Named Entities (NEs), their translations, and Example-Based Machine Translation (EBMT) phrases (all extracted from the bilingual parallel training data) brings further considerable improvements over the hybrid FSBSMT system. We apply our hybrid model to a distant language pair, English–Bengali. The proposed system achieves a 78.5% relative (9.84 BLEU points absolute) improvement over the baseline HPBSMT.
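As a rough illustration of what combining the outputs of several word aligners can look like, the sketch below merges link sets by voting. It is a minimal example under stated assumptions, not the combination method used in this paper: the function names, the `min_votes` parameter, and the toy link sets are all hypothetical. The second helper only restates the arithmetic behind the reported figures: a 9.84-point absolute gain that equals a 78.5% relative gain implies a baseline of roughly 12.5 BLEU (approximate, since both reported numbers are rounded).

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# A link pairs a source word index with a target word index.
Link = Tuple[int, int]


def combine_alignments(alignments: Iterable[Set[Link]], min_votes: int = 1) -> Set[Link]:
    """Keep every link proposed by at least `min_votes` aligners.

    Hypothetical voting scheme for illustration only:
    min_votes=1 gives the union, min_votes=len(alignments) the intersection.
    """
    votes = Counter(link for links in alignments for link in links)
    return {link for link, count in votes.items() if count >= min_votes}


def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Baseline score implied by relative_gain = absolute_gain / baseline."""
    return absolute_gain / relative_gain


# Toy outputs of three hypothetical aligners for one sentence pair.
aligner_a = {(0, 0), (1, 2), (2, 1)}
aligner_b = {(0, 0), (1, 2), (3, 3)}
aligner_c = {(0, 0), (2, 1), (4, 4)}
outputs = [aligner_a, aligner_b, aligner_c]

union = combine_alignments(outputs, min_votes=1)
majority = combine_alignments(outputs, min_votes=2)
intersection = combine_alignments(outputs, min_votes=3)

# Figures taken from the abstract: 9.84 BLEU absolute, 78.5% relative.
baseline_bleu = implied_baseline(9.84, 0.785)
system_bleu = baseline_bleu + 9.84
```

Raising `min_votes` trades recall for precision: the union keeps every proposed link, while the intersection keeps only links on which all aligners agree.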
Notes
- 6. This corpus is produced in the EILMT project funded by DEITY, MCIT, Govt. of India.
Acknowledgments
This work is supported by the People Programme (Marie Curie Actions) of the European Union’s Framework Programme (FP7/2007-2013) under REA grant agreement no. 317471.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Pal, S., Naskar, S.K., van Genabith, J. (2018). Forest to String Based Statistical Machine Translation with Hybrid Word Alignments. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_4
DOI: https://doi.org/10.1007/978-3-319-75487-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)