Forest to String Based Statistical Machine Translation with Hybrid Word Alignments

Pal, Santanu; Naskar, Sudip Kumar; van Genabith, Josef

doi:10.1007/978-3-319-75487-1_4


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

Forest to String Based Statistical Machine Translation (FSBSMT) is a forest-based tree sequence to string translation model for syntax based statistical machine translation. The model automatically learns tree sequence to string translation rules from a given word alignment estimated on a source-side-parsed bilingual parallel corpus. This paper presents a hybrid method which combines different word alignment methods and integrates them into an FSBSMT system. The hybrid word alignment provides the most informative alignment links to the FSBSMT system. We show that hybrid word alignment integrated into various experimental settings of FSBSMT provides considerable improvement over state-of-the-art Hierarchical Phrase based SMT (HPBSMT). The research also demonstrates that additional integration of Named Entities (NEs), their translations and Example Based Machine Translation (EBMT) phrases (all extracted from the bilingual parallel training data) into the system brings about further considerable performance improvements over the hybrid FSBSMT system. We apply our hybrid model to a distant language pair, English–Bengali. The proposed system achieves 78.5% relative (9.84 BLEU points absolute) improvement over baseline HPBSMT.
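The two improvement figures quoted in the abstract jointly determine the approximate BLEU scores involved. The sketch below backs them out, assuming the relative gain is computed as absolute gain divided by baseline score. The function name `implied_bleu_scores` is hypothetical, and the resulting values are inferred from the rounded figures in the abstract rather than taken from the paper's results tables.

```python
from typing import Tuple


def implied_bleu_scores(absolute_gain: float, relative_gain_pct: float) -> Tuple[float, float]:
    """Back out baseline and system BLEU from an absolute and a relative gain.

    Assumes relative_gain_pct = 100 * absolute_gain / baseline.
    """
    baseline = absolute_gain / (relative_gain_pct / 100.0)
    system = baseline + absolute_gain
    return baseline, system


# Figures quoted in the abstract: 9.84 BLEU points absolute, 78.5% relative.
baseline, system = implied_bleu_scores(9.84, 78.5)
print(f"implied baseline HPBSMT BLEU: {baseline:.2f}")
print(f"implied proposed-system BLEU: {system:.2f}")
```

Under that assumption the baseline HPBSMT score comes out at roughly 12.5 BLEU and the proposed system at roughly 22.4 BLEU; because the inputs are rounded, these are estimates only.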


Notes

1. http://www.statmt.org/wmt15/
2. http://www.nist.gov/itl/iad/mig/openmt15.cfm
3. http://ntcir.nii.ac.jp/PatentMTList/
4. http://phontron.com/travatar/
5. http://nlp.stanford.edu/software/CRF-NER.shtml
6. This corpus is produced in the EILMT project funded by DEITY, MCIT, Govt. of India.
7. http://code.google.com/p/egret-parser/


Acknowledgments

This work is supported by the People Programme (Marie Curie Actions) of the European Union’s Framework Programme (FP7/2007-2013) under REA grant agreement no. 317471.

Author information

Authors and Affiliations

Universität des Saarlandes, Saarbrücken, Germany
Santanu Pal & Josef van Genabith
German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
Josef van Genabith
Jadavpur University, Kolkata, India
Sudip Kumar Naskar


Corresponding author

Correspondence to Santanu Pal.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh


Cite this paper

Pal, S., Naskar, S.K., van Genabith, J. (2018). Forest to String Based Statistical Machine Translation with Hybrid Word Alignments. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_4


DOI: https://doi.org/10.1007/978-3-319-75487-1_4
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)
