A POS-Based Preordering Approach for English-to-Arabic Statistical Machine Translation

Hadj Ameur, Mohamed Seghir; Guessoum, Ahmed; Meziane, Farid

doi:10.1007/978-3-319-73500-9_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 782)

Included in the following conference series:

International Conference on Arabic Language Processing

Abstract

In this work, we present a POS-based preordering approach that tackles both long- and short-distance reordering phenomena. Unlexicalized syntactic reordering rules are automatically extracted from a parallel corpus using only word alignments and source-side POS tagging. The rules are applied deterministically, which prevents the reordering procedure from becoming a bottleneck for decoding speed. A new approach to both rule filtering and rule application ensures fast and efficient reordering. Tests on the IWSLT2016 English-to-Arabic evaluation benchmark show a noticeable increase in the overall BLEU score of our system over the baseline phrase-based SMT (PSMT) system.
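The deterministic rule application described in the abstract can be pictured with a minimal sketch. The rule format, the example rules, and the function name `preorder` are all hypothetical, since the chapter's actual rule representation, filtering, and matching strategy are not shown here. The sketch assumes each rule maps a source POS pattern to a permutation of its positions and applies the longest matching rule greedily from left to right.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical rule table: a source POS pattern maps to a permutation of
# its positions. These two rules are illustrative, not taken from the paper.
Rules = Dict[Tuple[str, ...], Tuple[int, ...]]

EXAMPLE_RULES: Rules = {
    ("ADJ", "NOUN"): (1, 0),            # adjective-noun -> noun-adjective
    ("DET", "ADJ", "NOUN"): (0, 2, 1),  # keep determiner, swap the rest
}


def preorder(words: Sequence[str], tags: Sequence[str], rules: Rules) -> List[str]:
    """Reorder source words with unlexicalized POS rules.

    Assumed strategy: scan left to right, try the longest pattern first,
    and reorder each position at most once, so the result is deterministic.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    max_len = max((len(pattern) for pattern in rules), default=0)
    output: List[str] = []
    i = 0
    while i < len(words):
        # Try patterns of decreasing length starting at position i.
        for n in range(min(max_len, len(words) - i), 1, -1):
            permutation = rules.get(tuple(tags[i:i + n]))
            if permutation is not None:
                output.extend(words[i + j] for j in permutation)
                i += n
                break
        else:
            # No rule matched here: copy the word unchanged.
            output.append(words[i])
            i += 1
    return output


reordered = preorder(
    ["the", "red", "car", "stopped"],
    ["DET", "ADJ", "NOUN", "VERB"],
    EXAMPLE_RULES,
)
print(reordered)
```

Because every position is consumed exactly once and ties are resolved by pattern length, the same input always yields the same output, and the cost is linear in sentence length for a fixed maximum rule length.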

Notes

1. FindCS is a simple method that counts the crossing alignments (CS) in a given aligned sentence.
2. http://workshop2016.iwslt.org/59.php
3. The conversion table is available at http://universaldependencies.org/tagset-conversion/en-penn-uposf.html
4. http://www.nltk.org/
5. By a monotonic corpus, we mean a corpus whose alignment contains no crossing links.
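The notions of crossing alignments and monotonicity in the notes above can be illustrated with a short sketch. The definition of FindCS is not given here, so the code assumes one common definition: two alignment links cross when their source order and target order disagree. The names `count_crossings` and `is_monotonic` are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Tuple

# An alignment link is a (source position, target position) pair.
Link = Tuple[int, int]


def count_crossings(alignment: Iterable[Link]) -> int:
    """Count pairs of links whose source and target orders disagree.

    Assumed definition: links (s1, t1) and (s2, t2) cross when
    (s1 - s2) * (t1 - t2) < 0. Links sharing a source or target
    position are not counted as crossing.
    """
    links = sorted(set(alignment))
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(links, 2)
        if (s1 - s2) * (t1 - t2) < 0
    )


def is_monotonic(alignment: Iterable[Link]) -> bool:
    """A sentence pair is monotonic when its alignment has no crossing links."""
    return count_crossings(alignment) == 0


swapped = [(0, 0), (1, 2), (2, 1)]   # the last two source words are swapped
straight = [(0, 0), (1, 1), (2, 2)]  # word order is preserved
print(count_crossings(swapped), is_monotonic(straight))
```

This pairwise check is quadratic in the number of links, which is usually acceptable at sentence level. Under the same assumption, a corpus is monotonic when `is_monotonic` holds for every aligned sentence pair.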

Author information

Authors and Affiliations

NLP, Machine Learning and Applications (TALAA) Group Laboratory for Research in Artificial Intelligence (LRIA), Department of Computer Science, University of Science and Technology Houari Boumediene (USTHB), Bab-Ezzouar, Algiers, Algeria
Mohamed Seghir Hadj Ameur, Ahmed Guessoum & Farid Meziane

Corresponding authors

Correspondence to Mohamed Seghir Hadj Ameur, Ahmed Guessoum or Farid Meziane.

Editor information

Editors and Affiliations

Ex ENSA-USMBA, Fez, Morocco
Abdelmonaime Lachkar
EMI, UM5, Rabat, Morocco
Karim Bouzoubaa
FS, UMP, Oujda, Morocco
Azzedine Mazroui
IERA, UM5, Rabat, Morocco
Abdelfettah Hamdani
FS, UMP, Oujda, Morocco
Abdelhak Lekhouaja

Cite this paper

Hadj Ameur, M.S., Guessoum, A., Meziane, F. (2018). A POS-Based Preordering Approach for English-to-Arabic Statistical Machine Translation. In: Lachkar, A., Bouzoubaa, K., Mazroui, A., Hamdani, A., Lekhouaja, A. (eds) Arabic Language Processing: From Theory to Practice. ICALP 2017. Communications in Computer and Information Science, vol 782. Springer, Cham. https://doi.org/10.1007/978-3-319-73500-9_3

DOI: https://doi.org/10.1007/978-3-319-73500-9_3
Published: 05 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73499-6
Online ISBN: 978-3-319-73500-9
eBook Packages: Computer Science; Computer Science (R0)
