Abstract
We investigated multiple-pivot approaches for the Japanese and Indonesian (Ja–Id) language pair in phrase-based statistical machine translation (SMT). We used four pivot languages: English, Malaysian, Filipino, and Myanmar. Because the source–pivot and pivot–target language pairs differ in word order, we conducted two experiments: without reordering (WoR) and with reordering (WR) of the source language. Triangulation and linear interpolation (LI) were used to combine the phrase tables obtained from the multiple pivots. The combined phrase tables were used without a direct source–target phrase table. In the WoR experiment, the use of multiple pivots improved the BLEU score by 0.24 and 2.49 over the baseline and the single-pivot system, respectively. However, the translation output of WoR was incomprehensible because it followed the Japanese word order. In the WR experiment, we reordered Japanese sentences from subject–object–verb (SOV) order into the subject–verb–object (SVO) order of Indonesian using Lader (Latent Derivation Reorderer). With WR, multiple pivots improved the BLEU score by 0.47 over the baseline. Furthermore, combining multiple pivot languages improved the BLEU score by more than 0.20. The translation output of WR was also more comprehensible than that of WoR. Finally, a comparison with neural machine translation (NMT) indicates that SMT outperformed NMT in our experiments, including a small-dataset setup.
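The two combination steps named in the abstract can be illustrated with a minimal sketch. It assumes a simplified phrase table that stores a single forward probability p(target | source) per phrase pair, whereas phrase tables in real SMT toolkits usually carry several feature scores. The function names, the toy phrase pairs, and the uniform weights are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Build a source-target table by summing over shared pivot phrases:
    p(t | s) = sum over pivot phrases v of p(v | s) * p(t | v)."""
    table = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_pivot.items():
        for piv, p_piv_given_src in pivots.items():
            # Pivot phrases absent from the pivot-target table contribute nothing.
            for tgt, p_tgt_given_piv in pivot_tgt.get(piv, {}).items():
                table[src][tgt] += p_piv_given_src * p_tgt_given_piv
    return {src: dict(tgts) for src, tgts in table.items()}


def interpolate(tables, weights):
    """Linear interpolation: p(t | s) = sum_i w_i * p_i(t | s), with sum_i w_i = 1."""
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    combined = defaultdict(lambda: defaultdict(float))
    for table, weight in zip(tables, weights):
        for src, tgts in table.items():
            for tgt, prob in tgts.items():
                combined[src][tgt] += weight * prob
    return {src: dict(tgts) for src, tgts in combined.items()}


# Toy (hypothetical) tables: Japanese -> pivot and pivot -> Indonesian.
ja_en = {"猫": {"cat": 0.5, "the cat": 0.5}}
en_id = {"cat": {"kucing": 1.0}, "the cat": {"kucing": 0.5, "kucing itu": 0.5}}
ja_ms = {"猫": {"kucing": 1.0}}
ms_id = {"kucing": {"kucing": 1.0}}

via_en = triangulate(ja_en, en_id)  # Ja-Id table through the English pivot
via_ms = triangulate(ja_ms, ms_id)  # Ja-Id table through the Malaysian pivot

# Uniform weights are an assumption made only for this illustration.
ja_id = interpolate([via_en, via_ms], [0.5, 0.5])
print(ja_id)
```

In this sketch no direct Japanese–Indonesian table enters the interpolation, which mirrors the setting described above where the combined table is built from pivot tables only. How the interpolation weights are chosen is left open here.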
Cite this article
Budiwati, S.D., Aritsugi, M. Word reordering on multiple pivots for the Japanese and Indonesian language pair. Machine Translation 35, 611–636 (2021). https://doi.org/10.1007/s10590-021-09288-8
DOI: https://doi.org/10.1007/s10590-021-09288-8