Word reordering on multiple pivots for the Japanese and Indonesian language pair

Machine Translation

Abstract

We investigated multiple-pivot approaches for the Japanese–Indonesian (Ja–Id) language pair in phrase-based statistical machine translation (SMT). Four languages served as pivots: English, Malaysian, Filipino, and Myanmar. Because the source–pivot and pivot–target language pairs differ in word order, we conducted two experiments: without reordering (WoR) and with reordering (WR) of the source language. Triangulation and linear interpolation (LI) were used to combine the multiple pivot phrase tables, and the combined table was employed without a direct source–target phrase table. In the WoR experiment, multiple pivots improved the BLEU score by 0.24 over the baseline and by 2.49 over a single pivot. However, the WoR translation output was incomprehensible because it followed Japanese word order. In the WR experiment, we used Lader (Latent Derivation Reorderer) to reorder Japanese sentences from subject–object–verb (SOV) order into the subject–verb–object (SVO) order of Indonesian. With WR, multiple pivots improved the BLEU score by 0.47 over the baseline. Furthermore, combining many pivot languages improved the BLEU score by more than 0.20, and the WR output was more comprehensible than the WoR output. Finally, a comparison with neural machine translation (NMT) indicates that SMT outperformed NMT in our experiments, including in a small-dataset setup.
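To make the two combination steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each phrase table is reduced to a single conditional probability per phrase pair (real phrase tables, such as those produced by Moses, carry several scores per pair), that triangulation marginalises over pivot phrases as p(t|s) = Σ_p p(p|s)·p(t|p), and that linear interpolation is a weighted sum of the per-pivot tables. The function names, the toy entries, and the equal weights are all invented for illustration.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Build a source-target table by marginalising over pivot phrases.

    src_pivot[s][p] holds p(p|s); pivot_tgt[p][t] holds p(t|p).
    Returns table[s][t] = sum over p of p(p|s) * p(t|p).
    """
    table = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_pivot.items():
        for p, prob_p_given_s in pivots.items():
            # Pivot phrases missing from the pivot-target table contribute nothing.
            for t, prob_t_given_p in pivot_tgt.get(p, {}).items():
                table[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(ts) for s, ts in table.items()}


def interpolate(tables, weights):
    """Linearly interpolate tables: p(t|s) = sum over i of w_i * p_i(t|s)."""
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mixed = defaultdict(lambda: defaultdict(float))
    for table, w in zip(tables, weights):
        for s, ts in table.items():
            for t, prob in ts.items():
                mixed[s][t] += w * prob
    return {s: dict(ts) for s, ts in mixed.items()}


# Toy, hand-made probabilities (hypothetical, not taken from the paper).
ja_en = {"neko": {"cat": 1.0}}
en_id = {"cat": {"kucing": 0.9, "si kucing": 0.1}}
ja_ms = {"neko": {"kucing": 1.0}}
ms_id = {"kucing": {"kucing": 1.0}}

via_en = triangulate(ja_en, en_id)  # Ja-Id table through English
via_ms = triangulate(ja_ms, ms_id)  # Ja-Id table through Malaysian

# Equal weights are an arbitrary choice for this sketch.
combined = interpolate([via_en, via_ms], [0.5, 0.5])
print(combined)
```

In this sketch a target phrase supported by several pivots accumulates probability mass from each of them, which is one intuition for why combining pivots can help when no direct source–target table is available.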

Notes

  1. https://github.com/s4d3/Pivot-for-Low-Resource-Languges.

  2. https://gist.github.com/neubig/2555399.

  3. https://github.com/OpenNMT/OpenNMT-py.

  4. https://github.com/OpenNMT/OpenNMT-py/tree/master/config.

Author information

Corresponding author

Correspondence to Sari Dewi Budiwati.

About this article

Cite this article

Budiwati, S.D., Aritsugi, M. Word reordering on multiple pivots for the Japanese and Indonesian language pair. Machine Translation 35, 611–636 (2021). https://doi.org/10.1007/s10590-021-09288-8

  • DOI: https://doi.org/10.1007/s10590-021-09288-8
