Abstract
Japanese word order differs substantially from that of the corresponding English sentences. Typical phrase-based statistical machine translation (SMT) systems such as Moses search for the best word permutation within a given distance limit (the distortion limit). For English-to-Japanese translation, a large distortion limit is needed to obtain acceptable translations, and the number of translation candidates becomes extremely large. As a result, SMT systems often fail to find acceptable translations within a limited time. To solve this problem, some researchers use rule-based preprocessing approaches, which apply dozens of rules to reorder English words into a Japanese-like order. Our idea is based on the following two observations: (1) Japanese is a typical head-final language, and (2) we can detect the heads of English sentences with a head-driven phrase structure grammar (HPSG) parser. The main contributions of this article are twofold. First, we demonstrate how an off-the-shelf, state-of-the-art HPSG parser enables us to write the reordering rules at an abstract level and can easily improve the quality of English-to-Japanese translation. Second, we show that syntactic heads achieve better results than semantic heads. The proposed method outperforms the best system of the NTCIR-7 PATMT EJ task.
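The two observations in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the article's rule set: it assumes a binary-branching parse in which each internal node records which child is its head, and it moves every head after its non-head sibling. The `Node` class, the `head_final` function, and the hand-built tree are hypothetical names introduced only for illustration. A real system would obtain the head information from an HPSG parser and would likely need exceptions that this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A binary-branching parse node (hypothetical); leaves carry a word."""
    word: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    head: str = "right"  # which child is the head: "left" or "right"


def leaf(word: str) -> Node:
    """Build a leaf node for a single word."""
    return Node(word=word)


def head_final(node: Node) -> List[str]:
    """Return the subtree's words with each head placed after its sibling."""
    if node.word is not None:
        return [node.word]
    left = head_final(node.left)
    right = head_final(node.right)
    # When the head is the left child, swap the children so the head comes last.
    if node.head == "left":
        return right + left
    return left + right


# Hand-annotated heads for "John hit the ball" (an assumption, not parser output).
noun_phrase = Node(left=leaf("the"), right=leaf("ball"), head="right")
verb_phrase = Node(left=leaf("hit"), right=noun_phrase, head="left")
sentence = Node(left=leaf("John"), right=verb_phrase, head="right")

reordered = head_final(sentence)
print(" ".join(reordered))  # prints "John the ball hit"
```

In this toy tree only the verb phrase has its head on the left, so only "hit" moves, which gives the verb-final order expected of a head-final language. Because the reordering happens before translation, the downstream phrase-based decoder would presumably need far less long-distance reordering of its own.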
Index Terms
- HPSG-Based Preprocessing for English-to-Japanese Translation