HPSG-Based Preprocessing for English-to-Japanese Translation

Published: 01 September 2012

Abstract

Japanese sentences have a completely different word order from their English counterparts. Typical phrase-based statistical machine translation (SMT) systems such as Moses search for the best word permutation within a given distance limit (the distortion limit). For English-to-Japanese translation, a large distortion limit is needed to obtain acceptable translations, which makes the number of translation candidates extremely large. As a result, SMT systems often fail to find acceptable translations within a limited time. To solve this problem, some researchers use rule-based preprocessing approaches, which reorder English words into a Japanese-like order by using dozens of rules. Our idea is based on the following two observations: (1) Japanese is a typical head-final language, and (2) we can detect the heads of English sentences with a head-driven phrase structure grammar (HPSG) parser. The main contributions of this article are twofold. First, we demonstrate how an off-the-shelf, state-of-the-art HPSG parser enables us to write the reordering rules at an abstract level and can easily improve the quality of English-to-Japanese translation. Second, we show that syntactic heads achieve better results than semantic heads. The proposed method outperforms the best system of the NTCIR-7 PATMT EJ task.
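The reordering idea in the abstract can be illustrated with a small sketch. It is not the article's implementation: the abstract does not give the actual rules or the parser's output format, so the `Leaf` and `Node` classes, the `head` field, and the `head_finalize` function below are all hypothetical. The sketch assumes a binary parse tree in which each node records which child is the head, and it moves every head after its sibling to produce a head-final word order.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """A single word (hypothetical stand-in for a parser terminal)."""
    word: str


@dataclass
class Node:
    """A binary phrase; `head` says which child is the head: "left" or "right"."""
    left: "Tree"
    right: "Tree"
    head: str


Tree = Union[Leaf, Node]


def head_finalize(tree: Tree) -> List[str]:
    """Return the words of `tree` with every head placed after its sibling."""
    if isinstance(tree, Leaf):
        return [tree.word]
    left = head_finalize(tree.left)
    right = head_finalize(tree.right)
    # A left-hand head is swapped behind its sibling; a right-hand head
    # is already final, so the children keep their order.
    return right + left if tree.head == "left" else left + right


# "John hit the ball": the verb heads the VP, the VP heads the sentence,
# and the noun is taken as the head of the noun phrase (an assumption).
sentence = Node(
    Leaf("John"),
    Node(Leaf("hit"), Node(Leaf("the"), Leaf("ball"), "right"), "left"),
    "right",
)
print(" ".join(head_finalize(sentence)))  # prints "John the ball hit"
```

The output places the verb last, as in Japanese, so a phrase-based decoder would need far less long-distance reordering on the preprocessed input. Which child counts as the head depends on the head definition used (syntactic or semantic), which is the comparison the abstract refers to.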

    • Published in

      ACM Transactions on Asian Language Information Processing  Volume 11, Issue 3
      September 2012
      93 pages
      ISSN:1530-0226
      EISSN:1558-3430
      DOI:10.1145/2334801

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2012
      • Accepted: 1 August 2011
      • Revised: 1 June 2011
      • Received: 1 March 2011

      Qualifiers

      • research-article
      • Research
      • Refereed
