research-article

HPSG-Based Preprocessing for English-to-Japanese Translation

Authors:
Hideki Isozaki, NTT Corporation
Katsuhito Sudoh, NTT Corporation
Hajime Tsukada, NTT Corporation
Kevin Duh, NTT Corporation

ACM Transactions on Asian Language Information Processing, Volume 11, Issue 3, Article No. 8, pp. 1–16. https://doi.org/10.1145/2334801.2334802

Published: 01 September 2012

Abstract

Japanese sentences have word orders that differ greatly from those of the corresponding English sentences. Typical phrase-based statistical machine translation (SMT) systems such as Moses search for the best word permutation within a given distance limit (the distortion limit). For English-to-Japanese translation, a large distortion limit is needed to obtain acceptable translations, and with such a limit the number of translation candidates becomes extremely large. As a result, SMT systems often fail to find acceptable translations within a limited time. To address this problem, some researchers use rule-based preprocessing approaches, which reorder English words into a Japanese-like order by applying dozens of rules. Our idea is based on two observations: (1) Japanese is a typical head-final language, and (2) the heads of English sentences can be detected by a head-driven phrase structure grammar (HPSG) parser. The main contributions of this article are twofold. First, we demonstrate how an off-the-shelf, state-of-the-art HPSG parser lets us write the reordering rules at an abstract level and easily improve the quality of English-to-Japanese translation. Second, we show that syntactic heads achieve better results than semantic heads. The proposed method outperforms the best system in the NTCIR-7 PATMT English-to-Japanese (EJ) task.
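The core reordering idea above (Japanese is head-final, so each English head can be moved to the end of its phrase) can be illustrated with a minimal Python sketch. It assumes a hypothetical, hand-built tree in which every phrase records the index of its head daughter, standing in for the head annotations an HPSG parser would produce. The `Node` class and `head_finalize` function are illustrative names only, not the authors' implementation, and the sketch omits any additional rules a full preprocessing system would need.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """A phrase in a (hypothetical) head-annotated parse tree.

    `children` holds words (str) or sub-phrases (Node); `head` is the index
    of the head daughter, standing in for the head an HPSG parser would mark.
    """
    children: List[Union["Node", str]]
    head: int


def head_finalize(node: Union[Node, str]) -> List[str]:
    """Return the words under `node`, moving each head daughter to the end of its phrase."""
    if isinstance(node, str):
        return [node]  # a single word is trivially head-final
    words: List[str] = []
    # Non-head daughters keep their original relative order...
    for i, child in enumerate(node.children):
        if i != node.head:
            words.extend(head_finalize(child))
    # ...and the head daughter is emitted last, after being reordered recursively.
    words.extend(head_finalize(node.children[node.head]))
    return words


# "John hit a ball": S -> [John, VP] (head: VP); VP -> [hit, NP] (head: hit);
# NP -> [a, ball] (head: ball).
sentence = Node(["John", Node(["hit", Node(["a", "ball"], head=1)], head=0)], head=1)
print(" ".join(head_finalize(sentence)))  # John a ball hit
```

For "John hit a ball", the sketch yields "John a ball hit", which follows the subject-object-verb order of Japanese. It deliberately ignores harder cases such as coordination ("A and B"), where naively moving a head to the end of its phrase may not give a natural Japanese order.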

Index Terms
Computing methodologies → Artificial intelligence → Natural language processing

Published in
ACM Transactions on Asian Language Information Processing, Volume 11, Issue 3, September 2012, 93 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/2334801

Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 September 2012
- Accepted: 1 August 2011
- Revised: 1 June 2011
- Received: 1 March 2011
Author Tags
English
HPSG
Japanese
Machine translation
SOV
SVO
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 11
- Total Downloads: 447
- Downloads (Last 12 months): 7
- Downloads (Last 6 weeks): 0