A Chinese-Japanese Lexical Machine Translation through a Pivot Language

Published: 01 May 2009

Abstract

A bilingual lexicon is an expensive but critical resource for multilingual applications in natural language processing. This article proposes an integrated framework for building a bilingual lexicon between Chinese and Japanese. Because the Chinese-Japanese language pair does not include English, which serves as a central language worldwide, few large-scale Chinese-Japanese bilingual resources have been constructed. One way to alleviate this problem is to build a Chinese-Japanese bilingual lexicon using English as the pivot language. In addition to the pivot approach, we exploit the fact that both Chinese and Japanese use Han characters. Using a log-linear model, we combine a translation model obtained from a small Chinese-Japanese lexicon with the similarity between hanzi and kanji characters. Our experimental results show that the pivot approach improves translation performance over the translation model built from the small Chinese-Japanese lexicon alone. The results also show that the similarity between hanzi and kanji characters has a positive effect on the translation of technical terms.
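
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the feature set, the weights, or the character similarity measure, so the function names, the Dice-style overlap, the probability floor, and the toy data below are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def pivot_translation_probs(zh_en, en_ja):
    """Estimate P(ja | zh) by summing P(en | zh) * P(ja | en) over English pivots."""
    zh_ja = defaultdict(lambda: defaultdict(float))
    for zh, en_probs in zh_en.items():
        for en, p_en in en_probs.items():
            for ja, p_ja in en_ja.get(en, {}).items():
                zh_ja[zh][ja] += p_en * p_ja
    return {zh: dict(cands) for zh, cands in zh_ja.items()}


def char_similarity(zh, ja, hanzi_to_kanji):
    """Dice overlap of character sets after mapping hanzi to kanji (assumed measure)."""
    mapped = {hanzi_to_kanji.get(ch, ch) for ch in zh}
    ja_chars = set(ja)
    if not mapped or not ja_chars:
        return 0.0
    return 2 * len(mapped & ja_chars) / (len(mapped) + len(ja_chars))


def log_linear_score(features, weights, floor=1e-6):
    """Weighted sum of log feature values; zero values are floored to avoid log(0)."""
    return sum(weights[name] * math.log(max(value, floor))
               for name, value in features.items())


def rank_candidates(zh, pivot_probs, direct_probs, hanzi_to_kanji, weights):
    """Rank Japanese candidates for a Chinese word by log-linear score, best first."""
    candidates = set(pivot_probs.get(zh, {})) | set(direct_probs.get(zh, {}))
    scored = []
    for ja in candidates:
        features = {
            "pivot": pivot_probs.get(zh, {}).get(ja, 0.0),
            "direct": direct_probs.get(zh, {}).get(ja, 0.0),
            "han": char_similarity(zh, ja, hanzi_to_kanji),
        }
        scored.append((log_linear_score(features, weights), ja))
    return [ja for _, ja in sorted(scored, reverse=True)]


# Toy data (hypothetical): one Chinese word, one English pivot, two Japanese candidates.
zh_en = {"图书馆": {"library": 1.0}}
en_ja = {"library": {"図書館": 0.5, "ライブラリ": 0.5}}
direct = {}  # the small direct Chinese-Japanese lexicon does not cover this word
hanzi_to_kanji = {"图": "図", "书": "書", "馆": "館"}
weights = {"pivot": 1.0, "direct": 1.0, "han": 1.0}  # untuned, illustrative weights

pivot = pivot_translation_probs(zh_en, en_ja)
ranking = rank_candidates("图书馆", pivot, direct, hanzi_to_kanji, weights)
print(ranking)
```

In this toy case the pivot step alone cannot separate the two candidates, since both receive the same probability through "library"; the character similarity feature is what breaks the tie in favour of the kanji form. In a real system the weights would be tuned on held-out data rather than fixed by hand.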

      Published In

      ACM Transactions on Asian Language Information Processing, Volume 8, Issue 2
      May 2009
      89 pages
      ISSN: 1530-0226
      EISSN: 1558-3430
      DOI: 10.1145/1526252
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 May 2009
      Accepted: 01 March 2009
      Revised: 01 February 2009
      Received: 01 November 2008
      Published in TALIP Volume 8, Issue 2

      Author Tags

      1. Bilingual lexicon
      2. Han characters
      3. hanzi
      4. kanji
      5. pivot language
      6. statistical machine translation

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2024) KannadaLex: A lexical database with psycholinguistic information. ACM Transactions on Asian and Low-Resource Language Information Processing 23:7 (1-21). DOI: 10.1145/3670688. Online publication date: 3-Jun-2024
      • (2018) Instant Translation Model Adaptation by Translating Unseen Words in Continuous Vector Space. Computational Linguistics and Intelligent Text Processing (51-62). DOI: 10.1007/978-3-319-75487-1_5. Online publication date: 21-Mar-2018
      • (2017) An Approach for Chinese-Japanese Named Entity Equivalents Extraction Using Inductive Learning and Hanzi-Kanji Mapping Table. IEICE Transactions on Information and Systems E100.D:8 (1882-1892). DOI: 10.1587/transinf.2016EDP7425. Online publication date: 2017
      • (2016) Syntax-Based Pre-reordering for Chinese-to-Japanese Statistical Machine Translation. Hybrid Approaches to Machine Translation (77-108). DOI: 10.1007/978-3-319-21311-8_4. Online publication date: 13-Jul-2016
      • (2011) Japanese sentence pattern learning with the use of illustrative examples extracted from the web. IEEJ Transactions on Electrical and Electronic Engineering 6:5 (490-496). DOI: 10.1002/tee.20686. Online publication date: 15-Jul-2011
