research-article

A Chinese-Japanese Lexical Machine Translation through a Pivot Language

Authors:

Takashi Tsunakawa,

Naoaki Okazaki,

Jun’ichi Tsujii

ACM Transactions on Asian Language Information Processing (TALIP), Volume 8, Issue 2

Article No.: 9, Pages 1 - 21

https://doi.org/10.1145/1526252.1526257

Published: 01 May 2009

Abstract

A bilingual lexicon is an expensive but critical resource for multilingual applications in natural language processing. This article proposes an integrated framework for building a bilingual lexicon between Chinese and Japanese. Because the Chinese-Japanese language pair does not involve English, which is a central language of the world, few large-scale bilingual resources have been constructed for it. One way to alleviate this problem is to build a Chinese-Japanese bilingual lexicon through English as the pivot language. In addition to the pivot approach, we can exploit a characteristic shared by Chinese and Japanese: both are written with Han characters. Using a log-linear model, we incorporate a translation model obtained from a small Chinese-Japanese lexicon and the similarity between hanzi and kanji characters. Our experimental results show that the pivot approach can improve translation performance over the translation model built from the small Chinese-Japanese lexicon alone. The results also demonstrate that the similarity between hanzi and kanji characters has a positive effect on translating technical terms.
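The abstract describes the method only at a high level: Chinese words are translated into Japanese through English, and this pivot translation is combined in a log-linear model with a translation model from a small Chinese-Japanese lexicon and a hanzi-kanji similarity feature. The Python sketch below illustrates that kind of combination under stated assumptions; it is not the authors' implementation. It assumes the common triangulation formula p(j|c) ≈ Σ_e p(j|e) p(e|c) for the pivot step, a Dice overlap of characters after mapping simplified hanzi to kanji as the similarity measure, and hand-set feature weights (in practice such weights would be tuned on held-out data). The probability tables, the `HANZI_TO_KANJI` mapping, and all function names are hypothetical toy values chosen for illustration.

```python
import math

# Hypothetical toy tables (illustrative assumptions, not data from the article).
# p(e | c): Chinese -> English translation probabilities.
P_E_GIVEN_C = {
    "计算机": {"computer": 0.9, "calculator": 0.1},
}
# p(j | e): English -> Japanese translation probabilities.
P_J_GIVEN_E = {
    "computer": {"計算機": 0.4, "コンピュータ": 0.6},
    "calculator": {"計算機": 0.7, "電卓": 0.3},
}
# p(j | c) from a small direct Chinese-Japanese lexicon (hypothetical).
P_DIRECT = {
    "计算机": {"計算機": 0.5, "コンピュータ": 0.5},
}
# Hypothetical simplified-hanzi -> kanji mapping; a real system would need a full table.
HANZI_TO_KANJI = {"计": "計", "机": "機"}


def pivot_prob(c, j):
    """Triangulated p(j|c): sum over English pivots e of p(j|e) * p(e|c)."""
    return sum(p_e * P_J_GIVEN_E.get(e, {}).get(j, 0.0)
               for e, p_e in P_E_GIVEN_C.get(c, {}).items())


def char_similarity(c, j):
    """Dice coefficient between character sets after mapping hanzi to kanji (assumed measure)."""
    mapped = {HANZI_TO_KANJI.get(ch, ch) for ch in c}
    target = set(j)
    if not mapped or not target:
        return 0.0
    return 2 * len(mapped & target) / (len(mapped) + len(target))


def score(c, j, weights, eps=1e-6):
    """Log-linear score: weighted sum of feature values h_k(c, j); eps avoids log(0)."""
    features = {
        "log_pivot": math.log(pivot_prob(c, j) + eps),
        "log_direct": math.log(P_DIRECT.get(c, {}).get(j, 0.0) + eps),
        "char_sim": char_similarity(c, j),
    }
    return sum(weights[name] * value for name, value in features.items())


def translate(c, candidates, weights):
    """Rank Japanese candidates for Chinese word c by log-linear score, best first."""
    return sorted(candidates, key=lambda j: score(c, j, weights), reverse=True)


if __name__ == "__main__":
    weights = {"log_pivot": 1.0, "log_direct": 0.5, "char_sim": 2.0}
    print(translate("计算机", ["コンピュータ", "電卓", "計算機"], weights))
```

With these toy numbers the pivot feature alone prefers コンピュータ (0.54 against 0.43 for 計算機), but once the character-similarity feature receives a positive weight, 計算機 ranks first because its characters match 计算机 exactly after the hanzi-to-kanji mapping. The log-linear form makes it straightforward to add, drop, or re-weight such features without retraining the component models.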

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
    1. Document representation

Information

Published In

ACM Transactions on Asian Language Information Processing Volume 8, Issue 2

May 2009

89 pages

ISSN:1530-0226

EISSN:1558-3430

DOI:10.1145/1526252

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2009

Accepted: 01 March 2009

Revised: 01 February 2009

Received: 01 November 2008

Published in TALIP Volume 8, Issue 2

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Article Metrics

Total Citations: 5
Total Downloads: 398
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 1

Reflects downloads up to 27 Feb 2025

Cited By

Aithal S, Sn M, Ganiga R, Rao B. A, Hegde K. G (2024). KannadaLex: A lexical database with psycholinguistic information. ACM Transactions on Asian and Low-Resource Language Information Processing 23:7 (1-21). Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3670688

Ishiwatari S, Yoshinaga N, Toyoda M, Kitsuregawa M (2018). Instant Translation Model Adaptation by Translating Unseen Words in Continuous Vector Space. Computational Linguistics and Intelligent Text Processing (51-62). Online publication date: 21-Mar-2018.
https://doi.org/10.1007/978-3-319-75487-1_5

Xu J, Chen Y, Ru K, Zhang Y, Araki K (2017). An Approach for Chinese-Japanese Named Entity Equivalents Extraction Using Inductive Learning and Hanzi-Kanji Mapping Table. IEICE Transactions on Information and Systems E100.D:8 (1882-1892). Online publication date: 2017.
https://doi.org/10.1587/transinf.2016EDP7425

Han D, Martínez-Gómez P, Miyao Y (2016). Syntax-Based Pre-reordering for Chinese-to-Japanese Statistical Machine Translation. Hybrid Approaches to Machine Translation (77-108). Online publication date: 13-Jul-2016.
https://doi.org/10.1007/978-3-319-21311-8_4

Han D, Song X (2011). Japanese sentence pattern learning with the use of illustrative examples extracted from the web. IEEJ Transactions on Electrical and Electronic Engineering 6:5 (490-496). Online publication date: 15-Jul-2011.
https://doi.org/10.1002/tee.20686
