Improving Machine Transliteration Performance by Using Multiple Transliteration Models

Oh, Jong-Hoon; Choi, Key-Sun; Isahara, Hitoshi

doi:10.1007/11940098_9

Jong-Hoon Oh²²,
Key-Sun Choi²³ &
Hitoshi Isahara²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4285))

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

1010 Accesses
2 Citations

Abstract

Machine transliteration has received significant attention as a supporting tool for machine translation and cross-language information retrieval. During the last decade, four kinds of transliteration model have been studied — grapheme-based model, phoneme-based model, hybrid model, and correspondence-based model. These models are classified in terms of the information sources for transliteration or the units to be transliterated — source graphemes, source phonemes, both source graphemes and source phonemes, and the correspondence between source graphemes and phonemes, respectively. Although each transliteration model has shown relatively good performance, one model alone has limitations on handling complex transliteration behaviors. To address the problem, we combined different transliteration models with a “generating transliterations followed by their validation” strategy. The strategy makes it possible to consider complex transliteration behaviors using the strengths of each model and to improve transliteration performance by validating transliterations. Our method makes use of web-based and transliteration model-based validation for transliteration validation. Experiments showed that our method outperforms both the individual transliteration models and previous work.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 99.00; Price excludes VAT (USA)

Softcover Book: USD 129.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Knight, K., Graehl, J.: Machine transliteration. In: Proc. of the 35th Annual Meetings of the Association for Computational Linguistics, pp. 128–135 (1997)
Google Scholar
Al-Onaizan, Y., Knight, K.: Translating named entities using monolingual and bilingual resources. In: Proc. of ACL 2002, pp. 400–408 (2002)
Google Scholar
Fujii, A., Tetsuya, I.: Japanese/English cross-language information retrieval: Exploration of query translation and transliteration. Computers and the Humanities 35, 389–420 (2001)
Article Google Scholar
Lin, W.H., Chen, H.H.: Backward machine transliteration by learning phonetic similarity. In: Proc. of the Sixth Conference on Natural Language Learning (CoNLL), pp. 139–145 (2002)
Google Scholar
Kang, B.J., Choi, K.S.: Automatic transliteration and back-transliteration by decision tree learning. In: Proc. of the 2nd International Conference on Language Resources and Evaluation, pp. 1135–1411 (2000)
Google Scholar
Kang, I.H., Kim, G.C.: English-to-Korean transliteration using multiple unbounded overlapping phoneme chunks. In: Proc. of the 18th International Conference on Computational Linguistics, pp. 418–424 (2000)
Google Scholar
Goto, I., Kato, N., Uratani, N., Ehara, T.: Transliteration considering context information based on the maximum entropy method. In: Proc. of MT-Summit IX, pp. 125–132 (2003)
Google Scholar
Li, H., Zhang, M., Su, J.: A joint source-channel model for machine transliteration. In: Proc. of ACL 2004, pp. 160–167 (2004)
Google Scholar
Jung, S.Y., Hong, S., Paek, E.: An English to Korean transliteration model of extended markov window. In: Proc. of the 18th conference on Computational linguistics, pp. 383–389 (2000)
Google Scholar
Meng, H., Lo, W.K., Chen, B., Tang, K.: Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval. In: Proc. of Automatic Speech Recognition and Understanding. ASRU 2001, pp. 311–314 (2001)
Google Scholar
Bilac, S., Tanaka, H.: Improving back-transliteration by combining information sources. In: Su, K.-Y., Tsujii, J., Lee, J.-H., Kwong, O.Y. (eds.) IJCNLP 2004. LNCS, vol. 3248, pp. 542–547. Springer, Heidelberg (2005)
Google Scholar
Oh, J.H., Choi, K.S.: An English-Korean transliteration model using pronunciation and contextual rules. In: Proc. of COLING 2002, pp. 758–764 (2002)
Google Scholar
Oh, J.H., Choi, K.S.: An ensemble of grapheme and phoneme for machine transliteration. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS (LNAI), vol. 3651, pp. 450–461. Springer, Heidelberg (2005)
Chapter Google Scholar
Oh, J.H., Choi, K.S.: Machine learning based English-to-Korean transliteration using grapheme and phoneme information. IEICE Transaction on Information & Systems E88-D, 1737–1748 (2005)
Article Google Scholar
Berger, A.L., Pietra, S.D., Pietra, V.J.D.: A maximum entropy approach to natural language processing. Computational Linguistics 22, 39–71 (1996)
Google Scholar
Zhang, L.: Maximum entropy modeling toolkit for python and C++ (2004), http://homepages.inf.ed.ac.uk/s0450736/software/maxent/manual.pdf
Qu, Y., Grefenstette, G.: Finding ideographic representations of Japanese names written in Latin script via language identification and corpus validation. In: ACL, pp. 183–190 (2004)
Google Scholar
Wang, J.H., Teng, J.W., Lu, W.H., Chien, L.F.: Exploiting the web as the multilingual corpus for unknown query translation. Journal of the American Society for Information Science and Technology 57, 660–670 (2006)
Article Google Scholar
Grefenstette, G., Qu, Y., Evans, D.A.: Mining the web to create a language model for mapping between English names and phrases and Japanese. In: Proc. of Web Intelligence, pp. 110–116 (2004)
Google Scholar
Nam, Y.S.: Foreign dictionary. Sung An Dang (1997)
Google Scholar
Breen, J.: EDICT Japanese/English dictionary.le. The Electronic Dictionary Research and Development Group, Monash University (2003), http://www.csse.monash.edu.au/~jwb/edict.html

Download references

Author information

Authors and Affiliations

Computational Linguistics Group, National Institute of Information and Communications Technology (NICT), 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan
Jong-Hoon Oh & Hitoshi Isahara
Computer Science Division, EECS, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Republic of Korea
Key-Sun Choi

Authors

Jong-Hoon Oh
View author publications
You can also search for this author in PubMed Google Scholar
Key-Sun Choi
View author publications
You can also search for this author in PubMed Google Scholar
Hitoshi Isahara
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept of ECE, University of Illinois at Urbana Champaign, IL 61801, Urbana, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Oh, JH., Choi, KS., Isahara, H. (2006). Improving Machine Transliteration Performance by Using Multiple Transliteration Models. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science(), vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_9

Download citation

DOI: https://doi.org/10.1007/11940098_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics