Abstract
Machine transliteration has received significant attention as a supporting tool for machine translation and cross-language information retrieval. During the last decade, four kinds of transliteration model have been studied: the grapheme-based model, the phoneme-based model, the hybrid model, and the correspondence-based model. These models are classified by the information source used for transliteration, or the unit to be transliterated: source graphemes, source phonemes, both source graphemes and source phonemes, and the correspondence between source graphemes and phonemes, respectively. Although each transliteration model has shown relatively good performance, a single model is limited in its ability to handle complex transliteration behaviors. To address this problem, we combined different transliteration models using a “generating transliterations followed by their validation” strategy. The strategy makes it possible to account for complex transliteration behaviors by drawing on the strengths of each model, and to improve transliteration performance by validating the generated transliterations. Our method uses both web-based and transliteration model-based validation. Experiments showed that our method outperforms both the individual transliteration models and previous work.
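The “generating transliterations followed by their validation” strategy can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how candidates are scored, so the scoring rule here (a weighted sum of a web-frequency share and the fraction of models proposing a candidate) is an assumption, as are the function names, the `web_weight` parameter, the stub models, the romanized placeholder strings, and the hit counts.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interface: a model maps a source word to candidate transliterations.
Model = Callable[[str], List[str]]


def generate_candidates(source: str, models: Dict[str, Model]) -> Dict[str, Set[str]]:
    """Pool candidates from all models, recording which models proposed each one."""
    proposers: Dict[str, Set[str]] = {}
    for name, model in models.items():
        for candidate in model(source):
            proposers.setdefault(candidate, set()).add(name)
    return proposers


def rank_candidates(
    source: str,
    models: Dict[str, Model],
    web_hits: Dict[str, int],
    web_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Validate pooled candidates and return them sorted by descending score.

    Assumed scoring (not taken from the paper):
      web score   = candidate's share of the hit counts over all candidates
      model score = fraction of models that proposed the candidate
    """
    proposers = generate_candidates(source, models)
    total_hits = sum(web_hits.get(c, 0) for c in proposers)
    ranked = []
    for candidate, names in proposers.items():
        model_score = len(names) / len(models)
        web_score = web_hits.get(candidate, 0) / total_hits if total_hits else 0.0
        score = web_weight * web_score + (1 - web_weight) * model_score
        ranked.append((candidate, score))
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Stub models standing in for the four model types; outputs are made-up placeholders.
toy_models: Dict[str, Model] = {
    "grapheme": lambda word: ["deita", "data"],
    "phoneme": lambda word: ["deiteo", "deita"],
    "hybrid": lambda word: ["deita"],
    "correspondence": lambda word: ["deiteo"],
}
# Made-up hit counts standing in for a web search; no real queries are issued.
toy_web_hits = {"deita": 80, "deiteo": 20}

ranked = rank_candidates("data", toy_models, toy_web_hits)
for candidate, score in ranked:
    print(f"{candidate}\t{score:.3f}")
```

In this toy run, the candidate proposed by most models and most frequent in the assumed web counts ranks first, while a candidate proposed by only one model and absent from the web counts ranks last. A real system would replace the stubs with trained models and the dictionary with actual web frequency lookups.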
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oh, JH., Choi, KS., Isahara, H. (2006). Improving Machine Transliteration Performance by Using Multiple Transliteration Models. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_9
DOI: https://doi.org/10.1007/11940098_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)