Improving Machine Transliteration Performance by Using Multiple Transliteration Models

  • Conference paper
Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead (ICCPOL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

Machine transliteration has received significant attention as a supporting tool for machine translation and cross-language information retrieval. During the last decade, four kinds of transliteration model have been studied: the grapheme-based model, the phoneme-based model, the hybrid model, and the correspondence-based model. These models are classified by the information source used for transliteration or by the unit to be transliterated: source graphemes, source phonemes, both source graphemes and source phonemes, and the correspondence between source graphemes and phonemes, respectively. Although each transliteration model has shown relatively good performance, a single model is limited in its ability to handle complex transliteration behaviors. To address this problem, we combined different transliteration models under a “generating transliterations followed by their validation” strategy. The strategy makes it possible to account for complex transliteration behaviors by drawing on the strengths of each model, and to improve transliteration performance by validating the generated transliterations. Our method uses two kinds of validation: web-based and transliteration model-based. Experiments showed that our method outperforms both the individual transliteration models and previous work.
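The "generating transliterations followed by their validation" strategy can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`generate_candidates`, `validate`), the two toy models, the placeholder candidates `t1` to `t3`, the web counts, and the equal weighting of the two validation signals are all assumptions made for illustration, since the abstract does not specify how candidates are pooled or how validation scores are combined.

```python
from collections import defaultdict


def generate_candidates(word, models):
    """Pool candidates from every model, keeping each candidate's best model score."""
    pooled = defaultdict(float)
    for model in models:
        for candidate, prob in model(word):
            pooled[candidate] = max(pooled[candidate], prob)
    return dict(pooled)


def validate(candidates, web_counts, weight=0.5):
    """Rank candidates by a weighted mix of model score and relative web frequency.

    `weight` (hypothetical) balances model-based against web-based evidence.
    """
    total = sum(web_counts.get(c, 0) for c in candidates) or 1
    scored = {
        c: weight * p + (1 - weight) * web_counts.get(c, 0) / total
        for c, p in candidates.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda c: (-scored[c], c))


# Toy stand-ins for two of the model types; real models would be trained.
def grapheme_model(word):
    return [("t1", 0.6), ("t2", 0.4)]


def phoneme_model(word):
    return [("t2", 0.7), ("t3", 0.3)]


# Hypothetical page counts standing in for a web search.
web_counts = {"t1": 5, "t2": 90, "t3": 5}

candidates = generate_candidates("source", [grapheme_model, phoneme_model])
ranking = validate(candidates, web_counts)
print(ranking)
```

In this toy setup the candidate proposed by both models and frequent on the web is ranked first, which is the intuition behind validating pooled output rather than trusting any single model.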




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oh, JH., Choi, KS., Isahara, H. (2006). Improving Machine Transliteration Performance by Using Multiple Transliteration Models. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_9

  • DOI: https://doi.org/10.1007/11940098_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science, Computer Science (R0)
