A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora

Oh, Jong-Hoon; Choi, Key-Sun; Isahara, Hitoshi

doi:10.1007/11846406_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4188)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Several models for transliteration pair acquisition have been proposed to overcome the out-of-vocabulary problem caused by transliterations. To date, however, little work has addressed a framework that can accommodate several such models at the same time. Moreover, little attention has been paid to validating acquired transliteration pairs against up-to-date corpora, such as web documents. To address these problems, we propose a hybrid model for transliteration pair acquisition, concentrating on a framework for combining several acquisition models. Experiments showed that our hybrid model was more effective than any individual transliteration pair acquisition model alone.

Author information

Authors and Affiliations

Computational Linguistics Group, NICT, 3-5 Hikaridai, Kyoto, 619-0289, Japan
Jong-Hoon Oh & Hitoshi Isahara
Computer Science Division, EECS, KAIST, Daejeon, 305-701, Republic of Korea
Jong-Hoon Oh & Key-Sun Choi

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Oh, JH., Choi, KS., Isahara, H. (2006). A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_15

DOI: https://doi.org/10.1007/11846406_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science, Computer Science (R0)
