Abstract
Several models for transliteration pair acquisition have been proposed to overcome the out-of-vocabulary problem caused by transliterations. To date, however, little work has addressed a framework that can accommodate several models at once. Moreover, little attention has been paid to validating acquired transliteration pairs against up-to-date corpora, such as web documents. To address these problems, we propose a hybrid model for transliteration pair acquisition, concentrating on a framework for combining several acquisition models. Experiments showed that our hybrid model was more effective than each individual transliteration pair acquisition model alone.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oh, JH., Choi, KS., Isahara, H. (2006). A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_15
DOI: https://doi.org/10.1007/11846406_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science, Computer Science (R0)