Abstract
Transliteration pair acquisition has received significant attention as a technique for constructing up-to-date transliteration lexicons and for supporting machine translation and cross-language information retrieval. Previous studies on transliteration pair acquisition focused only on phonetic similarity models and seldom considered how transliterations are actually used in texts. Moreover, previous web-based validation models performed only one-way validation (validation from the viewpoint of the source term) rather than joint validation between a source term and a target term. To address these problems, we propose a novel transliteration pair acquisition model that extracts transliteration pairs from the Web and validates them by combining phonetic similarity and joint web-validation models. Experiments demonstrated that our transliteration pair acquisition model was effective.
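The abstract does not give the scoring formulas, so the following is only an illustrative sketch of the contrast between one-way and joint web validation, and of one possible way to combine a web score with a phonetic similarity score. The function names, the use of page counts, the product of the two directions, and the linear interpolation weight are all assumptions made for illustration, not the model defined in the paper.

```python
def one_way_score(cooccur_pages: int, term_pages: int) -> float:
    """Share of pages containing one term that also contain the other term.

    Assumption: validation is approximated with web page counts.
    """
    return cooccur_pages / term_pages if term_pages else 0.0


def joint_web_score(cooccur_pages: int, source_pages: int, target_pages: int) -> float:
    """Hypothetical joint validation: the pair must be supported from both sides.

    The product is small whenever either direction gives weak support.
    """
    from_source = one_way_score(cooccur_pages, source_pages)
    from_target = one_way_score(cooccur_pages, target_pages)
    return from_source * from_target


def combined_score(phonetic: float, web: float, weight: float = 0.5) -> float:
    """Hypothetical linear interpolation of phonetic and web-validation scores."""
    return weight * phonetic + (1.0 - weight) * web


if __name__ == "__main__":
    # Made-up counts: the pair co-occurs on 50 pages; the source term appears
    # on 100 pages, the target term on 5000 pages.
    source_view = one_way_score(50, 100)
    joint_view = joint_web_score(50, 100, 5000)
    print("one-way (source view):", source_view)
    print("joint:", joint_view)
    print("combined with phonetic score 0.8:", combined_score(0.8, joint_view))
```

With these made-up counts, validation from the source side alone looks fairly strong, while the joint score is much lower because most pages containing the target term do not mention the source term. That asymmetry is the kind of case one-way validation cannot detect.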
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oh, JH., Isahara, H. (2006). Extracting English-Korean Transliteration Pairs from Web Corpora. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_24
DOI: https://doi.org/10.1007/11940098_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science (R0)