Transliteration Retrieval Model for Cross Lingual Information Retrieval

Jan, Ea-Ee; Lin, Shih-Hsiang; Chen, Berlin

doi:10.1007/978-3-642-17187-1_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6458)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Transliteration from a source language to a target language lays the groundwork for proper name Cross Lingual Information Retrieval (CLIR). Traditionally, this task is handled by two separate modules, transliteration and retrieval: queries are first transliterated into the target language as one or more hypotheses, and retrieval is then carried out on the transliterated queries. The top-1 transliteration hypothesis often has an error rate of 30-50%, which leads to significant performance degradation in CLIR. We therefore propose a unified transliteration retrieval model that incorporates the transliteration similarity measure into the relevance scoring function. We also present an efficient and robust method for measuring the similarity of a given proper name pair, using Hidden Markov Model (HMM) based alignment and a Statistical Machine Translation (SMT) framework. Experiments on the NTCIR-7 IR4QA task showed significant results with the proposed integrated method, which demonstrated greater flexibility in accepting alternative transliterations.
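The idea of folding transliteration similarity into the relevance score can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the scoring function is a generic translation-style language model with linear smoothing, the `SIM` table stands in for the HMM/SMT-based similarity measure, and the names `sim`, `term_prob`, `score`, and `lam` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical similarity table: (source-language name, target-language term)
# -> score in [0, 1]. A stand-in for the paper's HMM/SMT-based measure.
SIM = {
    ("obama", "奥巴马"): 0.9,
    ("obama", "欧巴马"): 0.8,
}


def sim(src, tgt):
    """Look up the assumed transliteration similarity; 0.0 if unknown."""
    return SIM.get((src, tgt), 0.0)


def term_prob(src, counts, total):
    """Similarity-weighted relative frequency of `src` in a bag of target terms."""
    if total == 0:
        return 0.0
    return sum(sim(src, w) * c for w, c in counts.items()) / total


def score(query, doc, collection, lam=0.5, floor=1e-9):
    """Log-score of a document for a source-language query.

    Mixes document and collection evidence (assumed linear smoothing, weight
    `lam`); `floor` keeps the logarithm defined when nothing matches.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    total = 0.0
    for q in query:
        p_doc = term_prob(q, doc_counts, len(doc))
        p_coll = term_prob(q, coll_counts, len(collection))
        total += math.log((1 - lam) * p_doc + lam * p_coll + floor)
    return total


# Toy documents, already segmented into terms.
d1 = ["欧巴马", "访问", "中国"]   # contains a variant transliteration
d2 = ["天气", "预报"]             # unrelated
d3 = ["奥巴马", "演讲"]           # contains the most similar transliteration
collection = d1 + d2 + d3

ranked = sorted([("d1", d1), ("d2", d2), ("d3", d3)],
                key=lambda item: score(["obama"], item[1], collection),
                reverse=True)
print([name for name, _ in ranked])
```

In this toy setup a two-stage pipeline that kept only one hypothesis (say 奥巴马) and matched it exactly would miss `d1`, whereas the similarity-weighted score still ranks `d1` above the unrelated `d2`.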

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, NY, 10598, USA
Ea-Ee Jan & Shih-Hsiang Lin
Computer Science and Information Engineering, National Taiwan Normal University, Taipei, Taiwan
Shih-Hsiang Lin & Berlin Chen

Editor information

Editors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan R.O.C.
Pu-Jen Cheng
School of Computing, National University of Singapore (NUS), Computing 1, 13 Computing Drive, 117417, Singapore
Min-Yen Kan
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Wai Lam
School of Computing, Computing 1, National University of Singapore (NUS), 13 Computing Drive, 117417, Singapore
Preslav Nakov

About this paper

Cite this paper

Jan, EE., Lin, SH., Chen, B. (2010). Transliteration Retrieval Model for Cross Lingual Information Retrieval. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_17

DOI: https://doi.org/10.1007/978-3-642-17187-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science, Computer Science (R0)
