Transliteration Retrieval Model for Cross Lingual Information Retrieval

  • Conference paper
Information Retrieval Technology (AIRS 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6458)

Abstract

Transliteration from a source language to a target language lays the groundwork for proper-name Cross Lingual Information Retrieval (CLIR). Traditionally, this task is accomplished by two separate modules: transliteration and retrieval. Queries are first transliterated into the target language using one or multiple hypotheses, and retrieval is then carried out on the transliterated queries. The top-1 transliteration hypothesis often has a 30-50% error rate, which leads to significant performance degradation in CLIR. We therefore propose a unified transliteration retrieval model that incorporates the transliteration similarity measurement into the relevance scoring function. In addition, we present an efficient and robust method for measuring the similarity of a given proper name pair, using Hidden Markov Model (HMM) based alignment and a Statistical Machine Translation (SMT) framework. Experiments on the NTCIR-7 IR4QA task showed significant results with the proposed integrated method, which demonstrated greater flexibility and acceptance in transliteration.
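
The contrast between the two-module pipeline and a unified scoring function can be illustrated with a minimal sketch. This is not the authors' model: the function names are hypothetical, Python's `difflib.SequenceMatcher` merely stands in for the HMM/SMT-based similarity measurement, romanized tokens stand in for target-language text, and the `threshold` parameter is an assumption made for the illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1].

    Assumption: a character-overlap ratio stands in for the
    HMM-alignment / SMT-based measure described in the abstract.
    """
    return SequenceMatcher(None, a, b).ratio()


def hard_score(hypothesis: str, doc_tokens: list[str]) -> float:
    """Pipeline-style scoring: exact match on one transliteration hypothesis."""
    if not doc_tokens:
        return 0.0
    return doc_tokens.count(hypothesis) / len(doc_tokens)


def soft_score(query_name: str, doc_tokens: list[str], threshold: float = 0.6) -> float:
    """Unified-style scoring: similarity feeds directly into the relevance score.

    Each document token contributes its similarity to the query name,
    so no single transliteration hypothesis has to be exactly right.
    Tokens below the (hypothetical) threshold are ignored as noise.
    """
    if not doc_tokens:
        return 0.0
    total = 0.0
    for token in doc_tokens:
        sim = name_similarity(query_name, token)
        if sim >= threshold:
            total += sim
    return total / len(doc_tokens)


# Toy documents; "aobama" is the romanized name as it appears in the collection.
doc_a = ["aobama", "visited", "beijing"]
doc_b = ["weather", "report", "beijing"]

# A wrong top-1 hypothesis ("oubama") makes the exact-match pipeline miss doc_a.
pipeline_a = hard_score("oubama", doc_a)

# Soft matching on the source name still separates the two documents.
unified_a = soft_score("obama", doc_a)
unified_b = soft_score("obama", doc_b)
```

In this toy setting the exact-match pipeline gives `doc_a` a score of zero because its single hypothesis is wrong, while the similarity-weighted score still ranks `doc_a` above `doc_b`. The real system would replace `name_similarity` with a trained transliteration model.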

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jan, EE., Lin, SH., Chen, B. (2010). Transliteration Retrieval Model for Cross Lingual Information Retrieval. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_17

  • DOI: https://doi.org/10.1007/978-3-642-17187-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17186-4

  • Online ISBN: 978-3-642-17187-1

  • eBook Packages: Computer Science, Computer Science (R0)