The Use of Monolingual Context Vectors for Missing Translations in Cross-Language Information Retrieval

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

Cross-language text retrieval systems that rely on bilingual dictionaries to bridge the gap between the source query language and the target document language need good dictionary coverage. For terms with missing translations, most systems employ some method of expanding the existing translation dictionaries. In this paper, instead of lexicon expansion, we explore whether using the context of the unknown terms can help mitigate the loss of meaning due to missing translations. Our approach consists of two steps: (1) identifying terms that are closely associated with the unknown source-language terms and collecting them as context vectors, and (2) using the translations of the associated terms in the context vectors as surrogate translations of the unknown terms. We describe a query-independent version and a query-dependent version of such monolingual context vectors. These methods are evaluated in Japanese-to-English retrieval using the NTCIR-3 topics and data sets. Empirical results show that both methods improved cross-language information retrieval (CLIR) performance for short and medium-length queries, and that the query-dependent context vectors performed better than the query-independent ones.
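The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the association measure, the vector size, or how the query-dependent variant uses the query. The sketch assumes raw document-level co-occurrence counts as the association score, and it treats "query-dependent" as counting only documents that also contain another query term. The function names, the `top_k` parameter, the romanized tokens, and the toy dictionary are all hypothetical.

```python
from collections import Counter


def context_vector(term, documents, top_k=3, query_terms=None):
    """Step 1 (assumed form): rank source-language terms by how often they
    co-occur with `term` in the same document of a monolingual corpus.

    If `query_terms` is given, only documents that also contain at least one
    other query term are counted (one hypothetical reading of the
    query-dependent variant).
    """
    counts = Counter()
    others = set(query_terms or ()) - {term}
    for doc in documents:
        tokens = set(doc)
        if term not in tokens:
            continue
        if others and not (tokens & others):
            continue
        counts.update(tokens - {term})
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def surrogate_translations(term, documents, dictionary, top_k=3, query_terms=None):
    """Step 2: use the dictionary translations of the associated terms as
    weighted stand-ins for the untranslatable `term`."""
    weights = Counter()
    for associated, count in context_vector(term, documents, top_k, query_terms):
        for translation in dictionary.get(associated, []):
            weights[translation] += count
    return dict(weights)


# Toy monolingual corpus: each document is a list of (romanized) tokens.
documents = [
    ["unknown_term", "shiai", "senshu"],
    ["unknown_term", "shiai", "kantoku"],
    ["shiai", "senshu"],
    ["unknown_term", "senshu"],
]

# Toy bilingual dictionary that lacks an entry for "unknown_term".
dictionary = {
    "shiai": ["match", "game"],
    "senshu": ["player"],
    "kantoku": ["coach"],
}

# Query-independent: context built from every document containing the term.
print(surrogate_translations("unknown_term", documents, dictionary, top_k=2))

# Query-dependent: context restricted by the other terms in the query.
print(surrogate_translations(
    "unknown_term", documents, dictionary, top_k=2,
    query_terms=["unknown_term", "kantoku"],
))
```

In this toy setup the query-dependent call draws its surrogate translations only from documents that share vocabulary with the query, so the replacement terms track the query's topic more closely. A real system would likely use a stronger association measure than raw counts.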

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qu, Y., Grefenstette, G., Evans, D.A. (2005). The Use of Monolingual Context Vectors for Missing Translations in Cross-Language Information Retrieval. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_3

  • DOI: https://doi.org/10.1007/11562214_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
