
Toward Web mining of cross-language query translations in digital libraries

  • Regular contribution
International Journal on Digital Libraries

Abstract

This paper proposes a query-translation approach that makes it easier to support cross-language information retrieval (CLIR) in digital library systems that contain only monolingual content. A query-translation engine called LiveTrans processes translation requests for cross-lingual queries from connected digital library systems. To automatically extract translations not covered by standard dictionaries, the engine is built on a novel integration of dictionary resources and Web mining approaches, including anchor-text and search-result methods. It exploits a broad range of multilingual Web resources as live bilingual corpora to alleviate translation difficulties, and it is shown to be particularly effective at extracting multilingual translation equivalents of query terms that contain proper names or new terminology. The results indicate that creating English-Chinese CLIR services in existing digital libraries, and new applications in cross-language Web searching, is feasible and promising, although some difficulties remain to be investigated.
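The abstract does not spell out how dictionary lookup and Web mining are combined, so the following is only a toy sketch of the general idea and not the LiveTrans implementation. It assumes a tiny in-memory dictionary and a list of already-fetched search-result snippets; the names `DICTIONARY`, `candidate_ngrams`, `mine_translation`, and `translate` are hypothetical. When the dictionary has no entry, the sketch falls back to picking the Chinese character n-gram that co-occurs with the query term in the most snippets, a crude stand-in for the search-result method.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real system would load a full bilingual dictionary.
DICTIONARY = {"library": "圖書館"}

# Matches runs of common CJK ideographs.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_ngrams(text, min_len=2, max_len=4):
    """Yield every CJK character n-gram of length min_len..max_len in text."""
    for run in CJK_RUN.findall(text):
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                yield run[i:i + n]


def mine_translation(term, snippets):
    """Return the CJK n-gram found in the most snippets that mention term."""
    counts = Counter()
    for snippet in snippets:
        if term.lower() in snippet.lower():
            # Count each candidate at most once per snippet.
            counts.update(set(candidate_ngrams(snippet)))
    if not counts:
        return None
    # Prefer candidates seen in more snippets, then longer candidates.
    return max(counts, key=lambda c: (counts[c], len(c)))


def translate(term, snippets):
    """Try the dictionary first; fall back to snippet mining for unknown terms."""
    hit = DICTIONARY.get(term.lower())
    if hit is not None:
        return hit
    return mine_translation(term, snippets)
```

Raw co-occurrence counting is the simplest possible scoring choice here; a practical system would need a stronger association measure and real term segmentation, which this sketch does not attempt.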



Author information


Correspondence to Jenq-Haur Wang.


About this article

Cite this article

Wang, JH., Lu, WH. & Chien, LF. Toward Web mining of cross-language query translations in digital libraries. Int J Digit Libr 4, 247–257 (2004). https://doi.org/10.1007/s00799-004-0091-y


  • DOI: https://doi.org/10.1007/s00799-004-0091-y
