Abstract
This paper proposes an effective query-translation approach that makes it easier for digital library systems containing only monolingual content to support a cross-language information retrieval (CLIR) service. A query-translation engine called LiveTrans processes the translation requests for cross-lingual queries submitted by connected digital library systems. To automatically extract translations not covered by standard dictionaries, the engine is built on a novel integration of dictionary resources and Web mining approaches, including anchor-text and search-result methods. It exploits a broad range of multilingual Web resources as live bilingual corpora to alleviate translation difficulties, and it is shown to be particularly effective at extracting multilingual translation equivalents of query terms that contain proper names or new terminology. The results demonstrate the feasibility of, and great potential for, building English-Chinese CLIR services into existing digital libraries and developing new applications in cross-language Web searching, although some difficulties remain and require further investigation.
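The abstract does not spell out how the search-result method scores candidate translations. As a minimal sketch of the general idea (dictionary lookup first, Web-mined candidates as a fallback), the Python below treats a handful of mixed-language pages as a stand-in for search-result snippets, extracts Chinese character n-grams as candidate translations, and ranks them by a chi-square co-occurrence score against the English query term. The dictionary, the toy pages, the n-gram extraction, and the use of chi-square are all illustrative assumptions, not the LiveTrans implementation; the anchor-text method is not shown.

```python
import re

# Illustrative sketch only; not the LiveTrans implementation.
# Hypothetical toy bilingual dictionary; a real system would load a full lexicon.
DICTIONARY = {"library": ["圖書館"], "digital": ["數位"]}

# Runs of CJK Unified Ideographs (basic block only, for simplicity).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_ngrams(text, min_n=2, max_n=4):
    """Return the set of Chinese character n-grams (lengths min_n..max_n) in text."""
    grams = set()
    for run in CJK_RUN.findall(text):
        for n in range(min_n, max_n + 1):
            for i in range(len(run) - n + 1):
                grams.add(run[i:i + n])
    return grams


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def mine_translations(term, pages, top_k=3):
    """Rank Chinese n-grams by chi-square association with an English term."""
    has_term = [term.lower() in page.lower() for page in pages]
    grams_per_page = [candidate_ngrams(page) for page in pages]
    scores = {}
    for cand in set().union(*grams_per_page):
        a = b = c = d = 0  # pages with: both / term only / candidate only / neither
        for t, grams in zip(has_term, grams_per_page):
            g = cand in grams
            if t and g:
                a += 1
            elif t:
                b += 1
            elif g:
                c += 1
            else:
                d += 1
        if a:  # keep only candidates that co-occur with the term at least once
            scores[cand] = chi_square(a, b, c, d)
    # Highest score first; ties prefer longer candidates, then lexical order.
    return sorted(scores, key=lambda k: (-scores[k], -len(k), k))[:top_k]


def translate(term, pages):
    """Use the dictionary first; fall back to Web mining for unknown terms."""
    return DICTIONARY.get(term.lower()) or mine_translations(term, pages)


# Hypothetical mixed-language pages standing in for search-result snippets.
PAGES = [
    "Yahoo 雅虎 搜尋引擎",
    "雅虎 Yahoo 奇摩 新聞",
    "Yahoo 雅虎 拍賣 網站",
    "Google 谷歌 搜尋引擎",
    "今日 新聞 天氣",
    "網站 設計 教學",
]

print(translate("library", PAGES))   # ['圖書館'] (dictionary hit)
print(translate("Yahoo", PAGES)[0])  # 雅虎 (mined from the pages)
```

Raw n-grams over-generate fragments of longer terms, which is why the sketch breaks score ties in favor of longer candidates; a practical system would use a proper term-extraction step and far larger result sets instead.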
Cite this article
Wang, JH., Lu, WH. & Chien, LF. Toward Web mining of cross-language query translations in digital libraries. Int J Digit Libr 4, 247–257 (2004). https://doi.org/10.1007/s00799-004-0091-y