ABSTRACT
With the worldwide growth of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing CLIR approaches based on query translation require parallel or comparable corpora to disambiguate translated query terms. However, such language resources are not readily available. In this paper, we propose a disambiguation method for dictionary-based query translation that does not depend on such scarce language resources, yet achieves adequate retrieval effectiveness by using Web documents as a corpus and exploiting co-occurrence information between terms within that corpus. In our experiments, the method achieved 97% of the average precision obtained with manual translation.
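The general idea of co-occurrence-based disambiguation can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not state which association measure is used, so pointwise mutual information is assumed here, and the function names, the candidate translations, and all counts are hypothetical. In a Web setting the counts would come from documents retrieved through a search engine rather than from a hand-written table.

```python
import itertools
import math


def pmi(count_xy, count_x, count_y, total_docs):
    """Pointwise mutual information from document counts (assumed measure).

    Returns -inf when the pair never co-occurs or a term is unseen.
    """
    if count_xy == 0 or count_x == 0 or count_y == 0:
        return float("-inf")
    return math.log2((count_xy * total_docs) / (count_x * count_y))


def disambiguate(candidates, doc_freq, pair_freq, total_docs):
    """Pick one translation per query term, maximizing the summed pairwise PMI.

    candidates: one list of candidate translations per source query term.
    doc_freq:   term -> number of documents containing the term.
    pair_freq:  frozenset({a, b}) -> number of documents containing both.
    """
    best_score, best_combo = float("-inf"), None
    # Exhaustive search over all combinations; fine for short queries.
    for combo in itertools.product(*candidates):
        score = sum(
            pmi(
                pair_freq.get(frozenset((a, b)), 0),
                doc_freq.get(a, 0),
                doc_freq.get(b, 0),
                total_docs,
            )
            for a, b in itertools.combinations(combo, 2)
        )
        if best_combo is None or score > best_score:
            best_score, best_combo = score, combo
    return list(best_combo)


# Hypothetical dictionary output for a two-term query, with made-up counts.
CANDIDATES = [["bank", "shore"], ["interest", "concern"]]
DOC_FREQ = {"bank": 1000, "shore": 400, "interest": 1200, "concern": 800}
PAIR_FREQ = {
    frozenset(("bank", "interest")): 300,
    frozenset(("bank", "concern")): 20,
    frozenset(("shore", "interest")): 5,
    frozenset(("shore", "concern")): 10,
}
TOTAL_DOCS = 100_000

if __name__ == "__main__":
    print(disambiguate(CANDIDATES, DOC_FREQ, PAIR_FREQ, TOTAL_DOCS))
```

With these invented counts, the pair "bank" and "interest" co-occurs far more often than chance would predict, so that combination is selected. Exhaustive search grows exponentially with query length, which is acceptable only for queries of a few terms.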
- Query term disambiguation for Web cross-language information retrieval using a search engine