ABSTRACT
Information retrieval using word senses is emerging as a promising research challenge in semantic information retrieval. In this paper, we propose a new method for using word senses in information retrieval: the root sense tagging method. This method assigns coarse-grained word senses defined in WordNet to query terms and document terms in an unsupervised way, using automatically constructed co-occurrence information. Our sense tagger is crude, but it performs consistent disambiguation by considering only the single most informative word as evidence to disambiguate the target word. We also allow multiple-sense assignment to alleviate the problem caused by incorrect disambiguation. Experimental results on a large-scale TREC collection show that our approach improves retrieval effectiveness, whereas most previous work failed to improve performance even on small text collections. Our method also shows promising results when combined with pseudo relevance feedback and a state-of-the-art retrieval function such as BM25.
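The tagging idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the association measure, the context window, or the rule for multiple-sense assignment, so the function name `tag_root_senses`, the `keep_ratio` threshold, the fallback when no evidence exists, and all scores and sense labels below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical association scores between a context word and a root sense,
# standing in for the automatically built co-occurrence information.
Assoc = Dict[Tuple[str, str], float]


def tag_root_senses(
    target: str,
    context: List[str],
    candidates: Dict[str, List[str]],
    assoc: Assoc,
    keep_ratio: float = 0.8,  # assumed threshold for multiple-sense assignment
) -> List[str]:
    """Return the root sense(s) assigned to `target` in this context."""
    senses = candidates.get(target, [])
    if not senses:
        return []
    # For each candidate sense, keep only its single strongest context clue
    # instead of summing evidence over all context words.
    best = {
        s: max((assoc.get((w, s), 0.0) for w in context if w != target),
               default=0.0)
        for s in senses
    }
    top = max(best.values())
    if top <= 0.0:
        # Assumption: with no evidence at all, keep every candidate sense.
        return list(senses)
    # Multiple-sense assignment: keep senses scoring close to the best one.
    return [s for s in senses if best[s] >= keep_ratio * top]


# Illustrative data only; labels and scores are made up for this example.
candidates = {"bank": ["group", "object"]}
assoc = {("loan", "group"): 2.1, ("money", "group"): 1.5,
         ("river", "object"): 3.0}

print(tag_root_senses("bank", ["a", "loan", "from", "the", "bank"],
                      candidates, assoc))
print(tag_root_senses("bank", ["river", "loan", "bank"],
                      candidates, assoc, keep_ratio=0.5))
```

Relying on the single strongest clue means the same clue word tends to produce the same tag in queries and in documents, which is one way to read the abstract's emphasis on consistency. Lowering `keep_ratio` keeps more senses per term, trading precision for robustness to tagging errors.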