Abstract
Current state-of-the-art information retrieval models treat documents and queries as bags of words. There have been many attempts to go beyond this simple representation, but few have shown consistent improvements in retrieval effectiveness across a wide range of tasks and data sets. Here, we propose a new statistical model for information retrieval based on Markov random fields. The proposed model goes beyond the bag-of-words assumption by allowing dependencies between terms to be incorporated into the model. The model also allows a variety of textual and non-textual features to be easily combined within a single framework. Within this framework, we explore the theoretical issues involved, parameter estimation, feature selection, and query expansion. We give experimental results from a number of information retrieval tasks, such as ad hoc retrieval and web search.
Index Terms
- Beyond bags of words: effectively modeling dependence and features in information retrieval