skip to main content
research-article

Beyond bags of words: effectively modeling dependence and features in information retrieval

Published:01 June 2008Publication History
Skip Abstract Section

Abstract

Current state of the art information retrieval models treat documents and queries as bags of words. There have been many attempts to go beyond this simple representation. Unfortunately, few have shown consistent improvements in retrieval effectiveness across a wide range of tasks and data sets. Here, we propose a new statistical model for information retrieval based on Markov random fields. The proposed model goes beyond the bag of words assumption by allowing dependencies between terms to be incorporated into the model. This allows for a variety of textual and non-textual features to be easily combined under the umbrella of a single model. Within this framework, we explore the theoretical issues involved, parameter estimation, feature selection, and query expansion. We give experimental results from a number of information retrieval tasks, such as ad hoc retrieval and web search.

Index Terms

  1. Beyond bags of words: effectively modeling dependence and features in information retrieval

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGIR Forum
      ACM SIGIR Forum  Volume 42, Issue 1
      June 2008
      76 pages
      ISSN:0163-5840
      DOI:10.1145/1394251
      Issue’s Table of Contents

      Copyright © 2008 Author

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 June 2008

      Check for updates

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader