Document text characteristics affect the ranking of the most relevant documents by expanded structured queries

Eero Sormunen (Department of Information Studies, University of Tampere, Finland)
Jaana Kekäläinen (Department of Information Studies, University of Tampere, Finland)
Jussi Koivisto (Department of Information Studies, University of Tampere, Finland)
Kalervo Järvelin (Department of Information Studies, University of Tampere, Finland)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 1 June 2001

Abstract

The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non‐relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept‐based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept‐based structures performed better than unexpanded queries or "natural language" queries. Further, it was shown that highly relevant documents benefit essentially more from the concept‐based QE in ranking than marginally relevant documents.
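
The three text characteristics named in the abstract (aspects covered, paragraphs containing expressions for an aspect, and unique expressions per aspect) can be illustrated with a minimal Python sketch. This is not the authors' facet analysis procedure, which the abstract does not detail. The function name `facet_statistics`, the paragraph list, the facet dictionary, and the whole-word, case-insensitive matching rule are all assumptions made for illustration.

```python
import re


def facet_statistics(paragraphs, facets):
    """Count simple per-aspect text statistics for one document.

    Hypothetical helper, not the paper's method.
    paragraphs: list of paragraph strings.
    facets: dict mapping an aspect name to its searchable expressions.
    Returns (number of aspects covered, per-aspect statistics).
    """
    stats = {}
    for aspect, expressions in facets.items():
        matched_paragraphs = set()
        matched_expressions = set()
        for index, paragraph in enumerate(paragraphs):
            text = paragraph.lower()
            for expression in expressions:
                # Assumed matching rule: case-insensitive, whole words only.
                pattern = r"\b" + re.escape(expression.lower()) + r"\b"
                if re.search(pattern, text):
                    matched_paragraphs.add(index)
                    matched_expressions.add(expression)
        stats[aspect] = {
            "paragraphs": len(matched_paragraphs),
            "unique_expressions": len(matched_expressions),
        }
    # An aspect counts as covered if at least one of its expressions occurs.
    aspects_covered = sum(
        1 for s in stats.values() if s["unique_expressions"] > 0
    )
    return aspects_covered, stats


# Made-up example document and facets, for illustration only.
paragraphs = [
    "Query expansion improves retrieval.",
    "Expanded queries rank relevant documents higher.",
    "The weather was fine.",
]
facets = {
    "expansion": ["query expansion", "expanded queries"],
    "ranking": ["rank", "ranking"],
    "evaluation": ["recall", "precision"],
}
covered, stats = facet_statistics(paragraphs, facets)
print(covered, stats)
```

Under these assumptions, a document that the abstract would describe as highly relevant would tend to show higher values on all three counts than a marginally relevant one.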

Citation

Sormunen, E., Kekäläinen, J., Koivisto, J. and Järvelin, K. (2001), "Document text characteristics affect the ranking of the most relevant documents by expanded structured queries", Journal of Documentation, Vol. 57 No. 3, pp. 358-376. https://doi.org/10.1108/EUM0000000007087

Publisher

MCB UP Ltd

Copyright © 2001, MCB UP Limited