Abstract
In classic Information Retrieval systems, a relevant document will not be retrieved in response to a query if the document and query representations do not share at least one term. This problem, known as “term mismatch”, has long been recognised by the Information Retrieval community, and a number of possible solutions have been proposed. Here I present a preliminary investigation into a new class of retrieval models that attempt to solve the term mismatch problem by exploiting complete or partial knowledge of term similarity in the term space. Term similarity makes it possible to enhance classic retrieval models by taking non-matching terms into account. The theoretical advantages and drawbacks of these models are presented and compared with those of other models tackling the same problem. A preliminary experimental investigation into the performance gain achieved by exploiting term similarity with the proposed models is also presented and discussed.
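To make the term mismatch problem and the role of term similarity concrete, here is a minimal Python sketch. It assumes a hypothetical table of pairwise term similarities and uses a simple max-similarity rule, in which each query term is credited with its most similar document term. This rule and all names in the code are illustrative assumptions, not necessarily the models proposed in the article.

```python
from typing import Dict, List, Tuple

# Hypothetical term-similarity table: (term_a, term_b) -> similarity in [0, 1].
# How such values are obtained (co-occurrence, thesaurus, etc.) is left open here.
SimTable = Dict[Tuple[str, str], float]


def term_sim(a: str, b: str, table: SimTable) -> float:
    """Return 1.0 for identical terms, otherwise a symmetric table lookup (default 0.0)."""
    if a == b:
        return 1.0
    return table.get((a, b), table.get((b, a), 0.0))


def exact_match_score(query: List[str], doc: List[str]) -> float:
    """Classic unweighted matching: count distinct query terms that occur in the document."""
    return float(len(set(query) & set(doc)))


def similarity_score(query: List[str], doc: List[str], table: SimTable) -> float:
    """Each distinct query term contributes its best similarity to any document term,
    so non-matching terms can still add partial evidence (illustrative max rule)."""
    doc_terms = set(doc)
    if not doc_terms:
        return 0.0
    return sum(
        max(term_sim(q, d, table) for d in doc_terms)
        for q in dict.fromkeys(query)  # de-duplicate while keeping query order
    )


if __name__ == "__main__":
    table = {("car", "automobile"): 0.9, ("repair", "maintenance"): 0.7}
    query = ["car", "repair"]
    doc = ["automobile", "maintenance", "manual"]
    print(exact_match_score(query, doc))                  # 0.0: no shared terms
    print(round(similarity_score(query, doc, table), 2))  # 1.6: 0.9 + 0.7 from similar terms
```

With an empty similarity table, `term_sim` is 1 only for identical terms, so `similarity_score` reduces to `exact_match_score`. In this sketch the classic exact-match model is therefore the special case with no similarity knowledge, and partial knowledge corresponds to a sparse table.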
Cite this article
Crestani, F. Exploiting the Similarity of Non-Matching Terms at Retrieval Time. Information Retrieval 2, 27–47 (2000). https://doi.org/10.1023/A:1009973415168