Abstract
A classical information retrieval (IR) system ranks documents according to the distance between each text and the user query. The answer list is often so long that users cannot examine every retrieved document, while some relevant documents are ranked so low that they are never examined. To address this problem, the retrieved documents are clustered automatically. We describe an algorithm based on hierarchical and clustering methods that classifies the set of documents retrieved by any IR system. The method is evaluated on the TREC-7 corpora and queries. We show that it improves retrieval results by providing users with at least one high-precision cluster. We examine the impact of the number of clusters and of the way the clusters are browsed to build a reordered list. On the TREC corpora and queries, we show that choosing the number of clusters according to query length improves results compared with a number fixed in advance.
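The pipeline outlined in the abstract (cluster the retrieved list, pick the number of clusters from the query length, then browse the clusters to build a reordered list) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so everything below is an assumption made for illustration. In particular, the bag-of-words vectors, the cosine similarity, the simple k-means-style loop seeded with the top-ranked documents, the `choose_k` rule (one cluster per query term, at least two), and the "best cluster first" browsing order are all hypothetical choices.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def choose_k(query, n_docs):
    """Hypothetical rule: one cluster per query term, at least 2, at most n_docs."""
    return min(n_docs, max(2, len(query.split())))


def cluster(vectors, k, iterations=10):
    """Simple k-means-style clustering, seeded with the k top-ranked documents."""
    centroids = [vectors[i] for i in range(k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Recompute each centroid as the summed term vector of its members
        # (cosine similarity is scale-invariant, so no averaging is needed).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = total
    return assignment, centroids


def reorder(query, ranked_docs):
    """Rebuild the list: clusters closest to the query first, original rank inside."""
    if not ranked_docs:
        return []
    k = choose_k(query, len(ranked_docs))
    vectors = [vectorize(d) for d in ranked_docs]
    assignment, centroids = cluster(vectors, k)
    q = vectorize(query)
    cluster_order = sorted(range(k), key=lambda c: cosine(q, centroids[c]), reverse=True)
    result = []
    for c in cluster_order:
        result.extend(d for d, a in zip(ranked_docs, assignment) if a == c)
    return result


# Toy ranked list, as it might come back from any IR system.
docs = [
    "jaguar car engine speed",
    "jaguar cat jungle animal",
    "car engine repair garage",
    "jungle animal cat habitat",
]
for doc in reorder("jaguar cat", docs):
    print(doc)
```

In this toy run the two animal-related documents end up in one cluster, which is closer to the query than the car-related cluster, so they move to the front of the reordered list. Other browsing routes (for example, interleaving clusters) would only require changing the final loop in `reorder`.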
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bellot, P., El-Bèze, M. (1999). Query Length, Number of Classes and Routes through Clusters: Experiments with a Clustering Method for Information Retrieval. In: Hui, L.C.K., Lee, DL. (eds) Internet Applications. ICSC 1999. Lecture Notes in Computer Science, vol 1749. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-46652-9_19
DOI: https://doi.org/10.1007/978-3-540-46652-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66903-6
Online ISBN: 978-3-540-46652-9
eBook Packages: Springer Book Archive