research-article

Query-oriented clustering: a multi-objective approach

Authors:
Sylvain Lamprier, Université Pierre et Marie Curie
Tassadit Amghar, Université d'Angers
Frédéric Saubion, Université d'Angers
Bernard Levrat, Université d'Angers

SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing, March 2010, Pages 1789–1795. https://doi.org/10.1145/1774088.1774467

Published: 22 March 2010

ABSTRACT

Document clustering techniques have been widely applied in Information Retrieval to reorganize the results returned in response to user queries. According to the Cluster Hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant ones, most relevant documents are likely to be gathered in a single cluster. Systems that organize search results as a set of clusters usually regard this tendency as highly advantageous, since it allows the results of the initial search to be filtered. Adopting a different point of view, we instead consider the Cluster Hypothesis a hindrance to information access, since it prevents the various aspects of the query from emerging. The resulting risk is that the perception of the subject is restricted to a single point of view. We therefore propose to distribute the relevant documents over several clusters by orienting the organization of the clusters according to the user's topic. The aim is to attract the clusters toward that topic in order to highlight the thematic differences between documents that are strongly connected to the query. Rather than modifying the inter-document similarity computation, as is done in several studies, we propose to act directly on the organization of the clusters by using a multi-objective evolutionary clustering algorithm which, besides the classical internal cohesion, also optimizes the query proximity of the clusters. First experimental results indicate that taking the query into account in this way can bring a substantial benefit.
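
As a rough illustration of the two criteria named in the abstract, the Python sketch below scores a candidate partition by internal cohesion and by query proximity, then compares candidates by Pareto dominance. The cosine-based formulas, the centroid representation of clusters, and every function name are assumptions made for illustration only; the abstract does not specify the paper's actual objective functions or its evolutionary operators.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroids(docs, labels):
    """Mean vector of each non-empty cluster, keyed by cluster label."""
    groups = {}
    for vec, label in zip(docs, labels):
        groups.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def objectives(docs, labels, query):
    """Return (cohesion, query_proximity) for one partition; both are maximized.

    Assumed definitions (not taken from the paper):
    cohesion  = mean similarity of each document to its cluster centroid,
    proximity = mean similarity of each cluster centroid to the query.
    """
    cents = centroids(docs, labels)
    cohesion = sum(cosine(d, cents[l]) for d, l in zip(docs, labels)) / len(docs)
    proximity = sum(cosine(c, query) for c in cents.values()) / len(cents)
    return cohesion, proximity


def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Score tuples that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Toy data: four documents in a two-term space, and a query touching both terms.
docs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query = [1.0, 1.0]
by_topic = objectives(docs, [0, 0, 1, 1], query)   # tight clusters, far from the query
mixed = objectives(docs, [0, 1, 0, 1], query)      # looser clusters, close to the query
front = pareto_front([by_topic, mixed])
```

On this toy data neither partition dominates the other, so both remain on the front: one is more cohesive, the other sits closer to the query. A multi-objective evolutionary algorithm would keep such trade-off solutions in its population rather than collapsing the two criteria into a single weighted score.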

Index Terms

Query-oriented clustering: a multi-objective approach
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing
March 2010
2712 pages
ISBN: 9781605586397
DOI: 10.1145/1774088
Conference Chairs: Sung Y. Shin (South Dakota State University), Sascha Ossowski (University Rey Juan Carlos, Spain), Michael Schumacher (University of Applied Sciences Western Switzerland, Switzerland)
Program Chairs: Mathew J. Palakal (Indiana University Purdue University), Chih-Cheng Hung (Southern Polytechnic State University)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information retrieval
multiobjective optimization
query-oriented clustering
Qualifiers
- research-article
Acceptance Rates
SAC '10 Paper Acceptance Rate: 364 of 1,353 submissions, 27%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%