DOI: 10.1145/1774088.1774467
research-article

Query-oriented clustering: a multi-objective approach

Published: 22 March 2010

ABSTRACT

Document clustering techniques have been widely applied in Information Retrieval to reorganize the results returned in response to a user's query. According to the Cluster Hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant ones, most relevant documents are likely to be gathered in a single cluster. Systems that organize search results as a set of clusters usually regard this tendency as highly advantageous, since it allows the results of the initial search to be filtered. Adopting a different point of view, we instead consider the Cluster Hypothesis a hindrance to information access, since it prevents the various aspects of the query from emerging. The resulting risk is that the perception of the subject is restricted to a single point of view. We therefore propose to distribute the relevant documents over clusters by orienting the organization of the clusters according to the user's topic. The aim is to draw the clusters toward the query in order to highlight the thematic differences between documents that are strongly connected to it. Rather than modifying the inter-document similarity computation, as several studies do, we propose to act directly on the organization of the clusters by using a multi-objective evolutionary clustering algorithm which, besides the classical internal cohesion, also optimizes the query proximity of the clusters. First experimental results highlight the great benefit that may be gained from this way of taking the query into account.
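The two criteria named in the abstract, internal cohesion and query proximity, can be illustrated with a small sketch. The abstract does not give the paper's actual objective functions, so everything below is an assumption made for illustration: documents and the query are plain term-weight vectors, cohesion is the mean cosine similarity of each document to its cluster centroid, query proximity is the mean cosine similarity of each centroid to the query, and candidate clusterings are compared by Pareto dominance with both objectives maximized. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(docs):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]

def cohesion(clusters):
    """Assumed objective 1: mean similarity of each document to its own centroid."""
    sims = [cosine(d, centroid(c)) for c in clusters for d in c]
    return sum(sims) / len(sims)

def query_proximity(clusters, query):
    """Assumed objective 2: mean similarity of each cluster centroid to the query."""
    cents = [centroid(c) for c in clusters]
    return sum(cosine(c, query) for c in cents) / len(cents)

def objectives(clusters, query):
    """Score one candidate clustering on both objectives (both to be maximized)."""
    return (cohesion(clusters), query_proximity(clusters, query))

def dominates(a, b):
    """True if score a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep the scores that no other score dominates."""
    return [s for s in scored if not any(dominates(t, s) for t in scored)]
```

A multi-objective evolutionary algorithm would keep a population of candidate clusterings and favor those on the Pareto front, instead of collapsing the two criteria into a single weighted score. The sketch covers only the scoring and comparison step, not the evolutionary search itself.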


        • Published in

          SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing
          March 2010
          2712 pages
          ISBN:9781605586397
          DOI:10.1145/1774088

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          SAC '10 paper acceptance rate: 364 of 1,353 submissions (27%)
          Overall acceptance rate: 1,650 of 6,669 submissions (25%)
