DOI: 10.1145/2063576.2063910
poster

Beyond precision@10: clustering the long tail of web search results

Published: 24 October 2011

ABSTRACT

This paper addresses the lack of user acceptance of web search result clustering. We report on selected analyses and propose new concepts to improve existing result clustering approaches. Our findings in a nutshell are:

  1. Don't compete with a search engine's top hits. In response to a query, we presume that search engines return an optimal result list in the sense of the probabilistic ranking principle: documents that the majority of users expect are placed on top and form the result list head. We argue that, for the top results, it is not beneficial to replace this established form of result presentation.
  2. Improve document access in the result list tail. Documents that address the information need of "minorities" appear at some position in the result list tail. Especially for ambiguous and multi-faceted queries we expect this tail to be long, with many users appreciating different documents. In this situation, web search result clustering can improve user satisfaction by reorganizing the long tail into topic-specific clusters.
  3. Avoid shadowing when constructing cluster labels. We show that most of the cluster labels generated by current clustering technology occur within the snippets of the result list head, an effect which we call shadowing. Such labels are of limited value for topic organization and for navigating within a clustering of the entire result list. We propose and analyze a filtering approach that significantly alleviates the label shadowing effect.
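The abstract does not say how the proposed filter works, so the following is only a minimal sketch of one possible reading of the idea: split a ranked list of snippets into a head and a tail, then drop candidate cluster labels that already occur in a head snippet. The function names, the head size of 10 (taken loosely from the "precision@10" in the title), and the exact phrase-matching rule are all assumptions made for illustration and are not the authors' method.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run in tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def split_head_tail(snippets, head_size=10):
    """Split a ranked snippet list into head and tail.

    head_size=10 is an assumed cut-off, not a value from the paper.
    """
    return snippets[:head_size], snippets[head_size:]


def filter_shadowed_labels(candidate_labels, head_snippets):
    """Keep only labels that do not occur in any head snippet.

    Hypothetical filter: a label counts as "shadowed" here if its tokens
    appear as a contiguous phrase in at least one head snippet.
    """
    head_tokens = [tokenize(s) for s in head_snippets]
    kept = []
    for label in candidate_labels:
        label_tokens = tokenize(label)
        shadowed = any(contains_phrase(t, label_tokens) for t in head_tokens)
        if not shadowed:
            kept.append(label)
    return kept


if __name__ == "__main__":
    # Toy ranked result list for an ambiguous query (invented data).
    snippets = [
        "Jaguar cars: luxury saloons and sports cars",
        "Jaguar dealers and car prices",
        "The jaguar is a big cat native to the Americas",
        "Jaguar, the Atari game console",
    ]
    head, tail = split_head_tail(snippets, head_size=2)
    labels = ["sports cars", "big cat", "game console"]
    print(filter_shadowed_labels(labels, head))
    # prints ['big cat', 'game console']
```

In this toy run the label "sports cars" is dropped because it already appears in the head, while the two labels that only describe tail documents survive. A real system would also need a clustering step over the tail and a less literal notion of label overlap; both are outside what the abstract specifies.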

        • Published in

          CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
          October 2011
          2712 pages
          ISBN: 9781450307178
          DOI: 10.1145/2063576

          Copyright © 2011 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
