ABSTRACT
This paper addresses the lack of user acceptance of web search result clustering. We report on selected analyses and propose new concepts to improve existing result clustering approaches. Our findings in a nutshell are:
1. Don't compete with a search engine's top hits. In response to a query, we presume that search engines return an optimal result list in the sense of the probabilistic ranking principle: documents that are expected by the majority of users are placed on top and form the result list head. We argue that, with respect to the top results, it is not beneficial to replace this established form of result presentation.
2. Improve document access in the result list tail. Documents that address the information need of "minorities" appear at some position in the result list tail. Especially for ambiguous and multi-faceted queries we expect this tail to be long, with many users appreciating different documents. In this situation, web search result clustering can improve user satisfaction by reorganizing the long tail into topic-specific clusters.
3. Avoid shadowing when constructing cluster labels. We show that most of the cluster labels generated by current clustering technology occur within the snippets of the result list head, an effect which we call shadowing. The value of such labels for topic organization and for navigating within a clustering of the entire result list is limited. We propose and analyze a filtering approach to significantly alleviate the label shadowing effect.
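The head/tail split and the label filtering idea can be illustrated with a small sketch. It is not the paper's method: the abstract does not specify the filter, so the code assumes that the head is simply the top k snippets and that a candidate label counts as shadowed when it occurs verbatim (case-insensitively) in any head snippet. The function names, the cutoff k, and the sample snippets are all hypothetical.

```python
from typing import List, Tuple


def split_head_tail(snippets: List[str], k: int = 10) -> Tuple[List[str], List[str]]:
    """Split a ranked snippet list into the head (top k) and the tail (the rest)."""
    return snippets[:k], snippets[k:]


def is_shadowed(label: str, head_snippets: List[str]) -> bool:
    """Assumed criterion: a label is shadowed if it appears in any head snippet."""
    needle = label.lower()
    return any(needle in snippet.lower() for snippet in head_snippets)


def filter_shadowed_labels(candidates: List[str], head_snippets: List[str]) -> List[str]:
    """Keep only candidate labels that do not occur in the head snippets."""
    return [label for label in candidates if not is_shadowed(label, head_snippets)]


# Hypothetical ranked snippets for the ambiguous query "jaguar".
snippets = [
    "Jaguar cars official site: luxury sedans and SUVs",
    "Jaguar car dealers near you",
    "The jaguar is a big cat native to the Americas",
    "Jaguar big cat habitat and conservation",
    "Mac OS X Jaguar release notes",
]
head, tail = split_head_tail(snippets, k=2)

# Hypothetical candidate labels produced by some clustering step.
candidates = ["luxury sedans", "big cat", "car dealers", "Mac OS X"]
tail_labels = filter_shadowed_labels(candidates, head)
```

In this toy setting the head stays untouched as a ranked list, and only labels that are absent from the head snippets remain available for organizing the tail. A real filter would likely need stemming or phrase matching rather than plain substring tests.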
Index Terms
- Beyond precision@10: clustering the long tail of web search results