poster

Beyond precision@10: clustering the long tail of web search results

Authors:
Benno Stein, Bauhaus-Universität, Weimar, Germany
Tim Gollub, Bauhaus-Universität, Weimar, Germany
Dennis Hoppe, Bauhaus-Universität, Weimar, Germany

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management, October 2011, Pages 2141–2144, https://doi.org/10.1145/2063576.2063910

Published: 24 October 2011

ABSTRACT

This paper addresses the lack of user acceptance of web search result clustering. We report on selected analyses and propose new concepts to improve existing result clustering approaches. Our findings in a nutshell:

1. Don't compete with a search engine's top hits. We presume that, in response to a query, a search engine returns a result list that is optimal in the sense of the probabilistic ranking principle: documents that the majority of users expect are placed on top and form the result list head. We argue that, for the top results, replacing this established form of result presentation is not beneficial.
2. Improve document access in the result list tail. Documents that address the information need of "minorities" appear at some position in the result list tail. Especially for ambiguous and multi-faceted queries we expect this tail to be long, with many users appreciating different documents. In this situation web search result clustering can improve user satisfaction by reorganizing the long tail into topic-specific clusters.
3. Avoid shadowing when constructing cluster labels. We show that most of the cluster labels generated by current clustering technology occur within the snippets of the result list head, an effect we call shadowing. Such labels are of limited value for organizing topics and for navigating within a clustering of the entire result list. We propose and analyze a filtering approach that significantly alleviates the label shadowing effect.
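The abstract does not spell out how the filtering approach works, so the following is only a minimal sketch of the general idea behind finding 3, not the authors' method. It assumes that a label counts as shadowed when its words occur as a contiguous phrase in at least one head snippet. The function names, the example snippets and labels, and the head size are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run in tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def split_head_tail(snippets, head_size=10):
    """Split a ranked snippet list into head and tail (head_size is an assumption)."""
    return snippets[:head_size], snippets[head_size:]


def filter_shadowed_labels(labels, head_snippets):
    """Partition candidate labels into (kept, shadowed).

    Assumption: a label is shadowed if it appears verbatim, as a token
    phrase, in at least one snippet of the result list head.
    """
    head_tokens = [tokenize(s) for s in head_snippets]
    kept, shadowed = [], []
    for label in labels:
        label_tokens = tokenize(label)
        if any(contains_phrase(t, label_tokens) for t in head_tokens):
            shadowed.append(label)
        else:
            kept.append(label)
    return kept, shadowed


# Hypothetical ranked snippets for the ambiguous query "jaguar".
snippets = [
    "Jaguar cars official site: luxury sedans and sports cars",
    "Jaguar dealers and used luxury cars near you",
    "The jaguar is a big cat native to the Americas",
    "Jaguar, a Mac OS X release by Apple",
    "Atari Jaguar game console history",
]
# Hypothetical candidate cluster labels from some clustering step.
labels = ["luxury cars", "big cat", "Mac OS X", "game console", "dealers"]

head, tail = split_head_tail(snippets, head_size=2)
kept, shadowed = filter_shadowed_labels(labels, head)
print(kept)      # ['big cat', 'Mac OS X', 'game console']
print(shadowed)  # ['luxury cars', 'dealers']
```

In this toy run the head is left untouched, and only labels that do not already appear in the head snippets remain as candidates for organizing the tail. A real system would likely need stemming or fuzzier matching than exact phrase containment.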

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
October 2011
2712 pages
ISBN: 9781450307178
DOI: 10.1145/2063576
Editors:
Bettina Berendt
Arjen de Vries
Wenfei Fan
Craig Macdonald, University of Glasgow, UK
Iadh Ounis, University of Glasgow, UK
Ian Ruthven, University of Strathclyde, UK
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cluster labeling
search result clustering
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- Total Citations: 11
- Total Downloads: 244
- Downloads (Last 12 months): 5
- Downloads (Last 6 weeks): 1