research-article

Fusion helps diversification

Authors:
Shangsong Liang, University of Amsterdam, Amsterdam, Netherlands
Zhaochun Ren, University of Amsterdam, Amsterdam, Netherlands
Maarten de Rijke, University of Amsterdam, Amsterdam, Netherlands

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, July 2014, pages 303–312. https://doi.org/10.1145/2600428.2609561

Published: 03 July 2014

ABSTRACT

A popular strategy for search result diversification is to first retrieve a set of documents using a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We take the output of a set of rankers, optimized for diversity or not, and find that data fusion can significantly improve state-of-the-art diversification methods. We also introduce a new data fusion method, called diversified data fusion (DDF), which infers latent topics of a query using topic modeling, without leveraging outside information. Our experiments show that data fusion methods can enhance the performance of diversification and that DDF significantly outperforms existing data fusion methods in terms of diversity metrics.
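The abstract does not name the standard fusion methods it evaluates. Purely as an illustration of what "fusing the output of a set of rankers" means, here is a minimal Python sketch of CombSUM and CombMNZ, the classic score-combination rules introduced by Fox and Shaw in their TREC work on combining multiple searches. It assumes per-run min-max score normalization and takes the CombMNZ multiplier to be the number of runs that retrieved a document; conventions vary in the literature. The function names `min_max_normalize` and `comb_fuse` and the toy runs are hypothetical, and this is not DDF, whose topic-model inference is not reproduced here.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores (doc_id -> score) to [0, 1].

    Assumption: a run whose scores are all equal maps every document to 1.0.
    """
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_fuse(runs, method="CombSUM"):
    """Fuse runs with CombSUM or CombMNZ; return doc ids, best first."""
    total = defaultdict(float)  # sum of normalized scores per document
    hits = defaultdict(int)     # number of runs that retrieved the document
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    if method == "CombSUM":
        fused = dict(total)
    elif method == "CombMNZ":
        # CombMNZ multiplies the summed score by the hit count,
        # rewarding documents that many runs agree on.
        fused = {doc: total[doc] * hits[doc] for doc in total}
    else:
        raise ValueError(f"unknown fusion method: {method}")
    # Highest fused score first; ties broken by doc id for a stable order.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Toy runs from two hypothetical rankers (doc_id -> retrieval score).
run_a = {"d1": 2.0, "d2": 3.0, "d3": 1.0}
run_b = {"d2": 10.0, "d4": 9.0, "d1": 0.0}
print(comb_fuse([run_a, run_b], "CombSUM"))  # ['d2', 'd4', 'd1', 'd3']
print(comb_fuse([run_a, run_b], "CombMNZ"))  # ['d2', 'd1', 'd4', 'd3']
```

In this toy case d4, retrieved by only one run, outranks d1 under CombSUM, while CombMNZ promotes d1 because both runs retrieved it. The sketch only combines relevance scores; it does not model the query aspects that diversity metrics reward.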

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Recommendations

The early fusion strategy for search result diversification
ACM TURC '17: Proceedings of the ACM Turing 50th Celebration Conference - China

A typical strategy for search result diversification is a two-stage process: first we use a traditional search engine to obtain a ranked list of documents, in which relevance is the only concern; then the results are re-ranked so as to promote ...

Time-Aware Rank Aggregation for Microblog Search
CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management

We tackle the problem of searching microblog posts and frame it as a rank aggregation problem where we merge result lists generated by separate rankers so as to produce a final ranking to be returned to the user. We propose a rank aggregation method, ...

Search result diversification via data fusion
SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval

In recent years, researchers have investigated search result diversification through a variety of approaches. In such situations, information retrieval systems need to consider both aspects of relevance and diversity for those retrieved documents. On ...

Published in
SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
July 2014
1330 pages
ISBN: 9781450322577
DOI: 10.1145/2600428
General Chairs:
Shlomo Geva, Queensland University of Technology
Andrew Trotman, University of Otago
Program Chairs:
Peter Bruza, Queensland University of Technology
Charles L.A. Clarke, University of Waterloo
Kal Järvelin, University of Tampere
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ad hoc retrieval
data fusion
diversification
rank aggregation

Acceptance Rates
SIGIR '14 paper acceptance rate: 82 of 387 submissions, 21%. Overall acceptance rate: 792 of 3,983 submissions, 20%.

Article Metrics
- Total citations: 39
- Total downloads: 725
- Downloads (last 12 months): 13
- Downloads (last 6 weeks): 0