ABSTRACT
A popular strategy for search result diversification is to first retrieve a set of documents with a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We take the output of a set of rankers, whether or not they are optimized for diversity, and find that data fusion can significantly improve state-of-the-art diversification methods. We also introduce a new data fusion method, called diversified data fusion (DDF), which infers latent topics of a query using topic modeling, without leveraging outside information. Our experiments show that data fusion methods can enhance the performance of diversification and that DDF significantly outperforms existing data fusion methods in terms of diversity metrics.
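To make the idea of fusing the output of several rankers concrete, here is a minimal sketch of two classic score-based fusion rules, CombSUM and CombMNZ. The abstract does not say which standard methods were evaluated, so treating these two as representative is an assumption. The function names, the min-max normalization step, and the tie-breaking rule are illustrative choices rather than details from the paper, and the sketch does not implement DDF.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one ranker's scores to [0, 1]; a constant run maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_fuse(runs, method="combsum"):
    """Fuse several runs (dicts mapping doc id to score) into one ranking.

    CombSUM sums the normalized scores of a document across runs.
    CombMNZ multiplies that sum by the number of runs that returned it.
    Returns doc ids sorted by fused score, ties broken by doc id.
    """
    total = defaultdict(float)  # summed normalized score per document
    hits = defaultdict(int)     # number of runs that returned the document
    for run in runs:
        if not run:             # skip empty runs (assumed harmless)
            continue
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1

    if method == "combsum":
        fused = dict(total)
    elif method == "combmnz":
        fused = {doc: total[doc] * hits[doc] for doc in total}
    else:
        raise ValueError(f"unknown fusion method: {method}")

    return sorted(fused, key=lambda doc: (-fused[doc], doc))


if __name__ == "__main__":
    # Two hypothetical runs for the same query.
    runs = [
        {"d1": 3.0, "d2": 1.0, "d3": 2.0},
        {"d2": 10.0, "d4": 5.0},
    ]
    print(comb_fuse(runs, "combsum"))
    print(comb_fuse(runs, "combmnz"))
```

In this toy input, CombMNZ promotes `d2` because two rankers returned it, which illustrates how fusion can reward documents supported by several lists. Whether that translates into better diversity is the empirical question the paper studies.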
Index Terms
- Fusion helps diversification