DOI: 10.1145/2600428.2609561
research-article

Fusion helps diversification

Published: 3 July 2014

ABSTRACT

A popular strategy for search result diversification is to first retrieve a set of documents using a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We take the output of a set of rankers, some optimized for diversity and some not, and find that data fusion can significantly improve state-of-the-art diversification methods. We also introduce a new data fusion method, called diversified data fusion (DDF), which infers latent topics of a query using topic modeling, without leveraging outside information. Our experiments show that data fusion methods can enhance the performance of diversification and that DDF significantly outperforms existing data fusion methods in terms of diversity metrics.
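The two ingredients the abstract combines, fusing several rankers' outputs and then diversifying over latent topics, can be illustrated with a minimal Python sketch. It is not the authors' DDF implementation. The fusion step uses CombSUM and CombMNZ (Fox and Shaw) over min-max normalized scores as examples of standard data fusion methods. The diversification step swaps in a simple xQuAD-style greedy rerank over latent topics. The per-document topic distributions `doc_topics` are assumed to come from a topic model (for example LDA) fitted beforehand. The topic weights (each topic's share of the fused score mass), the trade-off parameter `lam`, and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict


def min_max(run):
    """Min-max normalize one run's scores (doc id -> score) to [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_fuse(runs, mnz=False):
    """CombSUM (mnz=False) or CombMNZ (mnz=True) over normalized runs."""
    total = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max(run).items():
            total[doc] += score
            hits[doc] += 1
    if mnz:
        # CombMNZ multiplies the summed score by the number of runs retrieving the doc.
        return {doc: total[doc] * hits[doc] for doc in total}
    return dict(total)


def topic_rerank(fused, doc_topics, k, lam=0.5):
    """Greedy xQuAD-style rerank over latent topics (illustrative, not DDF).

    doc_topics maps every fused doc to a list of P(topic | doc) values,
    assumed to come from a topic model fitted beforehand (hypothetical input).
    """
    n_topics = len(next(iter(doc_topics.values())))
    # Assumption: a topic's importance is its share of the fused score mass.
    weight = [0.0] * n_topics
    for doc, score in fused.items():
        for t, p in enumerate(doc_topics[doc]):
            weight[t] += score * p
    norm = sum(weight) or 1.0
    weight = [w / norm for w in weight]

    max_score = max(fused.values()) or 1.0
    not_covered = [1.0] * n_topics  # probability that topic t is still uncovered
    selected, candidates = [], set(fused)

    def gain(doc):
        # Trade off normalized fused relevance against coverage of uncovered topics.
        relevance = fused[doc] / max_score
        novelty = sum(weight[t] * doc_topics[doc][t] * not_covered[t]
                      for t in range(n_topics))
        return (1 - lam) * relevance + lam * novelty

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=gain)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
        for t in range(n_topics):
            not_covered[t] *= 1.0 - doc_topics[best][t]
    return selected


if __name__ == "__main__":
    run_a = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
    run_b = {"d2": 10.0, "d3": 5.0, "d4": 0.0}
    fused = comb_fuse([run_a, run_b])
    topics = {"d1": [1.0, 0.0], "d2": [1.0, 0.0], "d3": [0.0, 1.0], "d4": [0.0, 1.0]}
    print(fused)
    print(topic_rerank(fused, topics, k=3, lam=0.8))
```

In this toy example, with `lam=0.8` the sketch ranks `d3` second, ahead of the higher-scoring `d1`, because `d3` covers the topic that `d2` leaves uncovered. With `lam=0` it reduces to ranking by fused score alone.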

Published in: SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, July 2014, 1330 pages.
ISBN: 9781450322577
DOI: 10.1145/2600428

      Copyright © 2014 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rates: SIGIR '14 paper acceptance rate 82 of 387 submissions (21%); overall acceptance rate 792 of 3,983 submissions (20%).
