DOI: 10.1145/1526709.1526761
research-article

An axiomatic approach for result diversification

Published: 20 April 2009

Abstract

Understanding user intent is key to designing an effective ranking system in a search engine. In the absence of any explicit knowledge of user intent, search engines want to diversify results to improve user satisfaction. In such a setting, the approach based on the probability ranking principle, which presents the most relevant results at the top, can be sub-optimal, and hence the search engine would like to trade off relevance for diversity in the results.
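Why a relevance-only ranking can be sub-optimal is easy to see on a toy example. The sketch below assumes a simple bicriteria objective, total relevance plus λ times total pairwise distance; the objective, the result names, the scores, and the topic labels are all hypothetical illustrations, not the paper's definitions or data. With λ = 0 the best pair is just the two most relevant results, but with λ = 1 the two same-topic "car" results lose to a pair that covers two interpretations of the query.

```python
from itertools import combinations

# Hypothetical results for an ambiguous query: relevance scores and topics.
relevance = {"car-1": 0.9, "car-2": 0.85, "animal": 0.6, "os": 0.5}
topic = {"car-1": "car", "car-2": "car", "animal": "animal", "os": "os"}


def distance(u, v):
    """0 for results on the same topic, 1 for results on different topics."""
    return 0.0 if topic[u] == topic[v] else 1.0


def objective(selected, lam):
    """Assumed bicriteria objective: total relevance + lam * total pairwise distance."""
    rel = sum(relevance[d] for d in selected)
    div = sum(distance(u, v) for u, v in combinations(selected, 2))
    return rel + lam * div


def best_subset(k, lam):
    """Brute-force the size-k subset maximizing the objective (viable only for tiny inputs)."""
    return max(combinations(relevance, k), key=lambda s: objective(s, lam))


# Relevance-only ("probability ranking principle") top-2 versus the best diversified pair.
top_k = sorted(relevance, key=relevance.get, reverse=True)[:2]
print(top_k, objective(top_k, lam=1.0))  # ['car-1', 'car-2'] 1.75
print(best_subset(2, lam=1.0))           # ('car-1', 'animal')
```

Brute force is used here only because the example is tiny; the number of candidate subsets grows combinatorially with the number of results, which is why approximation algorithms matter in practice.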
By analogy with prior work on ranking and clustering systems, we use the axiomatic approach to characterize and design diversification systems. We develop a set of natural axioms that a diversification system is expected to satisfy, and show that no diversification function can satisfy all the axioms simultaneously. We illustrate the use of the axiomatic framework by providing three example diversification objectives that satisfy different subsets of the axioms. We also uncover a rich link to the facility dispersion problem that yields algorithms for a number of diversification objectives. Finally, we propose an evaluation methodology to characterize the objectives and the underlying axioms. We conduct a large-scale evaluation of our objectives on two data sets: one derived from the Wikipedia disambiguation pages, and a product database.
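The link to facility dispersion can be illustrated with a sketch of the classic pair-selection greedy heuristic for max-sum dispersion, which the dispersion literature shows is a 2-approximation when distances form a metric. To account for relevance, the sketch assumes a combined distance w(u) + w(v) + 2*lam*d(u, v), where w is relevance and d a base distance. This reduction, the hypothetical helper `diversification_distance`, the odd-k choice of the last item, and the toy data are assumptions made for illustration, not a reproduction of the paper's algorithms.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence

Distance = Callable[[Hashable, Hashable], float]


def greedy_max_sum_dispersion(items: Sequence[Hashable], k: int, dist: Distance) -> List[Hashable]:
    """Select k items by repeatedly taking the farthest-apart remaining pair.

    Pair-selection greedy for max-sum dispersion. If k is odd, one extra item is
    added at the end; this sketch picks the remaining item with the largest
    total distance to the items already chosen.
    """
    if not 0 <= k <= len(items):
        raise ValueError("k must be between 0 and the number of items")
    remaining = list(items)
    selected: List[Hashable] = []
    while len(selected) + 2 <= k:
        # Farthest-apart pair among the items not yet selected.
        u, v = max(combinations(remaining, 2), key=lambda pair: dist(*pair))
        selected.extend([u, v])
        remaining.remove(u)
        remaining.remove(v)
    if len(selected) < k:
        extra = max(remaining, key=lambda x: sum(dist(x, s) for s in selected))
        selected.append(extra)
    return selected


def diversification_distance(relevance: Dict[Hashable, float], base: Distance, lam: float) -> Distance:
    """Hypothetical helper: fold relevance into the distance as w(u) + w(v) + 2*lam*d(u, v)."""
    return lambda u, v: relevance[u] + relevance[v] + 2 * lam * base(u, v)


# Hypothetical toy data: relevance scores and topic labels.
relevance = {"car-1": 0.9, "car-2": 0.85, "animal": 0.6, "os": 0.5}
topic = {"car-1": "car", "car-2": "car", "animal": "animal", "os": "os"}


def topic_distance(u, v):
    """0 for results on the same topic, 1 otherwise."""
    return 0.0 if topic[u] == topic[v] else 1.0


d = diversification_distance(relevance, topic_distance, lam=1.0)
print(greedy_max_sum_dispersion(list(relevance), 3, d))  # ['car-1', 'animal', 'os']
```

Folding relevance into the pairwise distance lets a pure dispersion algorithm handle the relevance/diversity trade-off, with lam controlling how much weight diversity receives; with lam = 0 the greedy simply returns the most relevant pair.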

Cited By

  • (2024) Training greedy policy for proposal batch selection in expensive multi-objective combinatorial optimization. Proceedings of the 41st International Conference on Machine Learning, pp. 26948-26975. DOI: 10.5555/3692070.3693144. Online publication date: 21-Jul-2024.
  • (2024) Discovering Top-k Relevant and Diversified Rules. Proceedings of the ACM on Management of Data, 2(4), pp. 1-28. DOI: 10.1145/3677131. Online publication date: 30-Sep-2024.
  • (2024) Pb-Hash: Partitioned b-bit Hashing. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 239-246. DOI: 10.1145/3664190.3672523. Online publication date: 2-Aug-2024.

    Published In

    WWW '09: Proceedings of the 18th international conference on World wide web
    April 2009
    1280 pages
    ISBN:9781605584874
    DOI:10.1145/1526709

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximation algorithms
    2. axiomatic framework
    3. diversification
    4. facility dispersion
    5. search engine
    6. wikipedia

    Conference

    WWW '09

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
