
A Comparative Analysis of Interleaving Methods for Aggregated Search

Published: 17 February 2015

Abstract

A result page of a modern search engine often goes beyond a simple list of “10 blue links.” Many specific user needs (e.g., News, Image, Video) are addressed by so-called aggregated or vertical search solutions: specially presented documents, often retrieved from specific sources, that stand out from the regular organic Web search results. When it comes to evaluating ranking systems, such complex result layouts raise their own challenges. This is especially true for so-called interleaving methods that have arisen as an important type of online evaluation: by mixing results from two different result pages, interleaving can easily break the desired Web layout in which vertical documents are grouped together, and hence hurt the user experience.
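As background for readers unfamiliar with the mechanics, here is a minimal sketch of standard Team-Draft Interleaving (introduced by Radlinski, Kurup, and Joachims in 2008), the kind of method the abstract refers to. It is an illustration only, not the authors' implementation; the function names and the click-crediting helper are assumptions made for this sketch.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=None, rng=random):
    """Minimal sketch of Team-Draft Interleaving.

    Two "teams" draft documents in rounds: the team with fewer picks
    goes next (a coin flip breaks ties), and each pick is the team's
    highest-ranked document not yet on the interleaved list. Team
    labels are kept so clicks can later be credited to ranker A or B.
    """
    if length is None:
        length = len(set(ranking_a) | set(ranking_b))
    rankings = {'A': ranking_a, 'B': ranking_b}
    count = {'A': 0, 'B': 0}
    interleaved, teams, seen = [], [], set()
    while len(interleaved) < length:
        if count['A'] != count['B']:
            team = 'A' if count['A'] < count['B'] else 'B'
        else:
            team = rng.choice('AB')  # coin flip on ties
        doc = next((d for d in rankings[team] if d not in seen), None)
        if doc is None:
            # This team's ranking is exhausted; the other team picks.
            team = 'B' if team == 'A' else 'A'
            doc = next((d for d in rankings[team] if d not in seen), None)
            if doc is None:
                break
        interleaved.append(doc)
        teams.append(team)
        seen.add(doc)
        count[team] += 1
    return interleaved, teams

def tdi_outcome(teams, clicked_ranks):
    """Hypothetical helper: credit each click (given as a 0-based rank
    into the interleaved list) to the team that contributed the clicked
    document; the team with more credited clicks wins the impression."""
    wins = {'A': 0, 'B': 0}
    for rank in clicked_ranks:
        wins[teams[rank]] += 1
    return 'tie' if wins['A'] == wins['B'] else max(wins, key=wins.get)
```

Over many impressions, the ranker whose team attracts more clicks is inferred to be better. The sketch also shows why vertical groupings are at risk: the draft freely mixes the two rankings, so documents of one vertical can end up scattered across the page.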
We conduct an analysis of different interleaving methods as applied to aggregated search engine result pages. Apart from conventional interleaving methods, we propose two vertical-aware methods: one derived from the widely used Team-Draft Interleaving method by adjusting it in such a way that it respects vertical document groupings, and another based on the recently introduced Optimized Interleaving framework. We show that our proposed methods are better at preserving the user experience than existing interleaving methods while still performing well as a tool for comparing ranking systems. For evaluating our proposed vertical-aware interleaving methods, we use real-world click data as well as simulated clicks and simulated ranking systems.
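The paper specifies its vertical-aware methods exactly; purely to illustrate the constraint they are designed to respect, the hypothetical post-processing step below regroups all documents of one vertical into a single contiguous block, anchored where that vertical first appears in the interleaved list. The `group_verticals` name and the `vertical_of` mapping are assumptions of this sketch, not the authors' API.

```python
def group_verticals(interleaved, teams, vertical_of):
    """Hypothetical illustration (not the paper's exact algorithm):
    move all documents of the same vertical into one contiguous block,
    anchored where that vertical first appears in the interleaved list.

    `vertical_of` maps a document to its vertical id ('News', 'Image',
    ...) or None for organic Web results.
    """
    # Gather each vertical's documents in their interleaved order.
    blocks = {}
    for doc, team in zip(interleaved, teams):
        v = vertical_of(doc)
        if v is not None:
            blocks.setdefault(v, []).append((doc, team))
    out_docs, out_teams, emitted = [], [], set()
    for doc, team in zip(interleaved, teams):
        v = vertical_of(doc)
        if v is None:
            out_docs.append(doc)
            out_teams.append(team)
        elif v not in emitted:
            # Emit the whole vertical block at its first position.
            emitted.add(v)
            for d, t in blocks[v]:
                out_docs.append(d)
                out_teams.append(t)
    return out_docs, out_teams
```

Because the team labels travel with the moved documents, click crediting proceeds exactly as in plain Team-Draft Interleaving, while the user sees an intact vertical block.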





    Published In

    ACM Transactions on Information Systems, Volume 33, Issue 2
    February 2015, 181 pages
    ISSN: 1046-8188
    EISSN: 1558-2868
    DOI: 10.1145/2737813

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 February 2015
    Accepted: 01 September 2014
    Received: 01 June 2014
    Published in TOIS Volume 33, Issue 2


    Author Tags

    1. information retrieval
    2. aggregated search
    3. clicks
    4. interleaved comparison
    5. interleaving
    6. online evaluation

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Elite Network Shifts project funded by the Royal Dutch Academy of Sciences (KNAW)
    • TROVe project funded by the CLARIAH program
    • Netherlands eScience Center under project number 027.012.105
    • ESF Research Network Program ELIAS
    • Microsoft Research Ph.D. program
    • HPC Fund
    • European Community's Seventh Framework Programme (FP7/2007-2013)
    • QuaMerdes project funded by the CLARIN-nl program
    • Dutch national program COMMIT
    • Yahoo! Faculty Research and Engagement Program
    • Netherlands Organisation for Scientific Research (NWO)
    • Center for Creation, Content and Technology (CCCT)
