
A Comparative Analysis of Interleaving Methods for Aggregated Search

Published: 17 February 2015

Abstract

A result page of a modern search engine often goes beyond a simple list of “10 blue links.” Many specific user needs (e.g., News, Image, Video) are addressed by so-called aggregated or vertical search solutions: specially presented documents, often retrieved from specific sources, that stand out from the regular organic Web search results. When it comes to evaluating ranking systems, such complex result layouts raise their own challenges. This is especially true for so-called interleaving methods that have arisen as an important type of online evaluation: by mixing results from two different result pages, interleaving can easily break the desired Web layout in which vertical documents are grouped together, and hence hurt the user experience.
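As background for readers unfamiliar with the mechanics, here is a minimal sketch of standard Team-Draft Interleaving (introduced by Radlinski, Kurup, and Joachims in 2008), the kind of method the abstract refers to. It is an illustration only, not the authors' implementation; the function names and the click-crediting helper are assumptions made for this sketch.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=None, rng=random):
    """Minimal sketch of Team-Draft Interleaving.

    Two "teams" draft documents in rounds: the team with fewer picks
    goes next (a coin flip breaks ties), and each pick is the team's
    highest-ranked document not yet on the interleaved list. Team
    labels are kept so clicks can later be credited to ranker A or B.
    """
    if length is None:
        length = len(set(ranking_a) | set(ranking_b))
    rankings = {'A': ranking_a, 'B': ranking_b}
    count = {'A': 0, 'B': 0}
    interleaved, teams, seen = [], [], set()
    while len(interleaved) < length:
        if count['A'] != count['B']:
            team = 'A' if count['A'] < count['B'] else 'B'
        else:
            team = rng.choice('AB')  # coin flip on ties
        doc = next((d for d in rankings[team] if d not in seen), None)
        if doc is None:
            # This team's ranking is exhausted; the other team picks.
            team = 'B' if team == 'A' else 'A'
            doc = next((d for d in rankings[team] if d not in seen), None)
            if doc is None:
                break
        interleaved.append(doc)
        teams.append(team)
        seen.add(doc)
        count[team] += 1
    return interleaved, teams

def tdi_outcome(teams, clicked_ranks):
    """Hypothetical helper: credit each click (given as a 0-based rank
    into the interleaved list) to the team that contributed the clicked
    document; the team with more credited clicks wins the impression."""
    wins = {'A': 0, 'B': 0}
    for rank in clicked_ranks:
        wins[teams[rank]] += 1
    return 'tie' if wins['A'] == wins['B'] else max(wins, key=wins.get)
```

Over many impressions, the ranker whose team attracts more clicks is inferred to be better. The sketch also shows why vertical groupings are at risk: the draft freely mixes the two rankings, so documents of one vertical can end up scattered across the page.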
We conduct an analysis of different interleaving methods as applied to aggregated search engine result pages. Apart from conventional interleaving methods, we propose two vertical-aware methods: one derived from the widely used Team-Draft Interleaving method by adjusting it in such a way that it respects vertical document groupings, and another based on the recently introduced Optimized Interleaving framework. We show that our proposed methods are better at preserving the user experience than existing interleaving methods while still performing well as a tool for comparing ranking systems. For evaluating our proposed vertical-aware interleaving methods, we use real-world click data as well as simulated clicks and simulated ranking systems.
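The paper specifies its vertical-aware methods exactly; purely to illustrate the constraint they are designed to respect, the hypothetical post-processing step below regroups all documents of one vertical into a single contiguous block, anchored where that vertical first appears in the interleaved list. The `group_verticals` name and the `vertical_of` mapping are assumptions of this sketch, not the authors' API.

```python
def group_verticals(interleaved, teams, vertical_of):
    """Hypothetical illustration (not the paper's exact algorithm):
    move all documents of the same vertical into one contiguous block,
    anchored where that vertical first appears in the interleaved list.

    `vertical_of` maps a document to its vertical id ('News', 'Image',
    ...) or None for organic Web results.
    """
    # Gather each vertical's documents in their interleaved order.
    blocks = {}
    for doc, team in zip(interleaved, teams):
        v = vertical_of(doc)
        if v is not None:
            blocks.setdefault(v, []).append((doc, team))
    out_docs, out_teams, emitted = [], [], set()
    for doc, team in zip(interleaved, teams):
        v = vertical_of(doc)
        if v is None:
            out_docs.append(doc)
            out_teams.append(team)
        elif v not in emitted:
            # Emit the whole vertical block at its first position.
            emitted.add(v)
            for d, t in blocks[v]:
                out_docs.append(d)
                out_teams.append(t)
    return out_docs, out_teams
```

Because the team labels travel with the moved documents, click crediting proceeds exactly as in plain Team-Draft Interleaving, while the user sees an intact vertical block.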





    Published In

    ACM Transactions on Information Systems, Volume 33, Issue 2
    February 2015, 181 pages
    ISSN: 1046-8188
    EISSN: 1558-2868
    DOI: 10.1145/2737813

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 February 2015
    Accepted: 01 September 2014
    Received: 01 June 2014
    Published in TOIS Volume 33, Issue 2


    Author Tags

    1. information retrieval
    2. aggregated search
    3. clicks
    4. interleaved comparison
    5. interleaving
    6. online evaluation

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Elite Network Shifts project funded by the Royal Dutch Academy of Sciences (KNAW)
    • TROVe project funded by the CLARIAH program
    • Netherlands eScience Center under project number 027.012.105
    • ESF Research Network Program ELIAS
    • Microsoft Research Ph.D. program
    • HPC Fund
    • European Community's Seventh Framework Programme (FP7/2007-2013)
    • QuaMerdes project funded by the CLARIN-nl program
    • Dutch national program COMMIT
    • Yahoo! Faculty Research and Engagement Program
    • Netherlands Organisation for Scientific Research (NWO)
    • Center for Creation, Content and Technology (CCCT)
