Abstract
In many search scenarios, such as exploratory, comparative, or survey-oriented search, users interact with dynamic search systems to satisfy multi-aspect information needs. These systems employ dynamic approaches that exploit user feedback at different granularities. Although prior studies have provided insights into the role of many components of these systems, they used black-box and isolated experimental setups, so the effects of these components and their interactions are still not well understood. We address this by following a methodology based on Analysis of Variance (ANOVA). We built a Grid of Points (GoP) consisting of systems that instantiate three components in different ways: initial rankers, dynamic rerankers, and user feedback granularity. Using evaluation scores on the TREC Dynamic Domain collections, we built several ANOVA models to estimate the effects. We found that (i) although all components significantly affect search effectiveness, the initial ranker has the largest effect size, (ii) the effect sizes of these components vary with the length of the search session and the effectiveness metric used, and (iii) initial rankers and dynamic rerankers have more prominent effects than user feedback granularity. To improve effectiveness, we therefore recommend improving the quality of initial rankers and dynamic rerankers; this does not require eliciting detailed user feedback, which can be expensive or invasive.
Component-based Analysis of Dynamic Search Performance