Abstract
Evaluation has always been the cornerstone of scientific development. Scientists come up with hypotheses (models) to explain physical phenomena, and validate these models by comparing their output to observations in nature. A scientific field then consists of nothing more than a collection of hypotheses that have not (yet) been disproved when compared against nature. Evaluation plays exactly the same key role in the field of Information Retrieval: researchers and practitioners develop models to explain the relation between an information need expressed by a person and the information contained in the available resources, and they test these models by comparing their outcomes to collections of observations.
This article is a short survey of the methods, measures, and designs used in the field of Information Retrieval to evaluate the quality of search algorithms (i.e., the implementations of a model) against collections of observations. The phrase “search quality” has more than one interpretation; here I discuss only one of them, the effectiveness of a search algorithm in finding the information requested by a user. Two types of collections of observations are used for the purpose of evaluation: (a) relevance annotations, and (b) observable user behaviour. I will call the evaluation framework based on the former collection-based evaluation, and the one based on the latter in-situ evaluation.
This survey is far from complete; it presents only my personal viewpoint on recent developments in the field.
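To make the collection-based framework concrete, the sketch below (in Python, with invented toy data) scores a ranked result list against graded relevance annotations using nDCG, one of the standard effectiveness measures in this literature. The function names and the 0–3 relevance grades are illustrative assumptions, not code from the chapter.

```python
import math

def dcg(gains, k):
    """Discounted cumulated gain: each graded gain is discounted
    by the log of the rank at which the document appears."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg(system_gains, judged_gains, k):
    """Normalise the system's DCG by the DCG of the ideal ranking,
    i.e. the judged documents sorted by decreasing gain."""
    ideal = sorted(judged_gains, reverse=True)
    idcg = dcg(ideal, k)
    return dcg(system_gains, k) / idcg if idcg > 0 else 0.0

# Toy data: relevance annotations on a 0-3 scale, in the order the
# system ranked the documents, plus the gains of all judged documents.
system_gains = [3, 0, 2, 1, 0]
judged_gains = [3, 2, 1, 1, 0, 0, 0]
print(round(ndcg(system_gains, judged_gains, k=5), 3))
```

Under the same fixed set of judgments, a different ranking receives a different score, which is what allows two search algorithms to be compared on a single test collection.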
Notes
1. Retrieval systems and search engines are used interchangeably in this paper.
2. Text REtrieval Conference.
3. See the TREC Crowdsourcing track: https://sites.google.com/site/treccrowd/.
4.
5.
6. Also known as split testing, control/treatment testing, bucket testing, randomised experiments, and online field experiments (a minimal statistical sketch follows these notes).
7. Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, Yahoo!, and Zynga have all reported performing A/B tests.
8. “If you torture the data enough, it will confess to anything” (Ronald Harry Coase).
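Note 6 above mentions A/B testing only in passing, so here is a hedged sketch of the textbook analysis behind such an in-situ experiment: a two-proportion z-test comparing a binary metric (e.g., click-through) between the control and treatment buckets. All numbers are invented for illustration.

```python
import math

def ab_z_test(clicks_a, users_a, clicks_b, users_b):
    """Two-proportion z-test for a control (A) vs. treatment (B)
    experiment on a binary per-user outcome such as a click."""
    p_a, p_b = clicks_a / users_a, clicks_b / users_b
    # Pooled click rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal distribution (erfc form).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented example: 10,000 users per bucket, click rates 10.0% vs 10.6%.
z, p = ab_z_test(1000, 10000, 1060, 10000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

This is only the simplest analysis; real online experiments complicate it considerably, e.g., with dependent data, variance-reduction techniques, and sequential testing, which is part of what the in-situ evaluation literature addresses.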
Acknowledgements
This work is based on a tutorial I gave at the 2015 Russian Summer School in Information Retrieval (RuSSIR 2015). I would like to thank Ben Carterette, Emine Yilmaz, Anne Schuth, Katja Hofmann, and Filip Radlinski for sharing the references and material used in that tutorial, which served as the basis for this survey.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Kanoulas, E. (2016). A Short Survey on Online and Offline Methods for Search Quality Evaluation. In: Braslavski, P., et al. Information Retrieval. RuSSIR 2015. Communications in Computer and Information Science, vol 573. Springer, Cham. https://doi.org/10.1007/978-3-319-41718-9_3
Print ISBN: 978-3-319-41717-2
Online ISBN: 978-3-319-41718-9