Abstract
Sample selection bias is the effect of the procedure used to select individuals for inclusion in a sample. It affects system evaluation in several respects, since it (i) requires larger samples than those otherwise sufficient to estimate the efficiency of a single system, (ii) requires much larger samples to rank systems by efficiency and (iii) can penalize some systems. The unbiased measure described in this paper rewards systems that perform poorly on difficult tasks, thus providing a better picture of both system efficiency and system ranking. Nevertheless, we found that bias cannot be completely removed when a group of systems is ranked, even though it is corrected for each individual system. Finally, further research is needed to find methods that substantially improve retrieval effectiveness for difficult tasks.
Cite this article
Melucci, M. A characterization of sample selection bias in system evaluation and the case of information retrieval. Int J Data Sci Anal 6, 131–146 (2018). https://doi.org/10.1007/s41060-018-0134-x
DOI: https://doi.org/10.1007/s41060-018-0134-x