A characterization of sample selection bias in system evaluation and the case of information retrieval

Regular Paper
International Journal of Data Science and Analytics

Abstract

Sample selection bias is the effect of the procedure used to select individuals for inclusion in a sample. In system evaluation this bias matters in several respects: it (i) requires larger samples than would otherwise suffice to estimate the efficiency of a single system, (ii) requires much larger samples to rank systems by efficiency, and (iii) can penalize some systems. The unbiased measure described in this paper rewards systems that perform poorly on difficult tasks, thereby giving a more accurate picture of both system efficiency and system ranking. However, we found that even when bias is corrected for each individual system, it cannot be completely removed when a group of systems is ranked. Finally, further research is needed to find methods that substantially improve retrieval effectiveness on difficult tasks.
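The abstract does not spell out the unbiased measure, so the following is not the author's method. It is a minimal sketch of the general idea of correcting selection bias by reweighting, under a hypothetical setup: each topic enters the test collection independently with a known inclusion probability (Poisson sampling), and easy topics are more likely to be included than hard ones. The Horvitz-Thompson estimator divides each sampled score by its inclusion probability, which gives more weight to under-sampled hard topics and makes the estimate of one system's mean score unbiased. `kendall_tau` compares the system rankings induced by two sets of scores [22]. All function names, scores and probabilities below are hypothetical.

```python
from itertools import combinations, product


def ht_mean(sample_scores, incl_probs, population_size):
    """Horvitz-Thompson estimate of the mean score over all topics.

    Each sampled topic's score is divided by its inclusion probability, so
    topics that were unlikely to be selected (here, the hard ones) count more.
    """
    total = sum(s / p for s, p in zip(sample_scores, incl_probs))
    return total / population_size


def expected_ht_poisson(scores, probs):
    """Exact expectation of ht_mean under Poisson sampling, where topic i is
    kept independently with probability probs[i]; every subset is enumerated."""
    n = len(scores)
    expectation = 0.0
    for mask in product((0, 1), repeat=n):
        # Probability of drawing exactly this subset of topics.
        weight = 1.0
        for keep, p in zip(mask, probs):
            weight *= p if keep else 1.0 - p
        kept = [i for i in range(n) if mask[i]]
        est = ht_mean([scores[i] for i in kept], [probs[i] for i in kept], n)
        expectation += weight * est
    return expectation


def kendall_tau(a, b):
    """Kendall's tau-a between two score vectors over the same systems."""
    n = len(a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-topic scores of one system: two easy topics, two hard ones.
scores = [0.9, 0.7, 0.2, 0.1]
# Hypothetical selection: easy topics are much more likely to enter the sample.
probs = [0.9, 0.8, 0.3, 0.2]

true_mean = sum(scores) / len(scores)          # ~0.475
naive = sum(scores[:2]) / 2                    # sample of the easy topics only: ~0.8
corrected = ht_mean(scores[:2], probs[:2], 4)  # ~0.469
print(true_mean, naive, corrected, expected_ht_poisson(scores, probs))
```

In this toy case the naive mean of the two easy topics (about 0.8) overstates the population mean of 0.475, while the reweighted estimate (about 0.469) stays close to it, and `expected_ht_poisson` shows that its expectation equals the population mean. Unbiasedness for each system taken alone does not guarantee that a ranking of several systems is recovered from one sample, because each estimate still has variance; comparing the true and estimated system scores with `kendall_tau` is one way to measure that agreement.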

Notes

  1. For an introduction, the reader may refer to [34, 38].

References

  1. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Mach. Learn. 79, 151–175 (2010)

  2. Berk, R.A.: An introduction to sample selection bias in sociological data. Am. Sociol. Rev. 48(3), 386–398 (1983)

  3. Bickel, S., Brückner, M., Scheffer, T.: Discriminative learning under covariate shift. J. Mach. Learn. Res. 10, 2137–2155 (2009)

  4. Buckley, C.: trec_eval. http://trec.nist.gov/trec_eval/index.html. Visited on Sept 2016

  5. Buckley, C., Voorhees, E.M.: Evaluating evaluation measure stability. In: Proceedings of SIGIR, pp. 33–40 (2000)

  6. Buckley, C., Voorhees, E.M.: Retrieval system evaluation. In: Voorhees, E.M., Harman, D. (eds.) TREC: Experiment and Evaluation in Information Retrieval, Chap. 3. MIT Press, Cambridge (2005)

  7. Cortes, C., Mohri, M.: Domain adaptation and sample bias correction theory and algorithm for regression. Theoret. Comput. Sci. 519, 103–126 (2014)

  8. Cortes, C., Mohri, M., Riley, M., Rostamizadeh, A.: Sample selection bias correction theory. In: Proceedings of ALT, pp. 38–53. Springer (2008)

  9. Cronen-Townsend, S., Zhou, Y., Croft, W.B.: Predicting query performance. In: Proceedings of SIGIR, pp. 299–306 (2002)

  10. Fan, W., Davidson, I.: ReverseTesting: an efficient framework to select amongst classifiers under sample selection bias. In: Proceedings of KDD, pp. 147–156 (2006)

  11. Gronau, R.: Wage comparisons—a selectivity bias. J. Polit. Econ. 82(6), 1119–1143 (1974)

  12. Guiver, J., Mizzaro, S., Robertson, S.: A few good topics: experiments in topic set reduction for retrieval evaluation. ACM Trans. Inf. Syst. 27, 1–26 (2009)

  13. Hagan, J., Parker, P.: White-collar crime and punishment: the class structure and legal sanctioning of securities violations. Am. Sociol. Rev. 50(3), 302–316 (1985)

  14. Hauff, C., Murdock, V., Baeza-Yates, R.: Improved query difficulty prediction for the web. In: Proceedings of CIKM, pp. 439–448 (2008)

  15. Hawking, D., Craswell, N.: Overview of TREC-10 Web track. In: Proceedings of TREC. Department of Commerce, National Institute of Standards and Technology (2002). http://trec.nist.gov/

  16. Heckman, J.J.: Sample selection bias as a specification error. Econometrica 47(1), 153–161 (1979)

  17. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

  18. Hosseini, M., Cox, I.J., Milic-Frayling, N., Shokouhi, M., Yilmaz, E.: An uncertainty-aware query selection model for evaluation of IR systems. In: Proceedings of SIGIR, pp. 901–910 (2012)

  19. Hosseini, M., Cox, I.J., Milic-Frayling, N., Sweeting, T., Vinay, V.: Prioritizing relevance judgments to improve the construction of IR test collections. In: Proceedings of CIKM, pp. 641–646 (2011)

  20. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002)

  21. Kanamori, T., Hido, S., Sugiyama, M.: A least-squares approach to direct importance estimation. J. Mach. Learn. Res. 10, 1391–1445 (2009)

  22. Kendall, M.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)

  23. Markov, A.A.: The Calculus of Probabilities. Gosizdat, Moscow (1913)

  24. Melucci, M.: Contextual Search: A Computational Framework. Foundations and Trends in Information Retrieval. Now Publishers, Breda (2012)

  25. Melucci, M.: Impact of query sample selection bias on information retrieval system ranking. In: Proceedings of IEEE DSAA (2016)

  26. Peterson, R.D., Hagan, J.: Changing conceptions of race: towards an account of anomalous findings of sentencing research. Am. Sociol. Rev. 49(1), 56–70 (1984)

  27. Read, C.B.: Markov’s inequality. In: Kotz, S., Read, C.B., Balakrishnan, N., Vidakovic, B., Johnson, N.L. (eds.) Encyclopedia of Statistical Sciences. Wiley, Hoboken (2004)

  28. Ren, J., Shi, X., Fan, W., Yu, P.S.: Type-independent correction of sample selection bias via structural discovery and re-balancing. In: Proceedings of ICDM, pp. 565–576 (2008)

  29. Seah, C., Tsang, I.W., Ong, Y.: Healing sample selection bias by source classifier selection. In: Proceedings of ICDM, pp. 577–586 (2011)

  30. Shieh, G.: A weighted Kendall’s tau statistic. Stat. Probab. Lett. 39(1), 17–24 (1998)

  31. Sparck Jones, K., van Rijsbergen, C.: Information retrieval test collections. J. Doc. 32(1), 59–75 (1976)

  32. Spearman, C.: The proof and measurement of association between two things. Am. J. Psychol. 15(1), 72–101 (1904)

  33. Stevens, W.L.: Sampling without replacement with probability proportional to size. J. R. Stat. Soc. Ser. B (Methodol.) 20(2), 393–397 (1958)

  34. van Rijsbergen, C., Sparck Jones, K.: Report on the need for and provision of an “ideal” information retrieval test collection. Tech. Rep. BLRDR 5266, British Library. Cambridge University Computer Laboratory (1976)

  35. Vapnik, V.N.: Statistical Learning Theory. Wiley, Hoboken (1998)

  36. Vella, F.: Estimating models with sample selection bias: a survey. J. Hum. Resour. 33(1), 127–169 (1998)

  37. Voorhees, E., Buckley, C.: The effect of topic set size on retrieval experiment error. In: Proceedings of SIGIR, pp. 316–323 (2002)

  38. Voorhees, E., Harman, D. (eds.): TREC: Experiment and Evaluation in Information Retrieval. MIT Press, Cambridge (2005)

  39. Webber, W., Park, L.A.F.: Score adjustment for correction of pooling bias. In: Proceedings of SIGIR, pp. 444–451 (2009)

  40. Williams, B.: A Sampler on Sampling. Wiley, Hoboken (1978)

  41. Winship, C., Mare, R.D.: Models for sample selection bias. Annu. Rev. Sociol. 18, 327–350 (1992)

  42. Zadrozny, B.: Learning and evaluating classifiers under sample selection bias. In: Proceedings of the International Conference on Machine Learning (2004)

  43. Zobel, J.: How reliable are the results of large-scale information retrieval experiments? In: Proceedings of SIGIR, pp. 307–314 (1998)

Author information

Correspondence to Massimo Melucci.

About this article

Cite this article

Melucci, M. A characterization of sample selection bias in system evaluation and the case of information retrieval. Int J Data Sci Anal 6, 131–146 (2018). https://doi.org/10.1007/s41060-018-0134-x

  • DOI: https://doi.org/10.1007/s41060-018-0134-x