
Evaluation of Text Retrieval Systems

Published in: Programming and Computer Software

Abstract

Evaluation is one of the main driving forces behind research and development in text retrieval: it is the basic tool for comparing the effectiveness of alternative approaches. This paper surveys the state of the art in the evaluation of text retrieval systems. The two basic paradigms commonly accepted in this field, system-oriented and user-oriented, are often considered incompatible. In this survey, both paradigms are treated within a unified framework based on the attributes affecting the distribution and adaptation of innovations. The detailed discussion of the evaluation of text retrieval systems is organized around the components required by the evaluation process for an arbitrary system. Methodological problems related to the verification of the obtained results are also discussed.
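The system-oriented paradigm mentioned above typically compares retrieval effectiveness via measures such as precision, recall, and the F-measure computed against relevance judgments. A minimal sketch of that computation is shown below; the document identifiers and judgments are hypothetical, not taken from the paper.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for a single query.

    retrieved -- the list of document IDs returned by the system
    relevant  -- the set of document IDs judged relevant
    """
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Hypothetical system output and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
p, r, f = precision_recall_f1(retrieved, relevant)
# Two of the four retrieved documents are relevant (precision 1/2),
# covering two of the three relevant documents (recall 2/3).
```

In a batch experiment such measures are averaged over a set of queries, which is where the methodological questions of stability and verification discussed in the paper arise.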




Cite this article

Kuralenok, I.E., Nekrestyanov, I.S. Evaluation of Text Retrieval Systems. Programming and Computer Software 28, 226–242 (2002). https://doi.org/10.1023/A:1016323201283
