Abstract
Evaluation is one of the main driving forces behind research and development in text retrieval: it is the basic tool for comparing the effectiveness of alternative approaches. This paper surveys the state of the art in the evaluation of text retrieval systems. The two basic paradigms commonly accepted in this field, system-oriented and user-oriented, are often considered incompatible; in this survey, both are treated within a single framework based on the attributes affecting the distribution and adaptation of innovations. The evaluation of text retrieval systems is discussed in detail in terms of the components required by the evaluation process for an arbitrary system. Methodological problems related to verifying the results obtained are also discussed.
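The system-oriented paradigm the abstract mentions rests on effectiveness measures such as precision and recall, computed from relevance judgments over a test collection. A minimal sketch (the document ids and judgments below are invented for illustration):

```python
# Precision and recall over sets of document ids, given a set of
# documents judged relevant for a query.

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# Hypothetical judgments: 4 relevant documents, 3 retrieved.
relevant = {1, 2, 3, 4}
retrieved = {2, 3, 5}

print(precision(retrieved, relevant))  # 2 of 3 retrieved are relevant
print(recall(retrieved, relevant))     # 2 of 4 relevant were retrieved
```

Comparing systems then reduces to comparing such measures, which is exactly where the survey's methodological questions (stability of judgments, significance of differences) arise.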
Kuralenok, I.E., Nekrestyanov, I.S. Evaluation of Text Retrieval Systems. Programming and Computer Software 28, 226–242 (2002). https://doi.org/10.1023/A:1016323201283