
Evaluation of Text Retrieval Systems

Published in: Programming and Computer Software

Abstract

Evaluation is one of the main driving forces behind research and development in text retrieval: it is the basic tool for comparing the effectiveness of alternative approaches. This paper surveys the state of the art in the evaluation of text retrieval systems. The two basic paradigms commonly accepted in this field, system-oriented and user-oriented, are often considered incompatible. In this survey, both paradigms are treated within a unified framework based on the attributes affecting the distribution and adaptation of innovations. The detailed discussion of the evaluation of text retrieval systems is organized around the components required by the evaluation process for an arbitrary system. Methodological problems related to the verification of the obtained results are also discussed.
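The system-oriented paradigm mentioned above typically compares retrieval effectiveness via measures such as precision, recall, and the F-measure computed against relevance judgments. A minimal sketch of that computation is shown below; the document identifiers and judgments are hypothetical, not taken from the paper.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for a single query.

    retrieved -- the list of document IDs returned by the system
    relevant  -- the set of document IDs judged relevant
    """
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Hypothetical system output and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
p, r, f = precision_recall_f1(retrieved, relevant)
# Two of the four retrieved documents are relevant (precision 1/2),
# covering two of the three relevant documents (recall 2/3).
```

In a batch experiment such measures are averaged over a set of queries, which is where the methodological questions of stability and verification discussed in the paper arise.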




Cite this article

Kuralenok, I.E., Nekrestyanov, I.S. Evaluation of Text Retrieval Systems. Programming and Computer Software 28, 226–242 (2002). https://doi.org/10.1023/A:1016323201283
