Abstract
Past research has identified many different types of relevance in information retrieval (IR). So far, however, most evaluation of IR systems has been carried out through batch experiments using test collections that contain only expert, topical relevance judgements. Recently, there has been some movement away from this traditional approach towards interactive, more user-centred methods of evaluation. However, these are expensive for evaluators in terms of both time and resources. This paper describes a new evaluation methodology, using a task-oriented test collection, which combines the advantages of traditional non-interactive testing with a more user-centred emphasis. The main features of a task-oriented test collection are the adoption of the task, rather than the query, as the primary unit of evaluation, and the naturalistic character of the relevance judgements.
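The abstract does not state which effectiveness measures the methodology uses, so the following is only a hypothetical sketch of one consequence of making the task, rather than the query, the unit of evaluation. It assumes that relevance judgements are attached to a task, that several query formulations may be issued for the same task, and that scores are averaged within each task before being averaged across tasks. Precision at k is used purely as a familiar stand-in measure; the names `judgements`, `runs`, `query_level_mean` and `task_level_mean` are illustrative and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are in the relevant set."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def query_level_mean(runs, judgements, k):
    """Conventional averaging: every query counts once, however many share a task."""
    return mean(precision_at_k(ranked, judgements[task], k) for task, ranked in runs)


def task_level_mean(runs, judgements, k):
    """Hypothetical task-oriented averaging: every query is scored against its
    task's judgements, scores are averaged per task, then across tasks."""
    per_task = defaultdict(list)
    for task, ranked in runs:
        per_task[task].append(precision_at_k(ranked, judgements[task], k))
    return mean(mean(scores) for scores in per_task.values())


# Illustrative data: task T1 was attempted with three queries, T2 with one.
judgements = {"T1": {"d1", "d2"}, "T2": {"d9"}}
runs = [
    ("T1", ["d1", "d3"]),
    ("T1", ["d2", "d1"]),
    ("T1", ["d1", "d2"]),
    ("T2", ["d4", "d5"]),
]

print(query_level_mean(runs, judgements, k=2))  # 0.625
print(task_level_mean(runs, judgements, k=2))   # about 0.4167
```

With query-level averaging, the task that happened to receive more queries dominates the mean; with task-level averaging, each task contributes equally, which is one plausible reading of treating the task as the primary unit of evaluation.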
Cite this article
Reid, J. A Task-Oriented Non-Interactive Evaluation Methodology for Information Retrieval Systems. Information Retrieval 2, 115–129 (2000). https://doi.org/10.1023/A:1009906420620