Abstract
In this paper, we examine novel and less expensive methods for search engine evaluation that do not rely on document relevance judgments. These methods, described within a proposed framework, are motivated by the increasing focus on search results presentation, by the growing diversity of documents and content sources, and by the need to measure effectiveness relative to other search engines. Correlation analysis of data obtained from actual tests using a subset of the methods in the framework suggests that these methods measure different aspects of the search engine. In practice, we argue that the selection of a test method is a tradeoff between measurement intent and cost.
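To make the correlation analysis mentioned above concrete, the following is a minimal sketch (not taken from the paper) of how per-query scores from two judgment-free evaluation methods might be compared; the method names and score values are hypothetical placeholders.

```python
# Sketch: correlate per-query scores from two hypothetical evaluation methods.
# Low correlation would suggest the methods capture different aspects of
# engine quality; high correlation would suggest they are interchangeable.
from scipy.stats import kendalltau, pearsonr

# Hypothetical per-query scores (e.g., a click-based measure vs. a
# side-by-side preference measure); values are illustrative only.
method_a_scores = [0.62, 0.41, 0.77, 0.55, 0.68, 0.30]
method_b_scores = [0.58, 0.52, 0.70, 0.61, 0.49, 0.35]

tau, tau_p = kendalltau(method_a_scores, method_b_scores)
r, r_p = pearsonr(method_a_scores, method_b_scores)

print(f"Kendall tau = {tau:.2f} (p = {tau_p:.2f})")
print(f"Pearson r   = {r:.2f} (p = {r_p:.2f})")
```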
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ali, K., Chang, CC., Juan, Y. (2005). Exploring Cost-Effective Approaches to Human Evaluation of Search Engine Relevance. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science (R0)