Abstract
Cranfield-style evaluations standardised Information Retrieval (IR) evaluation practices, enabling the creation of programmes such as TREC, CLEF, and INEX, and the long-term comparability of IR systems. However, the methodology does not translate well to the Interactive IR (IIR) domain, where the inclusion of the user in the search process and the repeated interaction between user and system create more variability than Cranfield-style evaluations can support. As a result, IIR evaluations of different systems have tended to be non-comparable, not because the systems vary, but because the methodologies used to evaluate them are non-comparable. In this paper we describe a standardised IIR evaluation framework that ensures IIR evaluations share a standardised baseline methodology, in much the same way that TREC, CLEF, and INEX imposed a common process on IR evaluation. The framework provides a common baseline, derived by integrating existing, validated evaluation measures, that enables inter-study comparison, yet is flexible enough to support most kinds of IIR studies. This is achieved through a “pluggable” design, into which any web-based IIR interface can be embedded. The framework has been implemented, and the software will be made available to reduce the resource commitment required for IIR studies.
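The last two sentences describe the core design: a fixed, validated baseline workflow (consent, questionnaires, logging) into which the interface under study is plugged as an ordinary web-based step. The sketch below is purely illustrative and assumes a hypothetical configuration; the step names, URLs, and the `baseline_workflow` helper are invented for illustration and are not the authors' actual software.

```python
# Hypothetical sketch (not the paper's released software): declaring a
# standardised IIR study workflow in which the system under test is
# "plugged in" by URL between the shared baseline components.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One stage of the study workflow shown to every participant."""
    name: str
    url: str                                   # page served to the participant
    log_events: List[str] = field(default_factory=list)


def baseline_workflow(interface_url: str) -> List[Step]:
    """Wrap an arbitrary web-based IIR interface in the common baseline,
    so that pre- and post-task measures stay comparable across studies."""
    return [
        Step("consent", "/static/consent"),
        Step("pre-task-questionnaire", "/questionnaire/pre"),
        Step("search-task", interface_url, log_events=["query", "click", "view"]),
        Step("post-task-questionnaire", "/questionnaire/post"),
        Step("engagement-scale", "/questionnaire/engagement"),
    ]


if __name__ == "__main__":
    # Any web-based interface can be embedded by pointing at its URL.
    for step in baseline_workflow("https://example.org/my-iir-interface"):
        print(step.name, "->", step.url)
```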
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Hall, M.M., Toms, E. (2013). Building a Common Framework for IIR Evaluation. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds) Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013. Lecture Notes in Computer Science, vol 8138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40802-1_3