Abstract
Information retrieval systems are often evaluated using effectiveness metrics. In the past, the metrics used have corresponded to fixed models of user behavior, presuming, for example, that the user will view a pre-determined number of items in the search engine results page, or that they advance from each item in the results page to the next with some constant probability. Recently, a number of proposals for models of user behavior have emerged that are parameterized in terms of T, the number of relevant documents (or other material) the user expects will be required to address their information need. That recent work has demonstrated that T, the user’s a priori utility expectation, is correlated with the underlying nature of the information need; and hence that evaluation metrics should be sensitive to T. Here we examine the relationship between the query the user issues and their anticipated T, seeking syntactic and other clues to guide the subsequent system evaluation. That is, we wish to develop mechanisms that, based on the query alone, can be used to adjust system evaluations so that the user’s experience of the system is better captured in the system’s effectiveness score, and hence provide a more refined basis for comparing systems. This paper reports on a first round of experimentation, and describes the progress (albeit modest) that we have achieved towards that goal.
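To make the two fixed user models mentioned above concrete, here is a minimal R sketch (R is used because the notes below already reference R's ordinal package). The function names prec_at_k and rbp, the example gain vector, and the persistence default p = 0.8 are illustrative assumptions, not artifacts of the paper.

```r
# Two classical effectiveness metrics, viewed as fixed user models.
# gains: numeric vector of per-rank relevance gains in [0, 1],
# ordered from the top of the results page downwards.

# Precision at depth k: the user inspects exactly the first k items.
# Ranks beyond the end of `gains` contribute zero gain.
prec_at_k <- function(gains, k) {
  sum(gains[seq_len(k)], na.rm = TRUE) / k
}

# Rank-biased precision: the user moves from each item to the next
# with constant persistence p, so rank i is examined with probability
# p^(i - 1); the (1 - p) factor scales the expected gain into [0, 1].
rbp <- function(gains, p = 0.8) {
  (1 - p) * sum(gains * p ^ (seq_along(gains) - 1))
}

gains <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0)
prec_at_k(gains, k = 5)  # 0.6
rbp(gains, p = 0.8)      # approximately 0.483
```

In the spirit of the argument above, the evaluation depth k or the persistence p would not be held fixed across all queries, but would instead be set as a function of the user's anticipated T.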
Notes
- 1.
- 2. The backstories are available for reuse at DOI 10.4225/08/55D0B6A098248.
- 3.
- 4. Cumulative logistic regression (also known as ordinal regression) used R’s ordinal::clm and ordinal::step.clm functions; a minimal usage sketch follows these notes.
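As a minimal sketch of how such a model can be fitted: the data frame queries, the response t_band, and the two predictor columns below are hypothetical placeholders for illustration, not the paper's actual feature set; only the ordinal::clm call itself is the documented interface.

```r
library(ordinal)  # provides clm() for cumulative logistic regression

# Hypothetical data: one row per query, with the anticipated T
# discretized into an ordered factor, plus candidate query features.
set.seed(1)
queries <- data.frame(
  query_length = rpois(200, lambda = 3) + 1,  # words in the query
  has_plural   = rbinom(200, 1, 0.4)          # crude syntactic clue
)
queries$t_band <- factor(
  sample(c("low", "medium", "high"), 200, replace = TRUE),
  levels = c("low", "medium", "high"), ordered = TRUE
)

# Fit a cumulative logistic (ordinal) regression of the T band on
# the query features, and inspect the estimated coefficients.
fit <- clm(t_band ~ query_length + has_plural, data = queries)
summary(fit)
```

From a fit such as this, ordinal::step.clm can then be used for stepwise selection among candidate features, as the note above indicates.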
Acknowledgments
This work was supported by the Australian Research Council’s Discovery Projects Scheme (projects DP110101934 and DP140102655). We thank Xiaolu Lu for assistance with the data collection and Bodo von Billerbeck for assistance with query log mining.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Thomas, P., Bailey, P., Moffat, A., Scholer, F. (2015). Towards Nuanced System Evaluation Based on Implicit User Expectations. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_26
DOI: https://doi.org/10.1007/978-3-319-28940-3_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3