research-article

Towards a Formal Framework for Utility-oriented Measurements of Retrieval Effectiveness

Authors:
Marco Ferrante, University of Padua, Padua, Italy
Nicola Ferro, University of Padua, Padua, Italy
Maria Maistro, University of Padua, Padua, Italy

ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval, September 2015, Pages 21–30, https://doi.org/10.1145/2808194.2809452

Published: 27 September 2015

ABSTRACT

In this paper we present a formal framework to define and study the properties of utility-oriented measurements of retrieval effectiveness, such as AP, RBP, ERR and many other popular IR evaluation measures. The framework builds on the representational theory of measurement, which provides the foundations of the modern theory of measurement in both the physical and the social sciences, and thereby explicitly links IR evaluation to this broader context. The framework is minimal, in the sense that it relies on just one axiom, from which the other properties are derived. Finally, it contributes to a better understanding of, and a clear separation between, the issues that stem from the inherent problems of comparing systems in terms of retrieval effectiveness and those that stem from the numerical properties expected of a measurement.
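To make concrete what a utility-oriented measurement computes, the following minimal Python sketch implements three of the measures named above (AP, RBP and ERR) from their standard published definitions and checks that swapping a relevant document into a higher rank does not lower any of the scores. The function names, the hypothetical run, the RBP persistence p = 0.8 and the ERR gain mapping (2^g - 1) / 2^max_grade are illustrative assumptions; the swap check is only an intuitive property of these measures, not a restatement of the paper's single axiom, which this page does not spell out.

```python
from typing import Callable, Dict, List, Sequence


def average_precision(rels: Sequence[int], num_relevant: int) -> float:
    """AP for binary relevance: rels[k-1] is 1 if the document at rank k is relevant.

    num_relevant is the total number of relevant documents for the topic (recall base).
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for k, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / k  # precision at rank k, accumulated only at relevant ranks
    return total / num_relevant


def rank_biased_precision(rels: Sequence[int], p: float = 0.8) -> float:
    """RBP = (1 - p) * sum_k r_k * p^(k-1); p is the assumed user persistence."""
    return (1 - p) * sum(r * p ** (k - 1) for k, r in enumerate(rels, start=1))


def expected_reciprocal_rank(grades: Sequence[int], max_grade: int) -> float:
    """ERR with the assumed gain mapping R_k = (2^g_k - 1) / 2^max_grade."""
    err = 0.0
    p_reach = 1.0  # probability that the user reaches rank k still unsatisfied
    for k, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # probability of being satisfied at rank k
        err += p_reach * r / k
        p_reach *= 1 - r
    return err


def swap(run: Sequence[int], i: int, j: int) -> List[int]:
    """Return a copy of run with the documents at positions i and j exchanged."""
    out = list(run)
    out[i], out[j] = out[j], out[i]
    return out


# Hypothetical binary run: relevant documents at ranks 2 and 4, recall base of 2.
run = [0, 1, 0, 1, 0]
improved = swap(run, 0, 1)  # move the relevant document from rank 2 up to rank 1

measures: Dict[str, Callable[[Sequence[int]], float]] = {
    "AP": lambda x: average_precision(x, num_relevant=2),
    "RBP": rank_biased_precision,
    "ERR": lambda x: expected_reciprocal_rank(x, max_grade=1),
}
for name, m in measures.items():
    before, after = m(run), m(improved)
    assert after >= before  # the swap should not lower a utility-oriented score
    print(f"{name}: {before:.4f} -> {after:.4f}")
```

Binary judgments are passed to ERR with max_grade = 1, so each relevant document has a satisfaction probability of 0.5; graded judgments would simply use larger grades and a larger max_grade.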

Index Terms

- Information systems
  - Information retrieval
    - Evaluation of retrieval results

Published in
ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
September 2015
402 pages
ISBN: 9781450338332
DOI: 10.1145/2808194
General Chairs: James Allan (University of Massachusetts Amherst, USA), Bruce Croft (University of Massachusetts Amherst, USA)
Program Chairs: Arjen de Vries (CWI Amsterdam, The Netherlands), Chengxiang Zhai (University of Illinois at Urbana-Champaign, USA)
Copyright © 2015 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
balancing index
homomorphism
replacement
representational theory of measurement
swap
Acceptance Rates
ICTIR '15 paper acceptance rate: 29 of 57 submissions (51%). Overall acceptance rate: 209 of 482 submissions (43%).
Article Metrics
- Total citations: 23
- Total downloads: 162
- Downloads (last 12 months): 16
- Downloads (last 6 weeks): 0