
Towards a Formal Framework for Utility-oriented Measurements of Retrieval Effectiveness

Published: 27 September 2015

ABSTRACT

In this paper we present a formal framework for defining and studying the properties of utility-oriented measurements of retrieval effectiveness, such as AP, RBP, ERR, and many other popular IR evaluation measures. The proposed framework builds on the representational theory of measurement, which underpins the modern theory of measurement in both the physical and social sciences, and thus explicitly links IR evaluation to this broader context. The framework is minimal, in the sense that it relies on a single axiom from which the other properties are derived. Finally, it contributes to a better understanding, and a clearer separation, of which issues stem from the inherent difficulty of comparing systems in terms of retrieval effectiveness and which stem from the expected numerical properties of a measurement.
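To make the abstract's notion of "utility-oriented measures" concrete, the following sketch computes the three measures it names (AP, RBP, ERR) for a single ranked list of relevance judgments. The formulas follow the standard published definitions; the persistence parameter p = 0.8 and the binary example ranking are illustrative choices, not values taken from the paper.

```python
def average_precision(rels, total_relevant):
    """AP: mean of precision@k over the ranks k that hold a relevant document."""
    hits, score = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / k
    return score / total_relevant if total_relevant else 0.0

def rank_biased_precision(rels, p=0.8):
    """RBP: geometric user-persistence model with continuation probability p."""
    return (1 - p) * sum(rel * p ** k for k, rel in enumerate(rels))

def expected_reciprocal_rank(grades, max_grade=1):
    """ERR: expected reciprocal rank at which a cascade-model user stops."""
    err, p_continue = 0.0, 1.0
    for k, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # stopping probability at rank k
        err += p_continue * r / k
        p_continue *= 1 - r
    return err

ranking = [1, 0, 1, 0, 0]  # binary relevance down the ranking
print(average_precision(ranking, total_relevant=2))   # 5/6 ≈ 0.8333
print(rank_biased_precision(ranking))                 # 0.2 * (0.8 + 0.8**3)... with rank-0 indexing: 0.328
print(expected_reciprocal_rank(ranking))              # 0.5 + 0.5*0.5/3 ≈ 0.5833
```

All three map the same ranked list to a number in [0, 1] but encode different user models, which is precisely the family of mappings whose measurement-theoretic properties the paper formalizes.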


Published in

ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
September 2015, 402 pages
ISBN: 9781450338332
DOI: 10.1145/2808194
Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article

Acceptance rates: ICTIR '15 paper acceptance rate: 29 of 57 submissions (51%). Overall acceptance rate: 209 of 482 submissions (43%).
