ABSTRACT
In this paper, we apply methods from educational testing to measure the reliability of an IR collection.
Index Terms
- Testing algorithms is like testing students