ABSTRACT
Assessment is a key aspect of any instructional process, as it is the main means of determining students' competences. Assessment and scoring carried out by groups of reviewers, as in the evaluation of final year projects or theses, or in peer evaluation, are becoming increasingly frequent. However, in such scenarios the assessment and scoring of a work can be affected by each rater's thinking processes, knowledge level, and personal preferences, among other factors. These idiosyncrasies are known as rater effects and can dramatically affect the evaluation process. Although many works point out that certain evaluation instruments, e.g., evaluation rubrics, can increase the fairness and impartiality of the evaluation, rater effects may still be present and can significantly affect the scoring. Furthermore, the assessment of some works may be controversial, i.e., the evaluators of a given work might strongly disagree on its quality. Identifying rater effects and controversial evaluations is therefore crucial for taking remediation actions and guaranteeing a fair evaluation. However, this identification is often hard for scoring leaders, so tools that help them in this process are necessary. This paper presents the visualizations used by RaMon (a system for monitoring raters and controversial evaluations) to support the monitoring process, along with the support it provides for taking remediation actions.
Index Terms
- Rating monitoring as a means to mitigate rater effects and controversial evaluations