Rating monitoring as a means to mitigate rater effects and controversial evaluations

Authors:

Mikel Villamañe,

Mikel Larrañaga,

Ainhoa Álvarez

TEEM 2017: Proceedings of the 5th International Conference on Technological Ecosystems for Enhancing Multiculturality

Article No.: 39, Pages 1 - 8

https://doi.org/10.1145/3144826.3145389

Published: 18 October 2017

Abstract

Assessment is a key aspect of any instructional process, as it is the main means of determining students' competences. Assessment and scoring carried out by groups of reviewers, for example in the evaluation of final year projects or theses, or in peer evaluation, are becoming increasingly frequent. However, the assessment and scoring of a work in such scenarios can be affected by each rater's thinking processes, knowledge level, and personal preferences, among other factors. These idiosyncrasies are known as rater effects and can dramatically affect the evaluation process. Although many works point out that certain evaluation instruments, e.g., evaluation rubrics, can increase the fairness and impartiality of the evaluation, rater effects may still be present and noticeably affect the scoring. Furthermore, the assessment of some works may be controversial, i.e., the evaluators of a given work might strongly disagree on its quality. Identifying rater effects and controversial evaluations is therefore crucial for taking remediation actions and guaranteeing a fair evaluation. However, this identification process is often hard for scoring leaders, so tools that support them are necessary. This paper presents the visualizations used by RaMon (a system for monitoring raters and controversial evaluations) to support the monitoring process, along with the support it provides for taking remediation actions.

Index Terms

Social and professional topics → Professional topics → Computing education → Student assessment

Published In

TEEM 2017: Proceedings of the 5th International Conference on Technological Ecosystems for Enhancing Multiculturality

October 2017

723 pages

ISBN: 9781450353861

DOI: 10.1145/3144826

Editors: Juan Manuel Dodero, María Soledad Ibarra Sáiz, and Iván Ruiz Rube (all University of Cádiz)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

University of Salamanca

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

TEEM 2017: 5th International Conference on Technological Ecosystems for Enhancing Multiculturality

October 18 - 20, 2017

Cádiz, Spain

Acceptance Rates

TEEM 2017 paper acceptance rate: 84 of 109 submissions (77%)

Overall acceptance rate: 496 of 705 submissions (70%)
