Authors:
Daisy Albuquerque da Silva; Carlos Eduardo de Mello and Ana Garcia
Affiliation:
Federal University of the State of Rio de Janeiro (UNIRIO), PPGI, Av. Pasteur, 458, Urca, RJ, Brazil
Keyword(s):
Feedback Generation, Automated Feedback, Large Language Model, Feedback Effectiveness, GPT, LLM.
Abstract:
This study examines the use of Large Language Models (LLMs) such as GPT-4 in the evaluation of argumentative writing, specifically opinion articles authored by military school students. It explores the potential of LLMs to provide instant, personalized feedback across different writing stages and assesses their effectiveness relative to human evaluators. The study uses a detailed rubric to guide the LLM evaluation, covering competencies from topic choice to bibliographical references. Initial findings suggest that GPT-4 can consistently evaluate technical and structural aspects of writing, offering reliable feedback, especially in the References category. However, its conservative classification approach may underestimate article quality, indicating a need for human oversight. The study also uncovers GPT-4's difficulties with nuanced and contextual elements of opinion writing, evident in variable precision and low recall when recognizing complete works. These findings highlight the evolving role of LLMs as supplementary tools in education, to be integrated with human judgment to strengthen argumentative writing and critical thinking in academic settings.