Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions About Code

Jaromir Savelka; Arav Agarwal; Christopher Bogart; Majd Sakr

Topics: Learning with AI Systems; Machine Learning; Natural Language Processing

In Proceedings of the 15th International Conference on Computer Supported Education - Volume 2: CSEDU, 47-58, 2023, Prague, Czech Republic

Authors: Jaromir Savelka; Arav Agarwal; Christopher Bogart; Majd Sakr

Affiliation: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, U.S.A.

Keyword(s): Multiple-Choice Question Answering, MCQ, Introductory, Intermediate Programming, Code Analysis, Generative Pre-Trained Transformers, GPT, Python Course, Programming Knowledge Assessment, ChatGPT, Codex, GitHub Copilot, AlphaCode.

Abstract: We analyzed the effectiveness of three generative pre-trained transformer (GPT) models in answering multiple-choice question (MCQ) assessments, often involving short snippets of code, from introductory and intermediate programming courses at the postsecondary level. This emerging technology stirs countless discussions of its potential uses (e.g., exercise generation, code explanation) as well as misuses in programming education (e.g., cheating). However, the capabilities of GPT models and their limitations in reasoning about and/or analyzing code in educational settings have been under-explored. We evaluated several of OpenAI's GPT models on formative and summative MCQ assessments from three Python courses (530 questions). We found that MCQs containing code snippets are not answered as successfully as those that contain only natural language. While questions that require filling in a blank in the code or completing a natural language statement about the snippet are handled rather successfully, MCQs that require analysis and/or reasoning about the code (e.g., what is true/false about the snippet, or what is its output) appear to be the most challenging. These findings can be leveraged by educators to adapt their instructional practices and assessments in programming courses, so that GPT becomes a valuable assistant for a learner as opposed to a source of confusion and/or potential hindrance in the learning process.

CC BY-NC-ND 4.0

Paper citation in several formats:

Savelka, J.; Agarwal, A.; Bogart, C. and Sakr, M. (2023). Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions About Code. In Proceedings of the 15th International Conference on Computer Supported Education - Volume 2: CSEDU; ISBN 978-989-758-641-5; ISSN 2184-5026, SciTePress, pages 47-58. DOI: 10.5220/0011996900003470

@conference{csedu23,
author={Jaromir Savelka and Arav Agarwal and Christopher Bogart and Majd Sakr},
title={Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions About Code},
booktitle={Proceedings of the 15th International Conference on Computer Supported Education - Volume 2: CSEDU},
year={2023},
pages={47-58},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011996900003470},
isbn={978-989-758-641-5},
issn={2184-5026},
}

TY - CONF
JO - Proceedings of the 15th International Conference on Computer Supported Education - Volume 2: CSEDU
TI - Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions About Code
SN - 978-989-758-641-5
IS - 2184-5026
AU - Savelka, J.
AU - Agarwal, A.
AU - Bogart, C.
AU - Sakr, M.
PY - 2023
SP - 47
EP - 58
DO - 10.5220/0011996900003470
PB - SciTePress