short-paper

Open access

Using Large Language Models To Diagnose Math Problem-solving Skills At Scale

Authors: Hyoungwook Jin, Bekzat Tilekbay, Juho Kim

L@S '24: Proceedings of the Eleventh ACM Conference on Learning @ Scale

Pages 471 - 475

https://doi.org/10.1145/3657604.3664697

Published: 15 July 2024

Abstract

Personalized feedback, tailored to students' needs and prior knowledge, is essential for fostering mathematical problem-solving skills. However, personalized feedback is often limited to one-to-one tutoring or small classrooms as it requires instructors' in-depth diagnosis of cognitive processes employed in students' answers. We propose a large language model (LLM) pipeline that diagnoses students' problem-solving skills from their answers at scale in elementary school math word problems. Based on prior literature and an interview with a math education expert, we developed PERC, a framework composed of four problem-solving stages that students can follow: Parse, Extract, Retrieve, and Combine. The framework facilitates diagnosis by externalizing students' step-by-step problem-solving processes and allowing our pipeline to analyze each stage individually. Our LLM pipeline diagnoses each stage by (1) generating rubrics and (2) comparing students' answers with the rubrics. We fine-tuned our LLM pipeline with 71 math problem-rubric pairs and 128 problem-answer-grade triplets collected from elementary school students. We evaluated our pipeline's diagnosis accuracy against vanilla GPT-3.5 and vanilla GPT-4 with automatic and expert evaluations. The results showed the potential of our approach in improving the end-to-end diagnosis accuracy of LLMs, and expert evaluation provided specific aspects that should be improved.
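The two-step diagnosis described in the abstract (generate a rubric for each PERC stage, then compare the student's answer with that rubric) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the `LLM` callable type, the prompt wording, the PASS/FAIL reply convention, and the `toy_llm` stand-in are all assumptions introduced here, and no real model API or fine-tuning is involved.

```python
from enum import Enum
from typing import Callable, Dict

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


class Stage(Enum):
    """The four PERC stages named in the abstract."""
    PARSE = "Parse"
    EXTRACT = "Extract"
    RETRIEVE = "Retrieve"
    COMBINE = "Combine"


def generate_rubric(problem: str, stage: Stage, llm: LLM) -> str:
    """Step 1: ask the model for a rubric for one stage of one problem."""
    prompt = (
        f"Write a grading rubric for the {stage.value} stage "
        f"of this problem:\n{problem}"
    )
    return llm(prompt)


def grade_stage(answer: str, rubric: str, llm: LLM) -> bool:
    """Step 2: ask the model whether the answer satisfies the rubric."""
    prompt = (
        f"Rubric:\n{rubric}\nStudent answer:\n{answer}\nReply PASS or FAIL."
    )
    return llm(prompt).strip().upper().startswith("PASS")


def diagnose(problem: str, answers: Dict[Stage, str], llm: LLM) -> Dict[Stage, bool]:
    """Diagnose each stage independently; a missing answer counts as empty."""
    return {
        stage: grade_stage(
            answers.get(stage, ""), generate_rubric(problem, stage, llm), llm
        )
        for stage in Stage
    }


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption): passes any non-empty answer."""
    if prompt.startswith("Rubric:"):
        answer = prompt.split("Student answer:\n", 1)[1].split("\nReply", 1)[0]
        return "PASS" if answer.strip() else "FAIL"
    return "The answer must address this stage."


result = diagnose(
    "Tom has 3 apples and buys 2 more. How many apples does he have?",
    {
        Stage.PARSE: "Find the total number of apples.",
        Stage.EXTRACT: "3 apples, 2 more apples.",
        Stage.RETRIEVE: "Addition.",
        Stage.COMBINE: "",  # the student left the last stage blank
    },
    toy_llm,
)
```

Grading each stage separately means a weakness can be located at a specific stage rather than reported as a single right-or-wrong judgment on the final answer.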

Index Terms

1. Applied computing
  1. Education

Published In

L@S '24: Proceedings of the Eleventh ACM Conference on Learning @ Scale

July 2024

582 pages

ISBN: 9798400706332

DOI: 10.1145/3657604

General Chair: David Joyner (Georgia Tech, USA)

Program Chairs: Min Kyu Kim (Georgia State University, USA), Xu Wang (University of Michigan, USA), Meng Xia (Texas A&M University, USA)

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Algorithm LABS

Conference

L@S '24: Eleventh ACM Conference on Learning @ Scale

July 18 - 20, 2024

Atlanta, GA, USA

Acceptance Rates

Overall Acceptance Rate 117 of 440 submissions, 27%
