Using Large Language Models To Diagnose Math Problem-solving Skills At Scale
Pages 471 - 475
Abstract
Personalized feedback, tailored to students' needs and prior knowledge, is essential for fostering mathematical problem-solving skills. However, personalized feedback is often limited to one-to-one tutoring or small classrooms as it requires instructors' in-depth diagnosis of cognitive processes employed in students' answers. We propose a large language model (LLM) pipeline that diagnoses students' problem-solving skills from their answers at scale in elementary school math word problems. Based on prior literature and an interview with a math education expert, we developed PERC, a framework composed of four problem-solving stages that students can follow: Parse, Extract, Retrieve, and Combine. The framework facilitates diagnosis by externalizing students' step-by-step problem-solving processes and allowing our pipeline to analyze each stage individually. Our LLM pipeline diagnoses each stage by (1) generating rubrics and (2) comparing students' answers with the rubrics. We fine-tuned our LLM pipeline with 71 math problem-rubric pairs and 128 problem-answer-grade triplets collected from elementary school students. We evaluated our pipeline's diagnosis accuracy against vanilla GPT-3.5 and vanilla GPT-4 with automatic and expert evaluations. The results showed the potential of our approach in improving the end-to-end diagnosis accuracy of LLMs, and expert evaluation provided specific aspects that should be improved.
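The two-step diagnosis described above (generate a rubric for each stage, then compare the student's answer with it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `LLM` callable type, the `StageDiagnosis` record, and the `stub_llm` stand-in are all assumptions made for this example, and the paper's pipeline relies on fine-tuned models rather than a generic callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# The four PERC stages named in the abstract.
STAGES = ("Parse", "Extract", "Retrieve", "Combine")

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class StageDiagnosis:
    stage: str
    rubric: str
    passed: bool


def generate_rubric(llm: LLM, problem: str, stage: str) -> str:
    """Step 1: ask the model for a rubric covering one stage of one problem."""
    prompt = (
        f"Problem: {problem}\n"
        f"Write a grading rubric for the '{stage}' stage of solving this problem."
    )
    return llm(prompt).strip()


def grade_stage(llm: LLM, rubric: str, answer: str) -> bool:
    """Step 2: ask the model whether the student's answer satisfies the rubric."""
    prompt = (
        f"Rubric: {rubric}\n"
        f"Student answer: {answer}\n"
        "Does the answer satisfy the rubric? Reply PASS or FAIL."
    )
    return llm(prompt).strip().upper().startswith("PASS")


def diagnose(llm: LLM, problem: str, answers: Dict[str, str]) -> Dict[str, StageDiagnosis]:
    """Diagnose each stage independently; a missing answer is treated as empty."""
    report = {}
    for stage in STAGES:
        rubric = generate_rubric(llm, problem, stage)
        passed = grade_stage(llm, rubric, answers.get(stage, ""))
        report[stage] = StageDiagnosis(stage, rubric, passed)
    return report


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a real model so the sketch runs offline (assumption)."""
    if prompt.startswith("Rubric:"):
        # Pass only when the student's answer line is non-empty.
        answer_line = prompt.split("Student answer: ", 1)[1].split("\n", 1)[0]
        return "PASS" if answer_line.strip() else "FAIL"
    return "The response addresses this stage correctly."


if __name__ == "__main__":
    # Made-up sample answers; they do not reflect the paper's stage definitions.
    sample_answers = {"Parse": "Find how many apples are left.", "Extract": "5 and 2"}
    result = diagnose(stub_llm, "Tom has 5 apples and eats 2. How many are left?", sample_answers)
    for stage, diagnosis in result.items():
        print(stage, "PASS" if diagnosis.passed else "FAIL")
```

Keeping rubric generation and grading as separate calls mirrors the staged design in the abstract: each stage gets its own rubric and its own verdict, so a failure can be attributed to a specific stage rather than to the answer as a whole.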
Information
Published In
July 2024
582 pages
ISBN: 9798400706332
DOI: 10.1145/3657604
- General Chair: David Joyner
- Program Chairs: Min Kyu Kim, Xu Wang, Meng Xia
Copyright © 2024 Owner/Author.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 15 July 2024
Qualifiers
- Short-paper
Funding Sources
- Algorithm LABS
Conference
L@S '24
Acceptance Rates
Overall Acceptance Rate 117 of 440 submissions, 27%