DOI: 10.1145/3657604.3664697
short-paper
Open access

Using Large Language Models To Diagnose Math Problem-solving Skills At Scale

Published: 15 July 2024

Abstract

Personalized feedback, tailored to students' needs and prior knowledge, is essential for fostering mathematical problem-solving skills. However, personalized feedback is often limited to one-to-one tutoring or small classrooms as it requires instructors' in-depth diagnosis of cognitive processes employed in students' answers. We propose a large language model (LLM) pipeline that diagnoses students' problem-solving skills from their answers at scale in elementary school math word problems. Based on prior literature and an interview with a math education expert, we developed PERC, a framework composed of four problem-solving stages that students can follow: Parse, Extract, Retrieve, and Combine. The framework facilitates diagnosis by externalizing students' step-by-step problem-solving processes and allowing our pipeline to analyze each stage individually. Our LLM pipeline diagnoses each stage by (1) generating rubrics and (2) comparing students' answers with the rubrics. We fine-tuned our LLM pipeline with 71 math problem-rubric pairs and 128 problem-answer-grade triplets collected from elementary school students. We evaluated our pipeline's diagnosis accuracy against vanilla GPT-3.5 and vanilla GPT-4 with automatic and expert evaluations. The results showed the potential of our approach in improving the end-to-end diagnosis accuracy of LLMs, and expert evaluation provided specific aspects that should be improved.
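The two-step, per-stage diagnosis described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `diagnose`, `StageDiagnosis`, `RubricModel`, and `GraderModel` are hypothetical, the grade labels are invented for illustration, and the two models are replaced by toy functions so the example runs without access to any LLM. Only the four PERC stage names and the rubric-then-compare order come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The four PERC stages named in the abstract.
PERC_STAGES = ("Parse", "Extract", "Retrieve", "Combine")

# Hypothetical model interfaces (assumptions, not the paper's API):
# a rubric model maps (problem, stage) to a rubric string, and a grader
# model maps (rubric, student_answer) to a grade label.
RubricModel = Callable[[str, str], str]
GraderModel = Callable[[str, str], str]


@dataclass
class StageDiagnosis:
    stage: str
    rubric: str
    grade: str


def diagnose(
    problem: str,
    stage_answers: Dict[str, str],
    rubric_model: RubricModel,
    grader_model: GraderModel,
) -> List[StageDiagnosis]:
    """Diagnose each stage: (1) generate a rubric, (2) compare the answer with it."""
    results = []
    for stage in PERC_STAGES:
        rubric = rubric_model(problem, stage)
        # A stage the student skipped is graded against an empty answer.
        answer = stage_answers.get(stage, "")
        grade = grader_model(rubric, answer)
        results.append(StageDiagnosis(stage, rubric, grade))
    return results


# Toy stand-ins so the sketch runs without any model access.
def toy_rubric_model(problem: str, stage: str) -> str:
    return f"Rubric for the {stage} stage of: {problem}"


def toy_grader_model(rubric: str, answer: str) -> str:
    # Placeholder rule: any non-empty answer counts as "met".
    return "met" if answer.strip() else "not met"


if __name__ == "__main__":
    answers = {"Parse": "find the total number of apples", "Extract": "3 and 2"}
    problem = "Tom has 3 apples and buys 2 more. How many does he have now?"
    for d in diagnose(problem, answers, toy_rubric_model, toy_grader_model):
        print(d.stage, "->", d.grade)
```

Keeping the rubric model and the grader model as separate callables mirrors the abstract's split between rubric generation and answer comparison, so either step could in principle be swapped or evaluated on its own.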

    Published In

    L@S '24: Proceedings of the Eleventh ACM Conference on Learning @ Scale
    July 2024
    582 pages
    ISBN:9798400706332
    DOI:10.1145/3657604
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. educational diagnosis at scale
    2. large language models
    3. mathematical problem-solving skills

    Qualifiers

    • Short-paper

    Funding Sources

    • Algorithm LABS

    Conference

    L@S '24

    Acceptance Rates

    Overall Acceptance Rate 117 of 440 submissions, 27%
