Abstract
Large Language Models (LLMs) have demonstrated their value across diverse domains, yet their accuracy on computation-intensive tasks remains limited. This paper introduces a comprehensive methodology for constructing a robust High School Physics dataset that leverages retrieval augmentation. Subsequent finetuning of an LLM through instruction tuning is proposed to improve the precision and depth of its outputs. The central aim is to strengthen LLM performance in educational contexts, producing more accurate, well-contextualized, and informative results. By bridging the gap between LLM capabilities and the demands of complex educational tasks, this approach seeks to empower educators and students alike, offering enhanced support and enriched learning experiences. Compared to Vicuna-7b, the finetuned retrieval-augmented model SciPhy-RAG exhibits a 16.67% increase in BERTScore and a 35.2% increase in ROUGE-2 score. This approach has the potential to reshape Physics Q&A with LLMs and to have a lasting impact on their use in Physics education. Furthermore, the released datasets can serve as a reference point for future research and for educational domain tasks such as automatic evaluation and question generation.
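As a rough illustration of the retrieval-augmentation step the abstract describes, the minimal Python sketch below embeds a small physics passage corpus with a MiniLM sentence encoder (Wang et al., cited in the references) and prepends the top-scoring passages to each question before it reaches the finetuned model. The sentence-transformers library, the all-MiniLM-L6-v2 checkpoint, and the corpus, retrieve, and build_prompt names are all assumptions made for illustration; this is not the authors' released code.

import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical passage corpus; the paper's dataset is far larger.
corpus = [
    "Newton's second law: F = ma relates net force, mass, and acceleration.",
    "Kinematics: v = u + at holds for constant acceleration a.",
    "Ohm's law: V = IR relates voltage, current, and resistance.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(question, k=2):
    # Cosine similarity reduces to a dot product on unit vectors.
    q_emb = encoder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(-(corpus_emb @ q_emb))[:k]
    return [corpus[i] for i in top]

def build_prompt(question):
    # Prepend retrieved context so the finetuned LLM answers in-context.
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("A 2 kg block is pushed by a 10 N force. Find its acceleration."))

On a realistically sized corpus, the exhaustive dot product would be replaced by an approximate nearest-neighbor index (cf. Andoni et al. in the references).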
References
Goel, A., Hira, M., Anand, A., Bangar, S., Shah, R.R.: Advancements in scientific controllable text generation methods (2023)
Brown, T., et al.: Language models are few-shot learners (2020)
Chung, H., et al.: Scaling instruction-finetuned language models (2022)
Vaswani, A., et al.: Attention is all you need (2017)
Chatakonda, S.K., Kollepara, N., Kumar, P.: SCIMAT: dataset of problems in science and mathematics. In: Srirama, S.N., Lin, J.C.-W., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds.) BDA 2021. LNCS, vol. 13147, pp. 211–226. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-93620-4_16
Cobbe, K., et al.: Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021)
Hendrycks, D., et al.: Measuring mathematical problem solving with the MATH dataset (2021)
Touvron, H., et al.: LLaMA: open and efficient foundation language models (2023)
Hu, E., et al.: LoRA: low-rank adaptation of large language models (2021)
Chiang, W.L., et al.: Vicuna: an open-source chatbot impressing GPT-4 with 90%* ChatGPT quality (2023)
Ling, W., Yogatama, D., Dyer, C., Blunsom, P.: Program induction by rationale generation: learning to solve and explain algebraic word problems (2017)
Miao, S., Liang, C., Su, K.: A diverse corpus for evaluating and developing English math word problem solvers. arXiv preprint arXiv:2106.15772 (2021)
Izacard, G., Grave, E.: Leveraging passage retrieval with generative models for open domain question answering (2021)
Lu, P., et al.: Learn to explain: multimodal reasoning via thought chains for science question answering (2022)
Guu, K., Lee, K., Tung, Z., Pasupat, P., Chang, M.: REALM: retrieval-augmented language model pre-training (2020)
Kwiatkowski, T., et al.: Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics 7, 453–466 (2019)
Joshi, M., Choi, E., Weld, D., Zettlemoyer, L.: TriviaQA: a large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551 (2017)
Kumar, V., Maheshwary, R., Pudi, V.: Practice makes a solver perfect: data augmentation for math word problem solvers (2022)
Taori, R., et al.: Stanford Alpaca: an instruction-following LLaMA model (2023). https://github.com/tatsu-lab/stanford_alpaca
Zhang, T., Kishore, V., Wu, F., Weinberger, K., Artzi, Y.: BERTScore: evaluating text generation with BERT (2020)
Karpukhin, V., et al.: Dense passage retrieval for open-domain question answering (2020)
Saadany, H., Orăsan, C.: BLEU, METEOR, BERTScore: evaluation of metrics performance in assessing critical translation errors in sentiment-oriented text. In: Proceedings of the Translation and Interpreting Technology Online Conference TRITON 2021 (2021). https://doi.org/10.26615/978-954-452-071-7_006
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)
Lin, C.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp. 74–81 (2004)
Wang, W., et al.: MiniLM: deep self-attention distillation for task-agnostic compression of pre-trained transformers (2020)
Andoni, A., Indyk, P., Razenshteyn, I.: Approximate nearest neighbor search in high dimensions (2018)
Wei, J., et al.: Chain-of-thought prompting elicits reasoning in large language models (2023)
Yao, S., et al.: Tree of Thoughts: deliberate problem solving with large language models (2023)
Chowdhery, A., et al.: PaLM: scaling language modeling with pathways (2022)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Kojima, T., Gu, S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 35, 22199–22213 (2022)
He-Yueya, J., Poesia, G., Wang, R., Goodman, N.: Solving math word problems by combining language models with symbolic solvers (2023)
Feng, M., Xiang, B., Glass, M.R., Wang, L., Zhou, B.: Applying deep learning to answer selection: a study and an open task. arXiv preprint arXiv:1508.01585 (2015)
Acknowledgments
We want to acknowledge the contribution of our data annotators, Aryan Goel, Ansh Varshney, Siddhartha Garg and Saurav Mehra. Rajiv Ratn Shah is partly supported by the Infosys Center for AI, the Center for Design and New Media, and the Center of Excellence in Healthcare at IIIT Delhi.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Anand, A. et al. (2023). SciPhyRAG - Retrieval Augmentation to Improve LLMs on Physics Q&A. In: Goyal, V., Kumar, N., Bhowmick, S.S., Goyal, P., Goyal, N., Kumar, D. (eds.) Big Data and Artificial Intelligence. BDA 2023. Lecture Notes in Computer Science, vol 14418. Springer, Cham. https://doi.org/10.1007/978-3-031-49601-1_4
DOI: https://doi.org/10.1007/978-3-031-49601-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49600-4
Online ISBN: 978-3-031-49601-1
eBook Packages: Computer Science, Computer Science (R0)