Abstract
Large Language Models (LLMs) have demonstrated their value across diverse domains, yet their accuracy on computation-intensive tasks remains limited. This paper introduces a comprehensive methodology for constructing a robust High School Physics dataset that leverages retrieval augmentation. Subsequent finetuning of an LLM through instruction tuning is proposed to improve the precision and depth of its outputs. The central aim is to strengthen LLM performance in educational contexts, producing more accurate, well-contextualized, and informative results. By bridging the gap between LLM capabilities and the demands of complex educational tasks, this approach seeks to empower educators and students alike, offering enhanced support and enriched learning experiences. Compared to Vicuna-7b, the finetuned retrieval-augmented model SciPhy-RAG exhibits a 16.67% increase in BERTScore and a 35.2% increase in ROUGE-2 score. This approach has the potential to reshape Physics Q&A with LLMs and to have a lasting impact on their use in Physics education. Furthermore, the released datasets can serve as a reference point for future research and for educational domain tasks such as automatic evaluation and question generation.
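As a rough illustration of the retrieval-augmentation step the abstract describes, the minimal Python sketch below embeds a small physics passage corpus with a MiniLM sentence encoder (Wang et al., cited in the references) and prepends the top-scoring passages to each question before it reaches the finetuned model. The sentence-transformers library, the all-MiniLM-L6-v2 checkpoint, and the corpus, retrieve, and build_prompt names are all assumptions made for illustration; this is not the authors' released code.

import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical passage corpus; the paper's dataset is far larger.
corpus = [
    "Newton's second law: F = ma relates net force, mass, and acceleration.",
    "Kinematics: v = u + at holds for constant acceleration a.",
    "Ohm's law: V = IR relates voltage, current, and resistance.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(question, k=2):
    # Cosine similarity reduces to a dot product on unit vectors.
    q_emb = encoder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(-(corpus_emb @ q_emb))[:k]
    return [corpus[i] for i in top]

def build_prompt(question):
    # Prepend retrieved context so the finetuned LLM answers in-context.
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("A 2 kg block is pushed by a 10 N force. Find its acceleration."))

On a realistically sized corpus, the exhaustive dot product would be replaced by an approximate nearest-neighbor index (cf. Andoni et al. in the references).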
References
Goel, A., Hira, M., Anand, A., Bangar, S., Shah, R.R.: Advancements in scientific controllable text generation methods (2023)
Brown, T., et al.: Language models are few-shot learners (2020)
Chung, H., et al.: Scaling instruction-finetuned language models (2022)
Vaswani, A., et al.: Attention is all you need (2017)
Chatakonda, S.K., Kollepara, N., Kumar, P.: SCIMAT: dataset of problems in science and mathematics. In: Srirama, S.N., Lin, J.C.-W., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds.) BDA 2021. LNCS, vol. 13147, pp. 211–226. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-93620-4_16
Cobbe, K., et al.: Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021)
Hendrycks, D., et al.: Measuring mathematical problem solving with the MATH dataset (2021)
Touvron, H., et al.: LLaMA: open and efficient foundation language models (2023)
Hu, E., et al.: LoRA: low-rank adaptation of large language models (2021)
Chiang, W.L., et al.: Vicuna: an open-source chatbot impressing GPT-4 with 90%* ChatGPT quality (2023)
Ling, W., Yogatama, D., Dyer, C., Blunsom, P.: Program induction by rationale generation: learning to solve and explain algebraic word problems (2017)
Miao, S., Liang, C., Su, K.: A diverse corpus for evaluating and developing English math word problem solvers. arXiv preprint arXiv:2106.15772 (2021)
Izacard, G., Grave, E.: Leveraging passage retrieval with generative models for open domain question answering (2021)
Lu, P., et al.: Learn to explain: multimodal reasoning via thought chains for science question answering (2022)
Guu, K., Lee, K., Tung, Z., Pasupat, P., Chang, M.: REALM: retrieval-augmented language model pre-training (2020)
Kwiatkowski, T., et al.: Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics 7, 453–466 (2019)
Joshi, M., Choi, E., Weld, D., Zettlemoyer, L.: TriviaQA: a large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551 (2017)
Kumar, V., Maheshwary, R., Pudi, V.: Practice makes a solver perfect: data augmentation for math word problem solvers (2022)
Taori, R., et al.: Stanford Alpaca: an instruction-following LLaMA model (2023). https://github.com/tatsu-lab/stanford_alpaca
Zhang, T., Kishore, V., Wu, F., Weinberger, K., Artzi, Y.: BERTScore: evaluating text generation with BERT (2020)
Karpukhin, V., et al.: Dense passage retrieval for open-domain question answering (2020)
Saadany, H., Orăsan, C.: BLEU, METEOR, BERTScore: evaluation of metrics performance in assessing critical translation errors in sentiment-oriented text. In: Proceedings of the Translation and Interpreting Technology Online Conference TRITON 2021 (2021). https://doi.org/10.26615/978-954-452-071-7_006
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)
Lin, C.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp. 74–81 (2004)
Wang, W., et al.: MiniLM: deep self-attention distillation for task-agnostic compression of pre-trained transformers (2020)
Andoni, A., Indyk, P., Razenshteyn, I.: Approximate nearest neighbor search in high dimensions (2018)
Wei, J., et al.: Chain-of-thought prompting elicits reasoning in large language models (2023)
Yao, S., et al.: Tree of Thoughts: deliberate problem solving with large language models (2023)
Chowdhery, A., et al.: PaLM: scaling language modeling with pathways (2022)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Kojima, T., Gu, S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 35, 22199–22213 (2022)
He-Yueya, J., Poesia, G., Wang, R., Goodman, N.: Solving math word problems by combining language models with symbolic solvers (2023)
Feng, M., Xiang, B., Glass, M.R., Wang, L., Zhou, B.: Applying deep learning to answer selection: a study and an open task. arXiv preprint arXiv:1508.01585 (2015)
Acknowledgments
We want to acknowledge the contribution of our data annotators, Aryan Goel, Ansh Varshney, Siddhartha Garg and Saurav Mehra. Rajiv Ratn Shah is partly supported by the Infosys Center for AI, the Center for Design and New Media, and the Center of Excellence in Healthcare at IIIT Delhi.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Anand, A. et al. (2023). SciPhyRAG - Retrieval Augmentation to Improve LLMs on Physics Q&A. In: Goyal, V., Kumar, N., Bhowmick, S.S., Goyal, P., Goyal, N., Kumar, D. (eds.) Big Data and Artificial Intelligence. BDA 2023. Lecture Notes in Computer Science, vol 14418. Springer, Cham. https://doi.org/10.1007/978-3-031-49601-1_4
DOI: https://doi.org/10.1007/978-3-031-49601-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49600-4
Online ISBN: 978-3-031-49601-1
eBook Packages: Computer Science, Computer Science (R0)