
SciPhyRAG - Retrieval Augmentation to Improve LLMs on Physics Q&A

  • Conference paper
  • First Online:
Big Data and Artificial Intelligence (BDA 2023)

Abstract

Large Language Models (LLMs) have showcased their value across diverse domains, yet their accuracy on computation-intensive tasks remains limited. This paper introduces a comprehensive methodology for constructing a robust dataset focused on High School Physics, leveraging retrieval augmentation, and proposes subsequent instruction finetuning of an LLM to improve the precision and depth of its answers. The central aspiration is reinforcing LLM efficiency in educational contexts, facilitating more precise, well-contextualized, and informative results. By bridging the gap between LLM capabilities and the demands of complex educational tasks, this approach seeks to empower educators and students alike, offering enhanced support and enriched learning experiences. Compared to Vicuna-7b, the finetuned retrieval-augmented model SciPhy-RAG exhibits a 16.67% increase in BERTScore and a 35.2% increase in ROUGE-2 score. This approach has the potential to reshape Physics Q&A by LLMs and to have a lasting impact on their use in Physics education. Furthermore, the released datasets can serve as a reference point for future research and for educational domain tasks such as Automatic Evaluation and Question Generation.
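The retrieve-then-generate pipeline the abstract describes can be sketched as follows. This is a minimal, self-contained illustration (bag-of-words cosine retrieval over a toy corpus feeding a prompt template), not the authors' implementation: the paper's actual retriever, corpus, and finetuned model are not shown on this page, so every snippet and function name below is a hypothetical stand-in.

```python
import math
from collections import Counter

# Toy physics corpus standing in for the retrieval index
# (the paper's real High School Physics corpus is not reproduced here).
CORPUS = [
    "Newton's second law states F = ma.",
    "Ohm's law relates voltage, current, and resistance: V = IR.",
    "Kinetic energy is KE = 1/2 m v^2.",
]

def _vec(text: str) -> Counter:
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k corpus passages most similar to the question."""
    q = _vec(question)
    return sorted(CORPUS, key=lambda d: _cosine(q, _vec(d)), reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Prepend retrieved context to the question before calling the LLM."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What is Newton's second law?"))
```

In practice a dense retriever (e.g. a sentence-embedding model with approximate nearest-neighbor search) would replace the cosine-over-word-counts step, and the assembled prompt would be sent to the finetuned model; the control flow, however, is the same.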



Acknowledgments

We want to acknowledge the contribution of our data annotators, Aryan Goel, Ansh Varshney, Siddhartha Garg and Saurav Mehra. Rajiv Ratn Shah is partly supported by the Infosys Center for AI, the Center for Design and New Media, and the Center of Excellence in Healthcare at IIIT Delhi.

Author information

Correspondence to Avinash Anand.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Anand, A. et al. (2023). SciPhyRAG - Retrieval Augmentation to Improve LLMs on Physics Q&A. In: Goyal, V., Kumar, N., Bhowmick, S.S., Goyal, P., Goyal, N., Kumar, D. (eds) Big Data and Artificial Intelligence. BDA 2023. Lecture Notes in Computer Science, vol 14418. Springer, Cham. https://doi.org/10.1007/978-3-031-49601-1_4


  • DOI: https://doi.org/10.1007/978-3-031-49601-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49600-4

  • Online ISBN: 978-3-031-49601-1

  • eBook Packages: Computer Science, Computer Science (R0)
