CORE-GPT: Combining Open Access Research and Large Language Models for Credible, Trustworthy Question Answering

  • Conference paper

In: Linking Theory and Practice of Digital Libraries (TPDL 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14241)

Abstract

In this paper, we present CORE-GPT, a novel question-answering platform that combines GPT-based language models with more than 32 million full-text open access scientific articles from CORE (https://core.ac.uk). We first demonstrate that GPT-3.5 and GPT-4 cannot be relied upon to provide references or citations for generated text. We then introduce CORE-GPT, which delivers evidence-based answers to questions, along with citations and links to the cited papers, greatly increasing the trustworthiness of the answers and reducing the risk of hallucinations. CORE-GPT’s performance was evaluated on a dataset of 100 questions covering the top 20 scientific domains in CORE, resulting in 100 answers and links to 500 relevant articles. The quality of the provided answers and the relevance of the links were assessed by two annotators. Our results demonstrate that CORE-GPT can produce comprehensive and trustworthy answers across the majority of scientific domains, complete with links to genuine, relevant scientific articles.
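
In outline, the abstract describes a retrieve-then-generate pipeline: candidate open access papers are first fetched from CORE, and a GPT model is then prompted to answer using only that retrieved evidence, citing the sources it was given. The Python sketch below illustrates the pattern; the CORE v3 search endpoint is genuine, but the prompt wording, response fields, and use of the OpenAI chat API here are illustrative assumptions, not the authors' implementation.

import requests
from openai import OpenAI

CORE_API = "https://api.core.ac.uk/v3/search/works"  # CORE v3 search endpoint
CORE_KEY = "YOUR_CORE_API_KEY"                       # placeholder credential
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")       # placeholder credential

def retrieve_papers(question: str, limit: int = 5) -> list[dict]:
    """Fetch candidate open access papers for a question from CORE."""
    resp = requests.get(
        CORE_API,
        headers={"Authorization": f"Bearer {CORE_KEY}"},
        params={"q": question, "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

def answer_with_citations(question: str) -> str:
    """Ask a GPT model to answer using only the retrieved sources."""
    papers = retrieve_papers(question)
    sources = "\n\n".join(
        f"[{i}] {p.get('title', 'untitled')}\n{(p.get('abstract') or '')[:1000]}"
        for i, p in enumerate(papers, start=1)
    )
    prompt = (
        "Answer the question using only the numbered sources below. "
        "Cite every claim with its source number, e.g. [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return chat.choices[0].message.content

Because the model is restricted to the retrieved sources, every citation in the answer maps back to a genuine CORE record, which is what reduces the hallucination risk the abstract highlights.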


Data and Code Availability

All data and software code used for the evaluation of CORE-GPT are available to promote transparency and reproducibility of the findings. The dataset of questions and answers and the source code used for the analysis and visualisations in this study are accessible on the CORE-GPT GitHub repository (https://github.com/oacore/core-gpt-evaluation). Any questions or requests for further information can be addressed to the corresponding author.
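
As a minimal sketch of how the released evaluation data might be analysed, assume the scores are published as a CSV with one row per question and columns for the domain and the two annotators' averaged ratings; the file and column names below are hypothetical, and the actual layout is defined in the oacore/core-gpt-evaluation repository.

import pandas as pd

# Hypothetical file and column names; consult the oacore/core-gpt-evaluation
# repository for the actual data layout.
scores = pd.read_csv("evaluation_scores.csv")

# Mean answer-quality and link-relevance ratings per scientific domain.
summary = (
    scores.groupby("domain")[["answer_quality", "link_relevance"]]
    .mean()
    .sort_values("answer_quality", ascending=False)
)
print(summary.round(2))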

Notes

  1. https://github.com/oacore/core-gpt-evaluation


Author information

Corresponding author: David Pride.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pride, D., Cancellieri, M., Knoth, P. (2023). CORE-GPT: Combining Open Access Research and Large Language Models for Credible, Trustworthy Question Answering. In: Alonso, O., Cousijn, H., Silvello, G., Marrero, M., Teixeira Lopes, C., Marchesin, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2023. Lecture Notes in Computer Science, vol 14241. Springer, Cham. https://doi.org/10.1007/978-3-031-43849-3_13

  • DOI: https://doi.org/10.1007/978-3-031-43849-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43848-6

  • Online ISBN: 978-3-031-43849-3

  • eBook Packages: Computer Science, Computer Science (R0)
