Abstract
Purpose
AI-based image interpretation, typically using convolutional neural networks, shows increasing capability in radiology. These models have achieved impressive performance on specific tasks in controlled settings, but possess inherent limitations, such as the inability to consider clinical context. We assess the ability of large language models (LLMs) to answer radiology specialty examination questions, as a test of whether they can evaluate relevant clinical information.
Methods
A database of questions was created from official sample, author-written, and textbook questions based on the Royal College of Radiologists (United Kingdom) FRCR 2A and American Board of Radiology (ABR) Certifying examinations. The questions were submitted to Generative Pretrained Transformer (GPT) versions 3 and 4, which were prompted to answer them.
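The paper does not reproduce its prompts or code. As an illustration only, a minimal sketch of how a single-best-answer (SBA) question might be formatted into a prompt for a chat-style LLM (the function name, question text, and instruction wording are hypothetical, not the study's actual prompts):

```python
# Minimal sketch: format an SBA question so the model is constrained to
# pick one lettered option. All content below is hypothetical.

def build_sba_prompt(stem: str, options: list[str]) -> str:
    """Format an SBA question stem and options into a single prompt string."""
    letters = "ABCDE"
    lines = [stem, ""]
    for letter, option in zip(letters, options):
        lines.append(f"{letter}. {option}")
    lines.append("")
    lines.append("Answer with the single best option (A-E) only.")
    return "\n".join(lines)

prompt = build_sba_prompt(
    "A 60-year-old presents with a solitary pulmonary nodule. "
    "Which feature most suggests benignity?",
    ["Spiculated margin", "Popcorn calcification", "Upper lobe location",
     "Doubling time of 30 days", "Pleural retraction"],
)
print(prompt)
```

The resulting string would then be sent to the model via its API; the study's actual prompting strategy and parameters are described in its Methods section.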
Results
A total of 1072 questions were evaluated by GPT-3 and GPT-4: 495 (46.2%) based on the FRCR 2A and 577 (53.8%) on the ABR examination. There were 890 single-best-answer (SBA) and 182 true/false questions. GPT-4 answered 629/890 (70.7%) SBA and 151/182 (83.0%) true/false questions correctly. There was no performance degradation on author-written questions. GPT-4 performed significantly better than GPT-3, which selected the correct answer in 282/890 (31.7%) SBA and 111/182 (61.0%) true/false questions. GPT-4's performance was similar across both examinations for all question categories.
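The abstract does not state which statistical test was used (the analysis was performed in R). As an illustrative check only, not the authors' method, a standard two-proportion z-test applied to the reported SBA counts confirms the GPT-4 vs. GPT-3 difference is far beyond chance:

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-proportion z statistic using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# SBA counts reported in the abstract: GPT-4 629/890, GPT-3 282/890.
z = two_proportion_z(629, 890, 282, 890)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(f"GPT-4 SBA accuracy: {629/890:.1%}")  # 70.7%
print(f"GPT-3 SBA accuracy: {282/890:.1%}")  # 31.7%
print(f"z = {z:.1f}, p = {p_value:.2g}")
```

With a z statistic above 16, the difference in SBA accuracy is significant at any conventional threshold, consistent with the abstract's claim.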
Conclusion
The newest generation of LLMs, GPT-4, demonstrates high capability in answering radiology exam questions. It shows marked improvement over GPT-3, suggesting further gains in accuracy are possible. Further research is needed to explore the clinical applicability of these AI models in real-world settings.
Acknowledgements
We would like to thank Joshua Eves (Kings College Hospital, London, UK) for helping with the question validation stage.
Ethics declarations
Conflict of interest
The authors did not receive any financial or non-financial support for the preparation, submission, or conduct of this work and have no competing interests or relevant affiliations to disclose.
About this article
Cite this article
Sood, A., Mansoor, N., Memmi, C. et al. Generative pretrained transformer-4, an artificial intelligence text predictive model, has a high capability for passing novel written radiology exam questions. Int J CARS 19, 645–653 (2024). https://doi.org/10.1007/s11548-024-03071-9