Abstract:
This study aims to evaluate the capabilities and limitations of three large language models (LLMs), GPT-3, GPT-4, and Bard, in the field of pharmacy by assessing their reasoning abilities on a sample of the North American Pharmacist Licensure Examination (NAPLEX). Additionally, we explore the potential impacts of LLMs on pharmacy education and practice. To evaluate the LLMs, we used a sample NAPLEX exam comprising 137 multiple-choice questions. These questions were presented to GPT-3, GPT-4, and Bard through their respective user interfaces, and the answers generated by the LLMs were then compared with the answer key. The results reveal a notable disparity in the performance of the LLMs. GPT-4 was the top performer, correctly answering 78.8% of the questions, an 11% improvement over Bard and a 27.7% improvement over GPT-3. However, on questions that required selecting multiple answers, the performance of each LLM decreased significantly: GPT-4, GPT-3, and Bard correctly answered only 53.6%, 13.9%, and 21.4% of such questions, respectively. Among the three LLMs evaluated, GPT-4 was the only model capable of passing the NAPLEX. Nevertheless, given the continuous evolution of LLMs, it is reasonable to anticipate that future models will excel in this context. This highlights the significant potential of LLMs to influence the field of pharmacy. Hence, we must evaluate both the positive and negative implications of integrating LLMs into pharmacy education and practice.
Published in: 2023 Tenth International Conference on Social Networks Analysis, Management and Security (SNAMS)
Date of Conference: 21-24 November 2023
Date Added to IEEE Xplore: 02 January 2024