Abstract
Orphan diseases (ODs) are rare conditions that each affect only a relatively small number of individuals. Because of their scarcity, these conditions are often neglected in research, which makes medical advancement difficult. The ever-evolving landscape of medical research and diagnosis therefore calls for greater attention and innovative approaches to address the complex challenges posed by rare and orphan diseases. Pre-trained large language models (LLMs) are a crucial component of contemporary artificial intelligence (AI), contributing to significant advances in the performance of complex AI tasks. In this research, we introduce OrphaGPT, an interactive chat system built on a fine-tuned GPT-3.5 Turbo model with a comprehensive, customized user interface that allows users to engage in deeper conversations about ODs. Our model achieves 80% classification accuracy, attained through natural language processing (NLP) techniques combined with domain-specific fine-tuning and fine-prompting. Our findings provide valuable insights into prompting as a complement to fine-tuning when customizing LLMs for specialized domains, and they showcase the potential for adaptive generative AI to play a pivotal role in the field of ODs. The implications of this research extend to medical practitioners, researchers, and the OD community, offering new interactive ways to understand, identify, and diagnose such complex diseases through a customized advanced language model. The successful customization of LLMs for specific fields marks an advance for AI in contextualizing dialogues and carries implications for future work.
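The abstract describes domain-specific fine-tuning of GPT-3.5 Turbo for orphan-disease classification. The following Python sketch shows how such a fine-tuning job could be set up with the OpenAI SDK; the file name, system prompt, example text, and disease label are illustrative assumptions, not details taken from the paper or its dataset.

# Minimal sketch: fine-tuning GPT-3.5 Turbo on chat-formatted
# orphan-disease classification examples (illustrative only; not the
# authors' actual pipeline, prompts, or data).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical training example: map a symptom description to a disease label.
example = {
    "messages": [
        {"role": "system",
         "content": "You classify patient descriptions into orphan disease categories."},
        {"role": "user",
         "content": "Progressive proximal muscle weakness beginning in early childhood ..."},
        {"role": "assistant",
         "content": "Duchenne muscular dystrophy"},
    ]
}

# Write the examples to JSONL, the format expected by the fine-tuning API.
# A real job needs many such examples (the API requires at least 10).
with open("orphan_train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Upload the training file and launch a fine-tuning job on gpt-3.5-turbo.
training_file = client.files.create(
    file=open("orphan_train.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-3.5-turbo"
)
print(job.id)  # poll this job until it reports the fine-tuned model name

Once the job completes, the resulting model name can be used in ordinary chat-completion calls, with prompt design ("fine-prompting" in the paper's terminology) layered on top to steer the classification dialogue.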
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pokhrel, K., Sanin, C., Islam, M.R., Hossain Sakib, M.K., Ulhaq, A., Szczerbicki, E. (2024). OrphaGPT: An Adapted Large Language Model for Orphan Diseases Classification. In: Nguyen, N.T., et al. (eds.) Intelligent Information and Database Systems. ACIIDS 2024. Lecture Notes in Computer Science, vol. 14795. Springer, Singapore. https://doi.org/10.1007/978-981-97-4982-9_16
DOI: https://doi.org/10.1007/978-981-97-4982-9_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-4981-2
Online ISBN: 978-981-97-4982-9