Engaging Preference Optimization Alignment in Large Language Model for Continual Radiology Report Generation: A Hybrid Approach

Izhar, Amaan; Idris, Norisma; Japar, Nurul

doi:10.1007/s12559-025-10404-6


Correspondence
Published: 27 January 2025

Volume 17, article number 53 (2025)

Cognitive Computation

Amaan Izhar¹,
Norisma Idris¹ &
Nurul Japar¹


Abstract

Large language models (LLMs) remain relatively underutilized in medical imaging, particularly in radiology, which is essential for disease diagnosis and management. Meanwhile, radiology report generation (RRG) is a time-consuming task that can result in delays and inconsistencies. To address these challenges, we present a novel hybrid approach that integrates multi-modal radiology information and preference optimization alignment in an LLM for continual RRG. Our method uses a pre-trained small multi-modal model to analyze radiology images and generate an initial report, which an LLM then refines and aligns using odds ratio preference optimization (ORPO) together with historical patient data and assessments to mimic radiologist-like responses, bypassing reinforcement learning from human feedback (RLHF)-based alignment. This two-stage fusion, supervised fine-tuning followed by preference optimization, ensures high accuracy while minimizing hallucinations and errors. We also propose a data field curation strategy, extendable to RRG datasets of other modalities, that focuses on selecting relevant responses for preference alignment. We evaluate our approach on two public datasets, achieving state-of-the-art performance with average BLEU scores of 0.375 and 0.647, METEOR scores of 0.495 and 0.714, ROUGE-L scores of 0.483 and 0.732, and average F1-RadGraph scores of 0.488 and 0.487 on the chest X-ray and lung CT scan datasets, respectively. We further provide in-depth qualitative analyses and ablation studies to explain the workings of our model and assess its clinical relevance for RRG. This work presents the first application of preference optimization in continual RRG, representing a significant advancement in automating clinically reliable report generation. By reducing cognitive burdens on radiologists through AI-powered reasoning and alignment in LLMs, the proposed model improves decision-making, perception, and diagnostic precision, streamlining workflows and enhancing patient care. Our code is available at https://github.com/AI-14/r2gpoallm.
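For readers unfamiliar with ORPO, the following is a minimal sketch of the general ORPO objective as commonly described: a supervised fine-tuning (negative log-likelihood) term on the preferred response plus a weighted odds-ratio penalty that favors the preferred response over the rejected one. It is not the authors' implementation (see the linked repository for that). The function names, the example log-probabilities, and the weight `lam=0.1` are illustrative assumptions, and the per-token log-probabilities are assumed to come from some language model.

```python
import math


def sequence_log_prob(token_log_probs):
    """Length-normalised log-likelihood: mean of per-token log-probabilities."""
    return sum(token_log_probs) / len(token_log_probs)


def log_odds(log_p):
    """Return log(p / (1 - p)) given log p, for p strictly between 0 and 1."""
    return log_p - math.log1p(-math.exp(log_p))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def orpo_loss(chosen_token_log_probs, rejected_token_log_probs, lam=0.1):
    """Sketch of an ORPO-style loss for one (chosen, rejected) response pair.

    lam is a hypothetical weighting of the odds-ratio term, chosen here
    purely for illustration.
    """
    log_p_chosen = sequence_log_prob(chosen_token_log_probs)
    log_p_rejected = sequence_log_prob(rejected_token_log_probs)

    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -log_p_chosen

    # Odds-ratio term: -log(sigmoid(z)) where z is the log odds ratio,
    # which equals softplus(-z).
    log_odds_ratio = log_odds(log_p_chosen) - log_odds(log_p_rejected)
    odds_ratio_term = softplus(-log_odds_ratio)

    return sft_term + lam * odds_ratio_term


# Made-up per-token log-probabilities for a preferred and a rejected report.
chosen = [-0.1, -0.2]
rejected = [-1.0, -2.0]
loss_good = orpo_loss(chosen, rejected)
loss_swapped = orpo_loss(rejected, chosen)
```

In this toy setting the loss is lower when the model already assigns higher likelihood to the preferred report than when the two responses are swapped. Because both terms are computed from the same model, no separate reward model or frozen reference model is needed, which is consistent with the abstract's remark about bypassing RLHF-based alignment.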


Data Availability

No datasets were generated or analyzed during the current study.


Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
Amaan Izhar, Norisma Idris & Nurul Japar


Contributions

Conceptualization: Amaan Izhar; methodology: Amaan Izhar; software: Amaan Izhar; investigation: Amaan Izhar; writing (original draft): Amaan Izhar; writing (review and editing): Amaan Izhar, Norisma Idris, Nurul Japar; visualization: Amaan Izhar; validation: Amaan Izhar, Norisma Idris, Nurul Japar; supervision: Norisma Idris, Nurul Japar; project administration: Norisma Idris, Nurul Japar; resources: Nurul Japar.

Corresponding author

Correspondence to Nurul Japar.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Izhar, A., Idris, N. & Japar, N. Engaging Preference Optimization Alignment in Large Language Model for Continual Radiology Report Generation: A Hybrid Approach. Cogn Comput 17, 53 (2025). https://doi.org/10.1007/s12559-025-10404-6

Received: 23 October 2024
Accepted: 08 January 2025
Published: 27 January 2025
DOI: https://doi.org/10.1007/s12559-025-10404-6
