Engaging Preference Optimization Alignment in Large Language Model for Continual Radiology Report Generation: A Hybrid Approach

  • Correspondence
Cognitive Computation

Abstract

Large language models (LLMs) remain relatively underutilized in medical imaging, particularly in radiology, which is essential for disease diagnosis and management. Radiology report generation (RRG), meanwhile, is a time-consuming task that can result in delays and inconsistencies. To address these challenges, we present a novel hybrid approach that integrates multi-modal radiology information and preference optimization alignment in an LLM for continual RRG. Our method uses a pre-trained small multi-modal model to analyze radiology images and generate an initial report, which an LLM then refines and aligns using odds ratio preference optimization (ORPO) together with historical patient data and assessments to mimic radiologist-like responses, bypassing alignment based on reinforcement learning from human feedback (RLHF). This two-stage fusion (supervised fine-tuning followed by preference optimization) ensures high accuracy while minimizing hallucinations and errors. We also propose a data field curation strategy, extendable to RRG datasets of other modalities, that focuses on selecting relevant responses for preference alignment. We evaluate our approach on two public datasets, achieving state-of-the-art performance with average BLEU scores of 0.375 and 0.647, METEOR scores of 0.495 and 0.714, ROUGE-L scores of 0.483 and 0.732, and average F1-RadGraph scores of 0.488 and 0.487 on the chest X-ray and lung CT scan datasets, respectively. We further provide in-depth qualitative analyses and ablation studies to explain how our model works and to assess its clinical relevance for RRG. This work presents the first application of preference optimization in continual RRG, representing a significant advancement in automating clinically reliable report generation.
By reducing cognitive burdens on radiologists through AI-powered reasoning and alignment in LLMs, the proposed model improves decision-making, perception, and diagnostic precision, streamlining workflows and enhancing patient care. Our code is available at https://github.com/AI-14/r2gpoallm.
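
For readers unfamiliar with ORPO, the following is a minimal sketch of the general objective as formulated in the ORPO paper (Hong et al., 2024). It is not taken from the authors' repository. It assumes each response is summarized by its length-normalized (average per-token) log-likelihood under the model being trained; the function names and the weight `lam=0.1` are illustrative choices, not values reported in this article. The loss adds an odds-ratio penalty to the usual negative log-likelihood of the preferred report, so no separate reference model or reward model is needed.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is the average per-token log-likelihood of a response and
    must be strictly negative (so that p < 1).
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
              lam: float = 0.1) -> float:
    """Sketch of the ORPO objective for one (chosen, rejected) pair.

    lam weights the odds-ratio term; 0.1 is an illustrative value only.
    """
    # Supervised fine-tuning term: NLL of the preferred response.
    nll = -avg_logp_chosen

    # Log odds ratio between the preferred and the rejected response.
    z = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)

    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    odds_ratio_term = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

    return nll + lam * odds_ratio_term


if __name__ == "__main__":
    # Hypothetical average log-likelihoods for a preferred and a rejected report.
    print(orpo_loss(avg_logp_chosen=-0.5, avg_logp_rejected=-2.0))
```

With the preferred response held fixed, the loss shrinks as the rejected response becomes less likely, which is the behavior that pushes the model away from dispreferred reports during fine-tuning.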

Data Availability

No datasets were generated or analyzed during the current study.

Notes

  1. https://huggingface.co/Intel/neural-chat-7b-v3-3

  2. https://github.com/AI-14/pkatransnet

  3. https://huggingface.co/

  4. https://huggingface.co/docs/bitsandbytes/main/en/index

Author information

Contributions

Conceptualization: Amaan Izhar; methodology: Amaan Izhar; software: Amaan Izhar; investigation: Amaan Izhar; writing (original draft): Amaan Izhar; writing (review and editing): Amaan Izhar, Norisma Idris, Nurul Japar; visualization: Amaan Izhar; validation: Amaan Izhar, Norisma Idris, Nurul Japar; supervision: Norisma Idris, Nurul Japar; project administration: Norisma Idris, Nurul Japar; resources: Nurul Japar.

Corresponding author

Correspondence to Nurul Japar.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Izhar, A., Idris, N. & Japar, N. Engaging Preference Optimization Alignment in Large Language Model for Continual Radiology Report Generation: A Hybrid Approach. Cogn Comput 17, 53 (2025). https://doi.org/10.1007/s12559-025-10404-6

  • DOI: https://doi.org/10.1007/s12559-025-10404-6
