Abstract
In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, research in this area has concentrated primarily on text-based methods, overlooking the integration of visual cues. Moreover, prior work on medical question summarization has been limited to English. This work introduces the task of multimodal medical question summarization for code-mixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, which pairs Hindi-English code-mixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose MedSumm, a framework that leverages the power of LLMs and vision-language models (VLMs) for this task. Using our MMCQS dataset, we demonstrate the value of integrating visual information from images to produce medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future work on personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available at https://github.com/ArkadeepAcharya/MedSumm-ECIR2024.
A. Chadha—Work does not relate to position at Amazon.
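To make the fusion idea from the abstract concrete, the following is a minimal, illustrative sketch of how a frozen vision encoder can be coupled with a decoder-only LLM for multimodal question summarization. It is not the authors' released implementation: the model names (gpt2, openai/clip-vit-base-patch32), the single linear projection, and the prompt format are assumptions made solely for this example.

```python
# Minimal sketch: project a CLIP image embedding into the LLM's embedding
# space, prepend it to the embedded code-mixed question, and compute a
# standard causal-LM summarization loss.  Model choices are illustrative only.
import torch
import torch.nn as nn
from PIL import Image
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          CLIPImageProcessor, CLIPVisionModel)

class MultimodalSummarizer(nn.Module):
    def __init__(self, llm_name="gpt2",
                 vision_name="openai/clip-vit-base-patch32"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(llm_name)
        self.llm = AutoModelForCausalLM.from_pretrained(llm_name)
        self.vision = CLIPVisionModel.from_pretrained(vision_name)
        self.image_processor = CLIPImageProcessor.from_pretrained(vision_name)
        # Hypothetical fusion module: map the vision hidden size to the LLM
        # embedding size so the image acts as one extra "soft token".
        self.proj = nn.Linear(self.vision.config.hidden_size,
                              self.llm.config.hidden_size)

    def forward(self, image: Image.Image, question: str, summary: str):
        # 1. Encode the image and project its pooled embedding.
        pixels = self.image_processor(images=image,
                                      return_tensors="pt").pixel_values
        img_emb = self.proj(self.vision(pixels).pooler_output)   # (1, d_llm)
        img_emb = img_emb.unsqueeze(1)                            # (1, 1, d_llm)

        # 2. Embed the code-mixed question and the reference summary as text.
        prompt = f"Question: {question}\nSummary: {summary}"
        tokens = self.tokenizer(prompt, return_tensors="pt")
        txt_emb = self.llm.get_input_embeddings()(tokens.input_ids)

        # 3. Prepend the visual token to the text embeddings.
        inputs_embeds = torch.cat([img_emb, txt_emb], dim=1)
        attention_mask = torch.cat(
            [torch.ones(1, 1, dtype=torch.long), tokens.attention_mask], dim=1)

        # 4. Causal-LM loss; the visual position is masked out of the labels.
        labels = torch.cat(
            [torch.full((1, 1), -100, dtype=torch.long), tokens.input_ids],
            dim=1)
        return self.llm(inputs_embeds=inputs_embeds,
                        attention_mask=attention_mask,
                        labels=labels).loss
```

In a full training setup one would typically keep the vision encoder frozen and fine-tune the LLM with a parameter-efficient method such as LoRA or QLoRA; those details are omitted here for brevity.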
Notes
- 6. The medical students were compensated through gift vouchers and an honorarium in line with https://www.minimum-wage.org/international/india.
- 7. To maintain uniformity in the results, post-processing steps such as removing extra spaces and repeated sentences were performed.
Acknowledgements
Akash Ghosh and Sriparna Saha express their heartfelt gratitude to the SERB (Science and Engineering Research Board) POWER scheme (SPG/2021/003801) of the Department of Science and Technology, Govt. of India, for providing the funding to carry out this research.
Ethics declarations
Ethical Considerations
In healthcare summarization, we prioritize ethical considerations, including safety, privacy, and bias. We took extensive measures in constructing the MMCQS dataset: collaborating with medical professionals, obtaining IRB approval, and adhering to legal and ethical guidelines during data handling, image integration, and summary annotation. The dataset is based on the HealthcareMagic Dataset, and medical experts supervised the task. Users' identities were protected to preserve privacy.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ghosh, A. et al. (2024). MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14612. Springer, Cham. https://doi.org/10.1007/978-3-031-56069-9_8
DOI: https://doi.org/10.1007/978-3-031-56069-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56068-2
Online ISBN: 978-3-031-56069-9
eBook Packages: Computer Science, Computer Science (R0)