
Investigating the Impact of Query Representation on Medical Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2023)

Abstract

This study investigates how different kinds of patient-related information extracted from unstructured clinical notes affect two tasks: patient allocation in clinical trials and medical literature retrieval. Specifically, we combine standard and transformer-based methods to extract entities (e.g., drugs, medical problems), disambiguate their meaning (e.g., family history, negations), or expand them with related medical concepts, thereby synthesizing diverse query representations. The empirical evaluation showed that certain query representations improve retrieval effectiveness for patient allocation in clinical trials, whereas no statistically significant improvements were found for medical literature retrieval. Across queries, removing negated entities with a domain-specific pre-trained transformer model was more effective than removing them with a standard rule-based approach. In addition, our experiments showed that removing family-history information can further improve patient allocation in clinical trials.
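The query-construction idea described in the abstract (filter entities by their assertion status, optionally expand them, then assemble a query) can be illustrated with a small sketch. It is not the authors' implementation. It assumes entities have already been extracted (e.g., by a clinical NER model), stands in for both the rule-based and transformer-based assertion detectors with a deliberately simplified NegEx-style trigger window, and uses a hypothetical expansion map in place of a medical knowledge source. The `Entity` class, the trigger lists, and the functions `tag_assertions` and `build_query` are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny trigger lists (simplified NegEx-style rule).
# Real rule-based systems use large curated lexicons and scope-termination terms.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
FAMILY_TRIGGERS = ("mother", "father", "sister", "brother", "family history")


@dataclass
class Entity:
    text: str              # surface form of an already-extracted entity
    start: int             # character offset of the entity in the note
    negated: bool = False
    family: bool = False


def _has_trigger(context: str, triggers: tuple[str, ...]) -> bool:
    """True if any trigger occurs as a whole word or phrase in `context`."""
    return any(re.search(rf"\b{re.escape(t)}\b", context) for t in triggers)


def tag_assertions(note: str, entities: list[Entity], window: int = 5) -> list[Entity]:
    """Flag entities preceded, within `window` tokens of the same sentence,
    by a negation or family-history trigger (mutates and returns entities)."""
    lowered = note.lower()
    for ent in entities:
        # Look back only as far as the previous sentence boundary ('.' or ';').
        boundary = max(lowered.rfind(".", 0, ent.start), lowered.rfind(";", 0, ent.start))
        tokens = lowered[boundary + 1:ent.start].split()[-window:]
        context = " ".join(tokens)
        ent.negated = _has_trigger(context, NEGATION_TRIGGERS)
        ent.family = _has_trigger(context, FAMILY_TRIGGERS)
    return entities


def build_query(entities: list[Entity], drop_negated: bool = True,
                drop_family: bool = True,
                expansions: dict[str, list[str]] | None = None) -> str:
    """Assemble a bag-of-words query, optionally removing negated or
    family-history entities and appending (hypothetical) expansion terms."""
    expansions = expansions or {}
    terms: list[str] = []
    for ent in entities:
        if (drop_negated and ent.negated) or (drop_family and ent.family):
            continue
        terms.append(ent.text)
        terms.extend(expansions.get(ent.text, []))
    return " ".join(terms)


if __name__ == "__main__":
    note = "58-year-old man with chest pain. Denies fever. Mother had breast cancer."
    ents = tag_assertions(note, [Entity(t, note.find(t)) for t in ("chest pain", "fever", "breast cancer")])
    print(build_query(ents))                                         # chest pain
    print(build_query(ents, drop_negated=False, drop_family=False))  # chest pain fever breast cancer
    print(build_query(ents, expansions={"chest pain": ["angina"]}))  # chest pain angina
```

Toggling `drop_negated` and `drop_family` yields query variants of the kind the study compares. A realistic system would replace the short trigger lists with a curated negation lexicon or a trained assertion classifier, and the toy expansion map with concepts from a medical terminology.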


Notes

  1. All indexing parameter combinations were evaluated; however, these parameters led to greater retrieval performance.

  2. https://github.com/inf_extraction_med_ir.


Acknowledgements

This work was supported by the EU Horizon 2020 ITN/ETN on Domain Specific Systems for Information Extraction and Retrieval (H2020-EU.1.3.1., ID: 860721).

Author information

Correspondence to Georgios Peikos.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Peikos, G., Alexander, D., Pasi, G., de Vries, A.P. (2023). Investigating the Impact of Query Representation on Medical Information Retrieval. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_42


  • DOI: https://doi.org/10.1007/978-3-031-28238-6_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28237-9

  • Online ISBN: 978-3-031-28238-6

  • eBook Packages: Computer Science, Computer Science (R0)
