Abstract
This study investigates how patient-related information extracted from unstructured clinical notes affects two tasks: patient allocation in clinical trials and medical literature retrieval. Specifically, we combine standard and transformer-based methods to extract entities (e.g., drugs, medical problems), disambiguate their meaning (e.g., family history, negations), or expand them with related medical concepts to synthesize diverse query representations. The empirical evaluation shows that certain query representations improve retrieval effectiveness for patient allocation in clinical trials, but no statistically significant improvements were identified in medical literature retrieval. Across the queries, removing negated entities with a domain-specific pre-trained transformer model was more effective than a standard rule-based approach. In addition, our experiments show that removing information related to family history can further improve patient allocation in clinical trials.
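To make the idea of synthesizing several query representations from one clinical note concrete, the following is a minimal sketch rather than the pipeline used in this study. It assumes entity terms have already been extracted, and it substitutes a toy rule-based trigger check for the transformer-based assertion model described above. The trigger lists, the `window` parameter, and the names `annotate` and `build_queries` are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical trigger lists; real lexicons are far larger.
NEGATION_TRIGGERS = ("no", "not", "denies", "without")
FAMILY_TRIGGERS = ("mother", "father", "sister", "brother", "family history")


@dataclass
class Entity:
    text: str
    negated: bool
    family: bool


def annotate(note: str, entity_terms: list[str], window: int = 4) -> list[Entity]:
    """Flag each occurrence of an entity term as negated and/or family-related."""
    entities = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", note.lower()):
        tokens = re.findall(r"[a-z0-9]+", sentence)
        for term in entity_terms:
            term_tokens = term.lower().split()
            n = len(term_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != term_tokens:
                    continue
                # Negated if a trigger token appears shortly before the entity.
                preceding = tokens[max(0, i - window):i]
                negated = any(t in preceding for t in NEGATION_TRIGGERS)
                # Family-related if the sentence mentions a family trigger.
                family = any(t in sentence for t in FAMILY_TRIGGERS)
                entities.append(Entity(term, negated, family))
    return entities


def build_queries(entities: list[Entity]) -> dict[str, str]:
    """Build three alternative query strings from the annotated entities."""
    return {
        "all_entities": " ".join(e.text for e in entities),
        "no_negated": " ".join(e.text for e in entities if not e.negated),
        "no_negated_no_family": " ".join(
            e.text for e in entities if not e.negated and not e.family
        ),
    }


if __name__ == "__main__":
    note = (
        "Patient reports chest pain and takes aspirin. "
        "She denies fever. Her mother had breast cancer."
    )
    terms = ["chest pain", "aspirin", "fever", "breast cancer"]
    for name, query in build_queries(annotate(note, terms)).items():
        print(f"{name}: {query}")
```

Each resulting string could then be issued to a retrieval system as a separate query, so that the effect of dropping negated or family-history entities can be compared on the same topic.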
Notes
1. All indexing parameter combinations were evaluated; however, these parameters lead to greater retrieval performance.
Acknowledgements
This work was supported by the EU Horizon 2020 ITN/ETN on Domain Specific Systems for Information Extraction and Retrieval (H2020-EU.1.3.1., ID: 860721).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Peikos, G., Alexander, D., Pasi, G., de Vries, A.P. (2023). Investigating the Impact of Query Representation on Medical Information Retrieval. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_42
DOI: https://doi.org/10.1007/978-3-031-28238-6_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer Science (R0)