Linguistic Uncertainty in Clinical NLP: A Taxonomy, Dataset and Approach

Turner, Mark; Ive, Julia; Velupillai, Sumithra

doi:10.1007/978-3-030-85251-1_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12880)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

Linguistic uncertainty is prevalent in electronic health records (EHRs). The ability to handle and preserve uncertainty in natural language is an essential skill for clinicians, facilitating decidability and effective clinical reasoning processes despite incomplete knowledge in some situations. This has been addressed by previous research in clinical NLP by the development of algorithms that detect uncertainty expressions. However, existing rule-based algorithms have limited uncertainty detection capabilities. Therefore, we seek to reformulate uncertainty detection as a supervised machine learning problem by (i) reevaluating the concept of uncertainty, (ii) embedding this understanding in an improved linguistic uncertainty taxonomy and (iii) introducing a new dataset of EHRs annotated for nine types of uncertainty – the first publicly available dataset of its kind. Many of our classes are novel and emphasise implicit uncertainties – a form of uncertainty that is ignored by existing algorithms, yet has crucial functions in clinical settings. Through an evaluation of our dataset, we demonstrate the scalability of our approach and its utility in relation to research on clinical information extraction.

Notes

1. All examples hereinafter are from MIMIC and are paraphrased.
2. https://physionet.org/

References

1. Chapman, W.W., Bridewell, W., Hanbury, P., Cooper, G.F., Buchanan, B.G.: A simple algorithm for identifying negated findings and diseases in discharge summaries. J. Biomed. Inform. 34, 301–310 (2001). https://doi.org/10.1006/jbin.2001.1029
2. Goldberger, A.L., et al.: PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, e215–e220 (2000). https://doi.org/10.1161/01.cir.101.23.e215
3. Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence (2019). https://doi.org/10.1609/aaai.v33i01.3301590
4. Johnson, A.E., Pollard, T.J., Mark, R.G.: MIMIC-III clinical database (version 1.4). PhysioNet (2016)
5. Johnson, A.E., Pollard, T.J., Mark, R.G., Seth, B., Horng, S.: MIMIC-CXR Database (version 2.0.0). PhysioNet (2019)
6. Johnson, A.E., et al.: MIMIC-III, a freely accessible critical care database. Sci. Data 3, 1–9 (2016). https://doi.org/10.1038/sdata.2016.35
7. Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (2014). https://doi.org/10.3115/v1/d14-1181
8. Mowery, D.L., Ave, M., Chapman, W.W.: Medical diagnosis lost in translation – analysis of uncertainty and negation expressions in English and Swedish clinical texts. In: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing (BioNLP 2012) (2012)
9. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
10. Peng, Y., Wang, X., Lu, L., Bagheri, M., Summers, R., Lu, Z.: NegBio: a high-performance tool for negation and uncertainty detection in radiology reports. In: AMIA Joint Summits on Translational Science Proceedings. AMIA Joint Summits on Translational Science (2018)
11. Velupillai, S.: Shades of certainty: annotation and classification of Swedish medical records (2012). http://su.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:512263
12. Vincze, V., Szarvas, G., Farkas, R., Móra, G., Csirik, J.: The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics 9, 1–9 (2008). https://doi.org/10.1186/1471-2105-9-S11-S9

Author information

Authors and Affiliations

Imperial College London, London, UK
Mark Turner & Julia Ive
King’s College London, London, UK
Sumithra Velupillai

Corresponding author

Correspondence to Mark Turner.

Editor information

Editors and Affiliations

Arizona State University, Tempe, AZ, USA
K. Selçuk Candan
Politehnica University of Bucharest, Bucharest, Romania
Bogdan Ionescu
Université Grenoble Alpes, Saint-Martin-d’Hères, France
Lorraine Goeuriot
Aalborg University Copenhagen, Copenhagen, Denmark
Birger Larsen
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Montpellier, Montpellier, France
Alexis Joly
University of Copenhagen, Copenhagen, Denmark
Maria Maistro
TU Wien, Vienna, Austria
Florina Piroi
University of Padua, Padova, Italy
Guglielmo Faggioli
University of Padua, Padova, Italy
Nicola Ferro

Ethics declarations

Ethics

This study has been carried out in accordance with all relevant guidelines and regulations for the use of MIMIC-III data. Assisting human medical experts to make better decisions in complex environments is the sole aim of this paper and the way we handle data in our dataset. Further, all annotators involved in the construction of our dataset were volunteers. Before deployment in an actual clinical setting, we plan to systematically evaluate our methodology under the supervision of expert clinicians.

A Model Implementation

Following Kim [7], we used a state-of-the-art single-channel convolutional neural network (CNN) for sentence classification as a binary classifier.
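As a rough illustration of this kind of architecture, the following is a minimal sketch of the forward pass of a Kim-style sentence CNN used as a binary classifier: filters slide over windows of token embeddings, each filter is reduced by max-over-time pooling, and the pooled features feed a sigmoid output. It is not the implementation used in this paper. The embedding dimension, filter count, window sizes, random weights and all function names are illustrative assumptions, and training and dropout are omitted.

```python
import math
import random


def conv_max_pool(embeddings, filters, window):
    """Apply each filter to every window of `window` consecutive tokens,
    pass the result through ReLU, and keep the maximum over the sentence."""
    pooled = []
    for weights, bias in filters:
        best = 0.0  # ReLU output is never negative; also covers too-short sentences
        for start in range(len(embeddings) - window + 1):
            # Flatten the window of token vectors into one long vector.
            patch = [x for vec in embeddings[start:start + window] for x in vec]
            activation = sum(w * x for w, x in zip(weights, patch)) + bias
            best = max(best, activation)
        pooled.append(best)
    return pooled


def init_filters(rng, n_filters, window, dim):
    """Hypothetical helper: small random filter weights with zero bias."""
    return [([rng.uniform(-0.1, 0.1) for _ in range(window * dim)], 0.0)
            for _ in range(n_filters)]


def predict_proba(embeddings, params):
    """Concatenate pooled features from all window sizes and apply a
    logistic output unit, giving the probability of the positive class."""
    features = []
    for window, filters in params["conv"].items():
        features.extend(conv_max_pool(embeddings, filters, window))
    logit = sum(w * f for w, f in zip(params["out_w"], features)) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-logit))


# Illustrative sizes only (assumptions, not values from the paper).
rng = random.Random(0)
dim, n_filters, windows = 8, 4, (3, 5)
params = {
    "conv": {w: init_filters(rng, n_filters, w, dim) for w in windows},
    "out_w": [rng.uniform(-0.1, 0.1) for _ in range(n_filters * len(windows))],
    "out_b": 0.0,
}

# A made-up "sentence" of 10 tokens with random embeddings.
sentence = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(10)]
prob = predict_proba(sentence, params)
```

In a setup like this, one such binary classifier would be trained per label, and `prob` would be thresholded to decide whether a sentence carries that label.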

For our experiments (see Sects. 5.2 and 5.3), the majority of hyperparameters were kept constant: the learning rate was set to 0.3; the dropout probability in the dropout layer was 0.1; BioWordVec embeddings were scaled by a factor of 0.65. The window sizes for our two convolutional layers were either 1 and 3 or 3 and 5. The number of training epochs ranged from 30 to 70. These hyperparameters were determined by monitoring the training loss. Random classifiers used as a baseline were drawn from the scikit-learn library [9].
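To make the role of the random baseline concrete, the sketch below mimics a uniform random binary classifier using only the Python standard library. The text does not state which scikit-learn baseline or strategy was used (scikit-learn's `DummyClassifier` offers several, including a uniform one), so the uniform strategy, the F1 metric, the function names and the toy labels here are all assumptions made for illustration.

```python
import random


def uniform_random_baseline(n_samples, seed=0):
    """Predict 0 or 1 with equal probability, ignoring the input text."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_samples)]


def f1_score_binary(y_true, y_pred):
    """F1 for the positive class (assumed metric, for illustration only)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up gold labels for one uncertainty class (1 = class present).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = uniform_random_baseline(len(y_true), seed=0)
baseline_f1 = f1_score_binary(y_true, y_pred)
```

A trained classifier is only informative for a given class if it scores clearly above a baseline of this kind on the same labels.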

About this paper

Cite this paper

Turner, M., Ive, J., Velupillai, S. (2021). Linguistic Uncertainty in Clinical NLP: A Taxonomy, Dataset and Approach. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_11

DOI: https://doi.org/10.1007/978-3-030-85251-1_11
Published: 14 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1
eBook Packages: Computer Science, Computer Science (R0)
