Word embeddings for negation detection in health records written in Spanish

Santiso, Sara; Casillas, Arantza; Pérez, Alicia; Oronoz, Maite

doi:10.1007/s00500-018-3650-7

Methodologies and Application
Published: 23 November 2018

Volume 23, pages 10969–10975 (2019)

Soft Computing

Sara Santiso (ORCID: orcid.org/0000-0002-2314-3018)¹, Arantza Casillas¹, Alicia Pérez¹ & Maite Oronoz¹

495 Accesses
11 Citations

Abstract

This work focuses on the creation of a system to detect negated medical entities in electronic health records (EHRs) written in Spanish. The task matters because negation inverts the truth value of a clause and can therefore strongly affect the automatic understanding of clinical information. As an alternative to previous negation extraction approaches based on discrete characterizations, we explore a novel continuous characterization, with the aim of generalizing better than discrete features do. We also include other features that could be useful for the negation detection task. Negation detection is approached as a named entity recognition task in which only the negated entities are to be found, and EHRs are represented by the corresponding word embeddings. This approach is compared with a traditional discrete characterization based on words. Both representations are used by a supervised classifier, conditional random fields (CRF), to infer the predictive model. The approach is assessed on two corpora of health records from different hospitals, IxaMed-GS and IULA. The best performance is achieved by the embedding-based characterization, with f-measures of 75.3 and 81.6 on the IxaMed-GS and IULA corpora, respectively. This work shows that embedding-based representations can also be useful for the detection of negated medical entities.
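
The abstract frames negation detection as named entity recognition restricted to negated entities, and contrasts discrete word-based features with continuous embedding features fed to a CRF. The following is a minimal Python sketch of those ingredients (labelling, featurization, evaluation), assuming hypothetical B-NEG/I-NEG tags, a toy three-dimensional embedding table, a zero-vector fallback for unknown words and an exact-span F-measure; none of these details are taken from the paper. The per-token dictionaries (string keys, float values) match the input format accepted by CRF toolkits such as python-crfsuite, but the CRF training step itself is omitted.

```python
"""Hypothetical sketch: negation detection as NER over negated entities only.

Assumptions (not from the paper): the B-NEG/I-NEG tag set, the toy 3-d
embedding table, the zero-vector fallback and the exact-span F-measure.
"""
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Toy embedding table (hypothetical values); a real system would load
# pre-trained Spanish vectors instead.
EMBEDDINGS: Dict[str, List[float]] = {
    "niega": [0.8, 0.2, 0.1],
    "fiebre": [0.1, 0.8, 0.3],
    "dolor": [0.2, 0.7, 0.4],
}
DIM = 3


def discrete_features(tokens: Sequence[str], i: int) -> Dict[str, float]:
    """Discrete, word-based features: the lowercased word and its neighbours."""
    feats = {"w=" + tokens[i].lower(): 1.0}
    if i > 0:
        feats["w-1=" + tokens[i - 1].lower()] = 1.0
    if i + 1 < len(tokens):
        feats["w+1=" + tokens[i + 1].lower()] = 1.0
    return feats


def embedding_features(tokens: Sequence[str], i: int) -> Dict[str, float]:
    """Continuous features: one real-valued feature per embedding dimension.

    Unknown words map to a zero vector (an assumption of this sketch).
    """
    vec = EMBEDDINGS.get(tokens[i].lower(), [0.0] * DIM)
    return {f"emb{d}": value for d, value in enumerate(vec)}


def spans(tags: Sequence[str]) -> Set[Tuple[int, int]]:
    """Return (start, end) spans of negated entities encoded as B-NEG/I-NEG.

    A stray I-NEG with no preceding B-NEG is ignored.
    """
    result: Set[Tuple[int, int]] = set()
    start: Optional[int] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        if tag in ("B-NEG", "O") and start is not None:
            result.add((start, i))
            start = None
        if tag == "B-NEG":
            start = i
    return result


def f_measure(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Exact-span F-measure (harmonic mean of precision and recall), in %."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = "Paciente niega fiebre y dolor".split()  # "Patient denies fever and pain"
    gold = ["O", "O", "B-NEG", "O", "B-NEG"]  # both findings are negated by "niega"
    pred = ["O", "O", "B-NEG", "O", "O"]      # a model that misses "dolor"
    print(embedding_features(tokens, 2))
    print(round(f_measure(gold, pred), 1))
```

Run as a script, it prints the embedding features of "fiebre" and an F-measure of 66.7 for a prediction that finds one of the two negated findings (precision 1.0, recall 0.5). In this sketch, swapping embedding_features for discrete_features is the only change needed to move between the two characterizations.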

Acknowledgements

This work was partially funded by the Spanish Ministry of Science and Innovation (PROSAMED: TIN2016-77820-C3-1-R and TADEEP: TIN2015-70214-P) and the Basque Government (DETEAMI: Ministry of Health 2014111003, Predoctoral Grant: PRE 2016 1 0128).

Author information

Authors and Affiliations

IXA Group, University of the Basque Country (UPV-EHU), Manuel Lardizabal 1, 20080, Donostia/San Sebastián, Spain
Sara Santiso, Arantza Casillas, Alicia Pérez & Maite Oronoz

Corresponding author

Correspondence to Sara Santiso.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Santiso, S., Casillas, A., Pérez, A. et al. Word embeddings for negation detection in health records written in Spanish. Soft Comput 23, 10969–10975 (2019). https://doi.org/10.1007/s00500-018-3650-7

Published: 23 November 2018
Issue Date: November 2019
DOI: https://doi.org/10.1007/s00500-018-3650-7
