Word embeddings for negation detection in health records written in Spanish

  • Methodologies and Application
Soft Computing

Abstract

This work focuses on the creation of a system to detect negated medical entities in electronic health records (EHRs) written in Spanish. The task matters because negation inverts the truth value of a clause and therefore directly affects the automatic understanding of the information. We explore a novel continuous characterization as an alternative to previous negation extraction approaches based on discrete characterizations, with the aim of making the characterization generalize better than discrete features do. We also include other features that could be useful for the negation detection task. Negation detection is approached as a named entity recognition task in which only the negated entities are to be found. EHRs are represented by the corresponding embeddings, and this approach is compared with a traditional discrete characterization based on words. These representations are used by a supervised classifier, such as conditional random fields, to infer the predictive model. The approach is assessed on health records from different hospitals, namely the IxaMed-GS and IULA corpora. The best performance is achieved with the embedding-based characterization, leading to an F-measure of 75.3 and 81.6 for the IxaMed-GS and IULA corpora, respectively. With this work, we show that embedding-based representations can also be useful for the detection of negated medical entities.
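
To make the framing in the abstract concrete, the following is a minimal sketch of negation detection cast as sequence labeling, where only negated entities receive a tag, together with two ways of characterizing each token: continuous (embedding dimensions as real-valued features) and discrete (word identity). Everything specific here is an assumption made for illustration and is not taken from the article: the example sentence, the negated spans, the `B-NEG`/`I-NEG`/`O` tag names, the three-dimensional toy vectors, the one-token context window, and the feature names. The resulting per-token feature mappings are the kind of input a CRF toolkit could be given after conversion to that toolkit's own format.

```python
from typing import Dict, List, Tuple

# Toy 3-dimensional embeddings. The values are invented for illustration;
# real embeddings would be learned from a large corpus.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "no": [0.9, -0.1, 0.0],
    "presenta": [0.1, 0.4, -0.2],
    "fiebre": [-0.3, 0.8, 0.5],
    "ni": [0.8, -0.2, 0.1],
    "tos": [-0.2, 0.7, 0.6],
}
UNKNOWN: List[float] = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary tokens


def bio_labels(tokens: List[str], negated_spans: List[Tuple[int, int]]) -> List[str]:
    """Tag tokens inside negated entities with B-NEG/I-NEG and all others with O.

    Each span is (start, end) in token indices, with end exclusive.
    """
    labels = ["O"] * len(tokens)
    for start, end in negated_spans:
        labels[start] = "B-NEG"
        for i in range(start + 1, end):
            labels[i] = "I-NEG"
    return labels


def token_features(tokens: List[str], i: int, use_embeddings: bool = True) -> Dict[str, float]:
    """Build features for token i from itself and its immediate neighbours.

    With use_embeddings=True each embedding dimension becomes a real-valued
    feature (continuous characterization); otherwise each word becomes a
    binary indicator (discrete characterization).
    """
    feats: Dict[str, float] = {"bias": 1.0}
    for offset in (-1, 0, 1):
        j = i + offset
        if not 0 <= j < len(tokens):
            continue  # no neighbour on this side of the sentence boundary
        word = tokens[j].lower()
        if use_embeddings:
            for d, value in enumerate(TOY_EMBEDDINGS.get(word, UNKNOWN)):
                feats[f"emb[{offset}]_{d}"] = value
        else:
            feats[f"word[{offset}]={word}"] = 1.0
    return feats


# Hypothetical sentence: "No presenta fiebre ni tos" (no fever or cough).
tokens = ["No", "presenta", "fiebre", "ni", "tos"]
labels = bio_labels(tokens, [(2, 3), (4, 5)])  # "fiebre" and "tos" are negated
features = [token_features(tokens, i) for i in range(len(tokens))]
```

In the discrete variant, two words share no features unless they are identical, whereas in the continuous variant words with nearby vectors produce similar feature values. That difference is one way to read the generalization argument made in the abstract.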

Acknowledgements

This work was partially funded by the Spanish Ministry of Science and Innovation (PROSAMED: TIN2016-77820-C3-1-R and TADEEP: TIN2015-70214-P) and the Basque Government (DETEAMI: Ministry of Health 2014111003, Predoctoral Grant: PRE 2016 1 0128).

Author information

Corresponding author

Correspondence to Sara Santiso.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Santiso, S., Casillas, A., Pérez, A. et al. Word embeddings for negation detection in health records written in Spanish. Soft Comput 23, 10969–10975 (2019). https://doi.org/10.1007/s00500-018-3650-7

  • DOI: https://doi.org/10.1007/s00500-018-3650-7
