Comparing Different Methods for Named Entity Recognition in Portuguese Neurology Text

  • Systems-Level Quality Improvement
  • Journal of Medical Systems

Abstract

Electronic Medical Records (EMRs) are written in an unstructured way, often in natural language. Information Extraction (IE) can be used to acquire knowledge from such texts, including the automatic recognition of meaningful entities through Named Entity Recognition (NER) models. However, while most previous work on this task targeted English, this study tested different methods on Portuguese text, specifically in the domain of Neurology, and draws conclusions from the results. The paper compares Conditional Random Fields (CRF), bidirectional Long Short-Term Memory with Conditional Random Fields (BiLSTM-CRF), and a BiLSTM-CRF with residual learning connections, using not only Portuguese texts from medical journals but also texts from the Coimbra Hospital and Universitary Centre (CHUC) Neurology Service. Furthermore, BiLSTM-CRF models using word embeddings (WEs) trained on clinical text were compared with those using WEs trained on general-language text. Deep learning models achieved F1-scores of nearly 83% and 75%, for relaxed and strict evaluation respectively, on texts extracted from the medical journal. On texts collected from the hospital, the same models achieved F1-scores of nearly 71% and 62%. This work concludes that deep learning models outperform the shallow learning models and that in-domain WEs yield better results than general-language WEs, even when the latter are trained on much more text than the former. The results also show that it is possible to extract information from hospital clinical texts with models trained on clinical cases extracted from medical journals, which are openly available. Nevertheless, such results still require a healthcare technician to check whether the information was extracted correctly.
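The abstract reports F1-scores under both relaxed and strict evaluation without defining the two modes. The sketch below illustrates one common reading and is not taken from the paper: it assumes BIO-tagged sequences, treats a strict match as identical boundaries and type, and treats a relaxed match as any token overlap with the same type. The function names and the entity labels `Condition` and `Therapeutic` are hypothetical.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start token, end token exclusive, type)


def extract_entities(tags: List[str]) -> List[Entity]:
    """Collect (start, end, type) spans from a BIO tag sequence."""
    entities = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            entities.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def overlaps(a: Entity, b: Entity) -> bool:
    """True when two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def f1_score(gold_tags: List[str], pred_tags: List[str], relaxed: bool = False) -> float:
    """Entity-level F1. Assumed definitions (not from the paper):
    strict = same boundaries and type; relaxed = overlap and same type."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    if relaxed:
        match = lambda p, g: p[2] == g[2] and overlaps(p, g)
    else:
        match = lambda p, g: p == g
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction truncates the first entity by one token.
gold = ["B-Condition", "I-Condition", "O", "B-Therapeutic", "O"]
pred = ["B-Condition", "O", "O", "B-Therapeutic", "O"]
print(f1_score(gold, pred))                # strict: the truncated span is an error
print(f1_score(gold, pred, relaxed=True))  # relaxed: the truncated span still counts
```

Under these assumptions a prediction with a boundary error lowers the strict score but not the relaxed one, which is consistent with the relaxed figures in the abstract being the higher of each pair.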

Notes

  1. https://www.cdc.gov/nchs/fastats/electronic-medical-records.htm

  2. http://www.sinapse.pt/archive.php

  3. https://fasttext.cc/docs/en/crawl-vectors.html

  4. https://scipy.org/

  5. https://github.com/fabioacl/PortugueseClinicalNER

Author information

Corresponding author

Correspondence to Fábio Lopes.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Systems-Level Quality Improvement

About this article

Cite this article

Lopes, F., Teixeira, C. & Gonçalo Oliveira, H. Comparing Different Methods for Named Entity Recognition in Portuguese Neurology Text. J Med Syst 44, 77 (2020). https://doi.org/10.1007/s10916-020-1542-8

  • DOI: https://doi.org/10.1007/s10916-020-1542-8
