Automatic Classification of Valve Diseases Through Natural Language Processing in Spanish and Active Learning

  • Conference paper
Bioengineering and Biomedical Signal and Image Processing (BIOMESIP 2021)

Abstract

Correctly labeled data improve healthcare processes and research. However, labeling is difficult and expensive, which limits both its use and its quality. We propose a proof of concept, based on Natural Language Processing and active learning, that automatically structures information from Spanish-language text in the field of echocardiography.

Echocardiographic reports from a National Health System Cardiology Department were analyzed. Reports were divided into a training corpus (26,699 reports) and a validation corpus (2,881 reports). The model design focused on automatically labeling aortic and mitral valve disease (stenosis/insufficiency) and valve nature (native/prosthetic). The models were built in three steps: data preparation, vectorization, and model fitting and validation. Results were compared with the ground truth labels assigned manually by the physicians who reported the echocardiographic studies.
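The three steps named above (data preparation, vectorization, and model fitting and validation) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the `normalize` helper, the TF-IDF vectorizer, the logistic regression classifier, and the toy Spanish snippets are all hypothetical choices made for the example, and none of the text comes from the study corpus.

```python
import re
import unicodedata

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def normalize(text: str) -> str:
    """Hypothetical preparation step: lowercase, strip accents, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", text).strip()


# Invented toy snippets (NOT from the study); label 1 = aortic stenosis reported.
train_texts = [
    "Estenosis aórtica severa con válvula calcificada.",
    "Estenosis aórtica moderada.",
    "Válvula aórtica con estenosis ligera.",
    "Prótesis aórtica con estenosis significativa.",
    "Válvula aórtica de apertura normal.",
    "Válvula mitral normal.",
    "Insuficiencia mitral ligera.",
    "Ventrículo izquierdo de tamaño normal.",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorization (TF-IDF, an assumption) followed by model fitting.
model = make_pipeline(
    TfidfVectorizer(preprocessor=normalize),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Validation on held-out snippets, mirroring the separate validation corpus.
val_texts = ["Estenosis aórtica severa.", "Válvula mitral normal."]
val_predictions = model.predict(val_texts)
print(dict(zip(val_texts, val_predictions)))
```

In this sketch one binary classifier handles one label; the six labels described in the abstract would presumably each need their own model or a multi-label setup, which the abstract does not specify.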

Four machine learning algorithms were compared: logistic regression, naïve Bayes, random forest, and support vector machine. The support vector machine gave the best results, with areas under the ROC curve of 0.92 and 0.93 for aortic and mitral stenosis, 0.87 and 0.89 for aortic and mitral insufficiency, and 0.97 and 0.96 for native aortic and mitral valve, respectively. Natural Language Processing tools are useful for automatically structuring and labeling echocardiographic information in Spanish text. Combined with active learning, the developed models are capable of correct prospective labeling.
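A comparison of this kind, four classifiers ranked by area under the ROC curve on a held-out set, might look like the following sketch. It assumes scikit-learn, a TF-IDF representation, a linear support vector machine, and default hyperparameters; the abstract states none of these details, and the toy texts are invented, so the printed scores say nothing about the reported results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy data (NOT from the study); label 1 = stenosis reported.
train_texts = [
    "estenosis aórtica severa", "estenosis mitral ligera",
    "estenosis aórtica moderada", "prótesis mitral con estenosis",
    "válvula aórtica normal", "válvula mitral normal",
    "insuficiencia mitral ligera", "insuficiencia aórtica moderada",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
val_texts = [
    "estenosis aórtica ligera", "estenosis mitral moderada",
    "válvula mitral normal", "insuficiencia aórtica ligera",
]
val_labels = [1, 1, 0, 0]

# The four algorithm families named in the abstract; settings are assumptions.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": MultinomialNB(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": LinearSVC(),
}


def positive_scores(model, texts):
    """Return a continuous score for the positive class, as ROC analysis requires."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(texts)[:, 1]
    return model.decision_function(texts)  # e.g. LinearSVC has no predict_proba


results = {}
for name, classifier in candidates.items():
    model = make_pipeline(TfidfVectorizer(strip_accents="unicode"), classifier)
    model.fit(train_texts, train_labels)
    results[name] = roc_auc_score(val_labels, positive_scores(model, val_texts))

for name, auc in results.items():
    print(f"{name}: AUC = {auc:.2f}")
```

The AUC is computed from continuous scores rather than hard predictions, which is why the helper falls back to `decision_function` for classifiers that do not expose probabilities.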

Author information

Corresponding author

Correspondence to Víctor Vicente-Palacios.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Pérez-Sánchez, P. et al. (2021). Automatic Classification of Valve Diseases Through Natural Language Processing in Spanish and Active Learning. In: Rojas, I., Castillo-Secilla, D., Herrera, L.J., Pomares, H. (eds) Bioengineering and Biomedical Signal and Image Processing. BIOMESIP 2021. Lecture Notes in Computer Science, vol 12940. Springer, Cham. https://doi.org/10.1007/978-3-030-88163-4_4

  • DOI: https://doi.org/10.1007/978-3-030-88163-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88162-7

  • Online ISBN: 978-3-030-88163-4

  • eBook Packages: Computer Science, Computer Science (R0)
