Abstract
Correctly labeled data can improve healthcare processes and research. However, labeling is difficult and expensive, which limits both its use and its quality. We propose a proof of concept based on Natural Language Processing and active learning to automatically structure information from Spanish-language text in the field of echocardiography.
Echocardiographic reports from a National Health System Cardiology Department were analyzed. Reports were divided into a training corpus (26,699 reports) and a validation corpus (2,881 reports). The model design focused on the automatic labeling of aortic and mitral valve disease (stenosis/insufficiency) and of valve nature (native/prosthetic). The models were built in three steps: data preparation, vectorization, and model fitting and validation. Results were compared with the ground truth labeled manually by the physicians who reported the echocardiographic studies.
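The three steps named above (data preparation, vectorization, and model fitting and validation) can be sketched with scikit-learn. This is only an illustrative sketch, not the authors' implementation: the example sentences, the labels, the TF-IDF settings, and the choice of a single logistic regression classifier are all assumptions made for brevity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy reports (not from the study corpus).
# Label 1 = aortic stenosis reported, 0 = not reported.
train_texts = [
    "Válvula aórtica con estenosis severa",
    "Estenosis aórtica moderada, válvula calcificada",
    "Válvula aórtica nativa con estenosis ligera",
    "Estenosis aórtica severa con gradiente elevado",
    "Válvula aórtica normal sin alteraciones",
    "Válvula aórtica nativa de apertura conservada",
    "Prótesis aórtica normofuncionante",
    "Válvula aórtica sin gradiente significativo",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

valid_texts = [
    "Estenosis aórtica ligera",
    "Válvula calcificada con estenosis moderada",
    "Válvula aórtica normal",
    "Prótesis aórtica de apertura conservada",
]
valid_labels = [1, 1, 0, 0]

# Data preparation and vectorization: lowercase the text, strip accents
# (so "aórtica" and "aortica" share a feature), and build TF-IDF vectors.
# Model fitting: a logistic regression on top of those vectors.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, strip_accents="unicode"),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Validation: score the held-out reports and compute the area under the ROC curve.
valid_scores = model.predict_proba(valid_texts)[:, 1]
auc = roc_auc_score(valid_labels, valid_scores)
print(f"validation AUC: {auc:.2f}")
```

A bag-of-words representation like this one ignores negation ("sin estenosis" still contains the token "estenosis"), so a real pipeline would need to handle such phrases during data preparation.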
Four machine learning algorithms were compared: logistic regression, naïve Bayes, random forest, and support vector machine. The support vector machine gave the best results, with areas under the ROC curve of 0.92 and 0.93 for aortic and mitral stenosis, 0.87 and 0.89 for aortic and mitral insufficiency, and 0.97 and 0.96 for native aortic and mitral valve, respectively. Natural Language Processing tools are useful for automatically structuring and labeling echocardiographic information in Spanish-language text. The developed models, combined with active learning, are capable of correct prospective labeling.
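As a rough illustration of this comparison, the sketch below fits the four algorithm families on a toy corpus, reports the area under the ROC curve for each, and then applies one common active learning strategy (uncertainty sampling) to pick the unlabeled report that should be sent to a physician next. The toy sentences, the hyperparameters, and the choice of uncertainty sampling are assumptions; the abstract does not specify the query strategy the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy reports: 1 = mitral insufficiency reported, 0 = not reported.
train_texts = [
    "Insuficiencia mitral ligera",
    "Válvula mitral con insuficiencia moderada",
    "Insuficiencia mitral severa por prolapso",
    "Válvula mitral nativa con insuficiencia ligera",
    "Válvula mitral normal",
    "Válvula mitral nativa de apertura conservada",
    "Prótesis mitral normofuncionante",
    "Válvula mitral sin alteraciones",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
valid_texts = [
    "Insuficiencia mitral moderada",
    "Válvula mitral con insuficiencia severa",
    "Válvula mitral de apertura conservada",
    "Prótesis mitral sin alteraciones",
]
valid_labels = [1, 1, 0, 0]
# Unlabeled pool from which the next report to annotate is chosen.
pool_texts = [
    "Insuficiencia mitral",
    "Válvula mitral nativa",
    "Prótesis mitral con insuficiencia ligera",
    "Válvula mitral engrosada",
]

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": MultinomialNB(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": SVC(kernel="linear"),
}


def continuous_scores(model, texts):
    """Return a ranking score per text: class-1 probability if the model
    provides one, otherwise the signed distance to the decision boundary."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(texts)[:, 1]
    return model.decision_function(texts)


fitted, aucs = {}, {}
for name, clf in candidates.items():
    model = make_pipeline(TfidfVectorizer(strip_accents="unicode"), clf)
    model.fit(train_texts, train_labels)
    fitted[name] = model
    aucs[name] = roc_auc_score(valid_labels, continuous_scores(model, valid_texts))

for name, value in aucs.items():
    print(f"{name}: AUC = {value:.2f}")

# Uncertainty sampling: the pool report closest to the SVM decision boundary
# is the one the model is least sure about, so it is queried first.
svm = fitted["support vector machine"]
margins = np.abs(svm.decision_function(pool_texts))
query_index = int(np.argmin(margins))
print("next report to label:", pool_texts[query_index])
```

In a full active learning loop, the queried report would be labeled by a physician, appended to the training set, and the model refitted before the next query.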
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Pérez-Sánchez, P. et al. (2021). Automatic Classification of Valve Diseases Through Natural Language Processing in Spanish and Active Learning. In: Rojas, I., Castillo-Secilla, D., Herrera, L.J., Pomares, H. (eds) Bioengineering and Biomedical Signal and Image Processing. BIOMESIP 2021. Lecture Notes in Computer Science, vol 12940. Springer, Cham. https://doi.org/10.1007/978-3-030-88163-4_4
DOI: https://doi.org/10.1007/978-3-030-88163-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88162-7
Online ISBN: 978-3-030-88163-4
eBook Packages: Computer Science, Computer Science (R0)