Abstract
Coronary artery disease (CAD) is a leading cause of death worldwide. New early-diagnosis methods for CAD based on medical big data have great potential to reduce the risk of death from CAD. Neural networks (NNs) are a powerful tool for processing electronic medical records (EMRs): they can accurately extract structured data, unlocking medical information and further improving CAD diagnosis. However, the time and labor required to annotate datasets are the main limitation on their application, especially for CAD records, which contain large amounts of natural language text and specialized biomedical content. In this study, we present an annotation-cost-saving NN approach for CAD records, bootstrapped by a deep language model with contextual embeddings pre-trained on a large unannotated CAD corpus. To demonstrate the feasibility of our approach and evaluate its performance, we performed a pre-training experiment on unannotated CAD records and a term classification experiment on annotated CAD records. The results showed that our contextual embedding bootstrapped NN for CAD records achieves better performance when fewer annotations are available.
Funding
This work was supported by the Shanghai Municipal Commission of Economy and Information (Grant nos. XX-XXFZ-02-20-2042 and XX-RGZN-01-19-6584).
Ethics declarations
Ethics approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional review board and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.
Informed consent
Verbal informed consent was obtained from all individual participants included in the study.
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Cen, X., Yuan, J., Pan, C. et al. Contextual embedding bootstrapped neural network for medical information extraction of coronary artery disease records. Med Biol Eng Comput 59, 1111–1121 (2021). https://doi.org/10.1007/s11517-021-02359-1
DOI: https://doi.org/10.1007/s11517-021-02359-1