Abstract
One of the main problems in treating stroke patients is accurate and timely triage and assessment. Not all stroke events have immediate severe consequences. Full strokes are often preceded by transient ischemic attacks (TIA), or mini strokes, which exhibit signs and symptoms similar to those of less concerning health events, e.g., migraines. In this paper, we present natural language processing techniques for a large collection of medical narrative descriptions; the techniques extract features that can subsequently be used for automatic classification with data mining algorithms. We reviewed 5658 cases and analyzed the chief complaint and the history of the patient's illness as reported at the stroke rapid assessment unit (SRAU) at Victoria General Hospital (VGH). Data were collected by neurologists and stroke nurses between 2008 and 2013. Based on a clinician-supplied list of important sign and symptom terms, we translated narrative medical text into well-codified sentences, achieving a high level of agreement with a human expert. Data mining algorithms were then applied to the codified data, yielding not only prediction models but also weights indicating the importance of the codified terms. We provide an extensive experimental evaluation of several classifiers, trained on past data to predict new cases. Notably, we achieved a sensitivity of about 84 % and a specificity of 64 % using support vector machines (SVM). The top terms identified by the data mining algorithms were responsible for most of the prediction quality; therefore, they can be used to build a questionnaire-like online application that serves as first-line screening in triage, distinguishing stroke/TIA from mimics and helping triage staff decide on the next step, whether further treatment or discharge of the patient.
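For readers less familiar with the two measures reported above, the following is a minimal sketch of how sensitivity and specificity are computed from true and predicted labels. The function name, the label encoding (1 for stroke/TIA, 0 for mimic), and the toy data are illustrative assumptions; they are not the study's code or data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Assumption for illustration: `positive` marks stroke/TIA cases and
    every other label marks a mimic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Sensitivity: share of true stroke/TIA cases that the classifier flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of true mimics that the classifier clears.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels only, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a triage setting, a higher sensitivity means fewer missed stroke/TIA cases, while a higher specificity means fewer mimics sent on for further assessment.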
Notes
Implementing the application is outside the scope of this project, but it is in progress.
The ABCD score alone did not give us acceptable levels of sensitivity and specificity.
If the estimate is 0.8 or above, there is excellent agreement between the algorithm and the human assessment; a score between 0.6 and 0.8 is considered good agreement (Goryachev et al. 2006).
Due to IP restrictions, we cannot provide the list of important terms or the weights we derived for them here. Interested readers can contact Dr. Andrew Penn for information on how to obtain them. The front-end application is also not part of this study; details on it can likewise be obtained from Dr. Andrew Penn.
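The agreement thresholds in the note above can be illustrated with a short sketch. It assumes that the agreement estimate is Cohen's kappa computed between the algorithm's labels and the human expert's labels; the notes do not name the statistic, so this is an assumption, and the function names and example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (assumed statistic)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def agreement_band(estimate):
    """Map an estimate to the bands quoted in the note above."""
    if estimate >= 0.8:
        return "excellent"
    if estimate >= 0.6:
        return "good"
    return "below good"


# Hypothetical labels: 1 = term coded as present, 0 = absent or negated.
algorithm = [1, 1, 0, 0]
expert = [1, 0, 0, 0]
kappa = cohen_kappa(algorithm, expert)
print(kappa, agreement_band(kappa))
```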
References
Al-Haddad MA, Friedlin J, Kesterson J, Waters JA, Aguilar-Saavedra JR, Schmidt CM (2010) Natural language processing for the development of a clinical registry: a validation study in intraductal papillary mucinous neoplasms. HPB 12(10):688–695
Amini L, Azarpazhouh R, Farzadfar MT, Mousavi SA, Jazaieri F, Khorvash F, Norouzi R, Toghianfar N (2013) Prediction and control of stroke by data mining. Int J Prev Med 4(2):245
Averbuch M, Karson T, Ben-Ami B, Maimon O, Rokach L (2004) Context-sensitive medical information retrieval. In: Proceedings of the 11th World Congress on Medical Informatics (MEDINFO-2004), Citeseer. 1–8
Barrett N, Weber-Jahnke J (2011) Building a biomedical tokenizer using the token lattice design pattern and the adapted Viterbi algorithm. BMC Bioinform 12(3):1
Cerrito P (2001) Application of data mining for examining polypharmacy and adverse effects in cardiology patients. Cardiovasc Toxicol 1(3):177–179
Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G (2000) Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 33(1):1–10
Fiszman M, Chapman WW, Evans SR, Haug PJ (1999) Automatic identification of pneumonia related concepts on chest X-ray reports. In: Proceedings of the AMIA Symposium, American Medical Informatics Association. 67
Florkowski CM (2008) Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests. Clin Biochem Rev 29(1):S83
Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB (1994) A general natural-language text processor for clinical radiology. J Am Med Inform Assoc 1(2):161–174
Friedman C, Shagina L, Socratous SA, Zeng X (1996) A web-based version of MedLEE: a medical language extraction and encoding system. In: Proceedings of the AMIA Annual Fall Symposium, American Medical Informatics Association. 938
Glasgow JM, Kaboli PJ (2010) Detecting adverse drug events through data mining. Am J Health Syst Pharm 67(4):317–320
Goryachev S, Sordo M, Zeng QT, Ngo L (2006) Implementation and evaluation of four different methods of negation detection. DSG, Boston
Heart and Stroke Foundation (2015) Statistics. http://www.heartandstroke.com/site/c.ikIQLcMWJtE/b.3483991/k.34A8/Statistics.htm. Accessed Jan 2015
Hripcsak G, Austin JH, Alderson PO, Friedman C (2002) Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology 224(1):157–163
Kelleher JD, Mac Namee B (2008) A review of negation in clinical texts: dit technical report: Soc-aig-001-08. http://www.comp.dit.ie/bmacnamee/papers/negationinclinicaltexts_article.pdf
NIH (2015) Stroke, hope through research. http://www.ninds.nih.gov/disorders/stroke/detail_stroke.htm. Accessed Jan 2015
Regnier M (2012) Focus on stroke: predicting and preventing stroke. http://blog.wellcome.ac.uk/2012/05/07/focus-on-stroke-predicting-and-preventing-stroke/. Accessed Jan 2015
University of Waikato, New Zealand (2014) Weka (machine learning). http://en.wikipedia.org/wiki/Weka_(machine_learning). Accessed Dec 2014
Warrer P, Hansen EH, Juhl-Jensen L, Aagaard L (2012) Using text-mining techniques in electronic patient records to identify ADRs from medicine use. Br J Clin Pharmacol 73(5):674–684
Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG (2001) A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 34(5):301–310
Wiktionary (2013) Category:english words suffixed with -n’t. http://en.wiktionary.org/. Accessed Dec 2014
Wikipedia (2014) Abcd score. http://en.wikipedia.org/wiki/ABCD_score. Accessed Dec 2014
Acknowledgments
The authors would like to acknowledge Kristine Votova, Ph.D., the project manager for the SpecTRA Research Project, and the Island Health clinical research team at the Stroke Rapid Assessment Unit for their support. Funding for the natural experiment in stroke care and the large-scale personalized medicine for mass spectrometry in rapid TIA triage comes from the Canadian Institutes of Health Research (2009–2012) and Genome Canada/BC (2013–2017).
Cite this article
Sedghi, E., Weber, J.H., Thomo, A. et al. Mining clinical text for stroke prediction. Netw Model Anal Health Inform Bioinforma 4, 16 (2015). https://doi.org/10.1007/s13721-015-0090-5