Mining clinical text for stroke prediction

  • Original Article
Network Modeling Analysis in Health Informatics and Bioinformatics

An Erratum to this article was published on 05 October 2015

Abstract

One of the main problems in treating stroke patients is accurate and timely triage and assessment. Not all stroke events have direct severe consequences. Full strokes are often preceded by transient ischemic attacks (TIAs), or mini-strokes, whose signs and symptoms resemble those of less concerning health events such as migraines. In this paper, we present natural language processing techniques that process a large collection of medical narratives and extract features that can subsequently be used for automatic classification with data mining algorithms. We reviewed 5658 cases and analyzed the chief complaint and the history of the patient's illness recorded at the Stroke Rapid Assessment Unit (SRAU) at Victoria General Hospital (VGH). The data were collected by neurologists and stroke nurses between 2008 and 2013. Based on a clinician-supplied list of important sign and symptom terms, we translated the narrative medical text into well-codified sentences, achieving high agreement with a human expert. Data mining algorithms were then applied to the codified data, yielding not only prediction models but also weights indicating the importance of the codified terms. We provide an extensive experimental evaluation of several classifiers trained on past cases to predict new ones. Notably, support vector machines (SVMs) achieved a sensitivity of about 84% and a specificity of 64%. The top terms identified by the data mining algorithms accounted for most of the prediction quality; therefore, they can be used to build a questionnaire-like online application for first-line screening at triage, distinguishing stroke/TIA from mimics and helping triage staff decide whether to proceed with treatment or discharge the patient.
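The codification step can be illustrated with a minimal sketch. It is not the authors' implementation: the term list below is hypothetical (the clinician-supplied list is not published; see note 4), and negation is handled with a simple cue-and-window rule loosely modeled on NegEx, plus a check for contractions ending in "n't". Each term is coded 1 if asserted, -1 if negated, and 0 if never mentioned, giving a fixed-length vector that a classifier such as an SVM could consume. The names `TERMS`, `codify`, and `to_vector` are illustrative only.

```python
import re

# Hypothetical sign/symptom vocabulary. The study's clinician-supplied list is
# not public, so these codes and phrases are illustrative only.
TERMS = {
    "WEAKNESS": ["weakness", "weak"],
    "SPEECH": ["slurred speech", "aphasia", "dysarthria"],
    "HEADACHE": ["headache", "migraine"],
    "VISION": ["blurred vision", "vision loss", "diplopia"],
    "NUMBNESS": ["numbness", "tingling"],
}

# A small, assumed set of negation cues; words ending in "n't" also count.
NEGATION_TRIGGERS = {"no", "not", "denies", "denied", "without"}
NEGATION_WINDOW = 5  # number of tokens after a cue that it negates (assumption)


def tokenize(sentence):
    """Lowercase, normalize curly apostrophes, and split into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower().replace("\u2019", "'"))


def is_negation_cue(token):
    """True for an explicit negation word or an "n't" contraction."""
    return token in NEGATION_TRIGGERS or token.endswith("n't")


def codify(text):
    """Map narrative text to {code: 1 (asserted) or -1 (negated)}.

    Codes that are never mentioned are left out of the result.
    """
    codes = {}
    # Negation scope never crosses a boundary ('.', ';' or newline).
    for sentence in re.split(r"[.;\n]", text):
        tokens = tokenize(sentence)
        negated = set()
        for i, tok in enumerate(tokens):
            if is_negation_cue(tok):
                negated.update(range(i + 1, i + 1 + NEGATION_WINDOW))
        for code, phrases in TERMS.items():
            for phrase in phrases:
                words = phrase.split()
                for i in range(len(tokens) - len(words) + 1):
                    if tokens[i:i + len(words)] == words:
                        value = -1 if i in negated else 1
                        # An asserted mention anywhere outranks a negated one.
                        codes[code] = max(codes.get(code, -1), value)
    return codes


def to_vector(codes):
    """Fixed-order feature vector: 1 asserted, -1 negated, 0 not mentioned."""
    return [codes.get(code, 0) for code in TERMS]
```

For example, `codify("Patient reports sudden left arm weakness and slurred speech. Denies headache.")` returns `{"WEAKNESS": 1, "SPEECH": 1, "HEADACHE": -1}`, which `to_vector` turns into `[1, 1, -1, 0, 0]`. The five-token window is an arbitrary choice for illustration; practical negation detectors also use termination cues (e.g. "but") to close a negation scope early.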

[Figs. 1–4]

Notes

  1. Implementing the application is outside the scope of this project, but it is in progress.

  2. The ABCD score alone did not give us acceptable levels of sensitivity and specificity.

  3. If the estimate is 0.8 or above, agreement between the algorithm and the human assessment is considered excellent; a score between 0.6 and 0.8 is considered good agreement (Goryachev et al. 2006).

  4. Due to IP restrictions, we cannot provide the list of important terms or the weights we derived for them here. However, interested readers can contact Dr. Andrew Penn about how to obtain this information. The front-end application is also not part of this study; details on it can likewise be obtained from Dr. Andrew Penn.
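The quantities behind notes 2 and 3 can be made concrete with a small sketch. Two assumptions: stroke/TIA is treated as the positive class and mimics as the negative class, and the "estimate" in note 3 is taken to be Cohen's kappa, a standard chance-corrected agreement statistic. The notes do not name the statistic, so this is an illustrative choice. The function names are hypothetical, and the example labels are invented rather than study data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label lists (e.g. algorithm vs. expert)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Agreement expected by chance if each rater labels independently
    # at their own observed base rates.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def interpret(score):
    """Agreement bands as stated in note 3 (Goryachev et al. 2006)."""
    if score >= 0.8:
        return "excellent"
    if score >= 0.6:
        return "good"
    return "below good"
```

As a worked example with made-up labels: if the two raters agree on 8 of 10 cases and each labels half of the cases stroke/TIA, observed agreement is 0.8, chance agreement is 0.5 × 0.5 + 0.5 × 0.5 = 0.5, and kappa is (0.8 − 0.5) / (1 − 0.5) = 0.6, which falls in the "good" band of note 3.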

References

  • Al-Haddad MA, Friedlin J, Kesterson J, Waters JA, Aguilar-Saavedra JR, Schmidt CM (2010) Natural language processing for the development of a clinical registry: a validation study in intraductal papillary mucinous neoplasms. HPB 12(10):688–695

  • Amini L, Azarpazhouh R, Farzadfar MT, Mousavi SA, Jazaieri F, Khorvash F, Norouzi R, Toghianfar N (2013) Prediction and control of stroke by data mining. Int J Prev Med 4(2):245

  • Averbuch M, Karson T, Ben-Ami B, Maimon O, Rokach L (2004) Context-sensitive medical information retrieval. In: Proceedings of the 11th World Congress on Medical Informatics (MEDINFO-2004), pp 1–8

  • Barrett N, Weber-Jahnke J (2011) Building a biomedical tokenizer using the token lattice design pattern and the adapted viterbi algorithm. BMC Bioinform 12(3):1

  • Cerrito P (2001) Application of data mining for examining polypharmacy and adverse effects in cardiology patients. Cardiovasc Toxicol 1(3):177–179

  • Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G (2000) Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 33(1):1–10

  • Fiszman M, Chapman WW, Evans SR, Haug PJ (1999) Automatic identification of pneumonia related concepts on chest X-ray reports. In: Proceedings of the AMIA Symposium, American Medical Informatics Association, p 67

  • Florkowski CM (2008) Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests. Clin Biochem Rev 29(1):S83

  • Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB (1994) A general natural-language text processor for clinical radiology. J Am Med Inform Assoc 1(2):161–174

  • Friedman C, Shagina L, Socratous SA, Zeng X (1996) A web-based version of MedLEE: a medical language extraction and encoding system. In: Proceedings of the AMIA Annual Fall Symposium, American Medical Informatics Association, p 938

  • Glasgow JM, Kaboli PJ (2010) Detecting adverse drug events through data mining. Am J Health Syst Pharm 67(4):317–320

  • Goryachev S, Sordo M, Zeng QT, Ngo L (2006) Implementation and evaluation of four different methods of negation detection. DSG, Boston

  • Heart and Stroke Foundation (2015) Statistics. http://www.heartandstroke.com/site/c.ikIQLcMWJtE/b.3483991/k.34A8/Statistics.htm. Accessed Jan 2015

  • Hripcsak G, Austin JH, Alderson PO, Friedman C (2002) Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology 224(1):157–163

  • Kelleher JD, Mac Namee B (2008) A review of negation in clinical texts. DIT Technical Report SOC-AIG-001-08. http://www.comp.dit.ie/bmacnamee/papers/negationinclinicaltexts_article.pdf

  • NIH (2015) Stroke, hope through research. http://www.ninds.nih.gov/disorders/stroke/detail_stroke.htm. Accessed Jan 2015

  • Regnier M (2012) Focus on stroke: predicting and preventing stroke. http://blog.wellcome.ac.uk/2012/05/07/focus-on-stroke-predicting-and-preventing-stroke/. Accessed Jan 2015

  • University of Waikato, New Zealand (2014) Weka (machine learning). http://en.wikipedia.org/wiki/Weka_(machine_learning). Accessed Dec 2014

  • Warrer P, Hansen EH, Juhl-Jensen L, Aagaard L (2012) Using text-mining techniques in electronic patient records to identify ADRs from medicine use. Br J Clin Pharmacol 73(5):674–684

  • Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG (2001) A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 34(5):301–310

  • Wiktionary (2013) Category:English words suffixed with -n’t. http://en.wiktionary.org/. Accessed Dec 2014

  • Wikipedia (2014) ABCD score. http://en.wikipedia.org/wiki/ABCD_score. Accessed Dec 2014

Acknowledgments

The authors would like to acknowledge Kristine Votova, Ph.D., the project manager for the SpecTRA Research Project, and the Island Health clinical research team at the Stroke Rapid Assessment Unit for their support. Funding for the natural experiment in stroke care and for the large-scale personalized medicine project applying mass spectrometry to rapid TIA triage comes from the Canadian Institutes of Health Research (2009–2012) and Genome Canada/BC (2013–2017).

Author information

Correspondence to Elham Sedghi.

About this article

Cite this article

Sedghi, E., Weber, J.H., Thomo, A. et al. Mining clinical text for stroke prediction. Netw Model Anal Health Inform Bioinforma 4, 16 (2015). https://doi.org/10.1007/s13721-015-0090-5

  • DOI: https://doi.org/10.1007/s13721-015-0090-5
