Mining clinical text for stroke prediction

Sedghi, Elham; Weber, Jens H.; Thomo, Alex; Bibok, Maximilian; Penn, Andrew M. W.

doi:10.1007/s13721-015-0090-5



An Erratum to this article was published on 05 October 2015

Abstract

One of the main problems in treating stroke patients is accurate and timely triage and assessment. Not all stroke events have direct, severe consequences. Full strokes are often preceded by transient ischemic attacks (TIAs), or mini-strokes, whose signs and symptoms resemble those of less concerning health events such as migraines. In this paper, we present natural language processing techniques that process a large collection of medical narratives and extract features that can subsequently be used for automatic classification with data mining algorithms. We reviewed 5658 cases and analyzed the chief complaint and the history of the patient's illness recorded at the Stroke Rapid Assessment Unit (SRAU) at Victoria General Hospital (VGH). The data were collected by neurologists and stroke nurses between 2008 and 2013. Based on a clinician-supplied list of important sign and symptom terms, we translated narrative medical text into well-codified sentences, achieving high agreement with a human expert. We then applied data mining algorithms to the codified data, obtaining not only prediction models but also weights indicating the importance of each codified term. We provide an extensive experimental evaluation of several classifiers trained on past data to predict new cases. Notably, support vector machines (SVM) achieved a sensitivity of about 84 % and a specificity of 64 %. The top terms identified by the data mining algorithms accounted for most of the prediction quality; they can therefore be used to build a questionnaire-like online application for first-line triage screening that distinguishes stroke/TIA from mimics and helps triage staff decide whether to proceed with further treatment or discharge the patient.
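The abstract describes translating narrative text into codified sentences based on a clinician-supplied term list. The study's actual term list is not public, so the following is only a minimal Python sketch of one way such codification could work. It assumes a hypothetical term list, a hypothetical set of negation triggers, and a simple NegEx-style rule (in the spirit of Chapman et al. 2001): a term counts as negated when a trigger appears within a fixed window of tokens before it in the same sentence. The encoding of each term as 1 (affirmed), -1 (negated) or 0 (not mentioned) is also an assumption, not the authors' scheme.

```python
import re

# Hypothetical clinician-supplied term list; the study's real list is not public.
TERMS = ["weakness", "numbness", "headache", "speech difficulty", "vision loss"]

# Hypothetical negation triggers, loosely modeled on NegEx-style rules.
NEGATION_TRIGGERS = {"no", "not", "denies", "denied", "without"}

# Assumed number of tokens before a term in which a trigger negates it.
WINDOW = 5


def tokenize(text):
    """Lower-case the text and split it into word tokens (keeps contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def codify(narrative, terms=TERMS, window=WINDOW):
    """Map a narrative to {term: 1 (affirmed), -1 (negated), 0 (absent)}."""
    features = {term: 0 for term in terms}
    # Process sentences separately so a negation never crosses a sentence boundary.
    for sentence in re.split(r"[.;!?]", narrative):
        tokens = tokenize(sentence)
        for term in terms:
            term_tokens = term.split()
            n = len(term_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != term_tokens:
                    continue
                preceding = tokens[max(0, i - window):i]
                negated = any(tok in NEGATION_TRIGGERS for tok in preceding)
                if not negated:
                    features[term] = 1  # an affirmed mention always wins
                elif features[term] == 0:
                    features[term] = -1  # negated, unless affirmed elsewhere
    return features


if __name__ == "__main__":
    note = "Sudden left arm weakness and speech difficulty. Denies headache or vision loss."
    print(codify(note))
    # {'weakness': 1, 'numbness': 0, 'headache': -1, 'speech difficulty': 1, 'vision loss': -1}
```

Feature dictionaries like these can be turned into fixed-length vectors (one column per term) and passed to any classifier, such as the SVM mentioned in the abstract. A real system would need a richer tokenizer and trigger list than this sketch.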


Notes

Implementing the application is outside the scope of this project, but it is in progress.
The ABCD score alone did not give us acceptable levels of sensitivity and specificity.
If the agreement estimate is 0.8 or above, the agreement between the algorithm and the human assessment is considered excellent; a score between 0.6 and 0.8 is considered good (Goryachev et al. 2006).
Due to IP restrictions, we cannot provide the list of important terms or the weights we derived for them here. Interested readers can contact Dr. Andrew Penn about how to obtain this information. The front-end application is also not part of this study; details on it can likewise be obtained from Dr. Andrew Penn.
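The agreement thresholds in the notes, and the sensitivity and specificity figures quoted in the abstract, can be made concrete with the following minimal Python sketch. It assumes that the agreement estimate is Cohen's kappa (the note does not name the statistic), that a score of exactly 0.6 counts as good, and that labels are binary with 1 for stroke/TIA and 0 for a mimic. The function names and sample labels are hypothetical and are not the study's data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies per category.
    categories = set(rater_a) | set(rater_b)
    p_chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_chance == 1.0:
        return 1.0  # both raters used the same single category everywhere
    return (p_observed - p_chance) / (1.0 - p_chance)


def interpret_agreement(kappa):
    """Thresholds from the note; the note gives no label below 0.6."""
    if kappa >= 0.8:
        return "excellent"
    if kappa >= 0.6:
        return "good"
    return "below good"


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    algorithm = [1, 1, 0, 0]  # hypothetical codification by the algorithm
    expert = [1, 0, 0, 0]     # hypothetical codification by a human expert
    k = cohens_kappa(algorithm, expert)
    print(k, interpret_agreement(k))  # 0.5 below good
    sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
    print(round(sens, 3), round(spec, 3))  # 0.667 0.5
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it is a common choice for comparing an algorithm's codification with a human expert's.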

References

Al-Haddad MA, Friedlin J, Kesterson J, Waters JA, Aguilar-Saavedra JR, Schmidt CM (2010) Natural language processing for the development of a clinical registry: a validation study in intraductal papillary mucinous neoplasms. HPB 12(10):688–695
Amini L, Azarpazhouh R, Farzadfar MT, Mousavi SA, Jazaieri F, Khorvash F, Norouzi R, Toghianfar N (2013) Prediction and control of stroke by data mining. Int J Prev Med 4(2):245
Averbuch M, Karson T, Ben-Ami B, Maimon O, Rokach L (2004) Context-sensitive medical information retrieval. In: Proceedings of the 11th World Congress on Medical Informatics (MEDINFO-2004), Citeseer. 1–8
Barrett N, Weber-Jahnke J (2011) Building a biomedical tokenizer using the token lattice design pattern and the adapted Viterbi algorithm. BMC Bioinform 12(3):1
Cerrito P (2001) Application of data mining for examining polypharmacy and adverse effects in cardiology patients. Cardiovasc Toxicol 1(3):177–179
Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G (2000) Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 33(1):1–10
Fiszman M, Chapman WW, Evans SR, Haug PJ (1999) Automatic identification of pneumonia related concepts on chest X-ray reports. In: Proceedings of the AMIA Symposium, American Medical Informatics Association. 67
Florkowski CM (2008) Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests. Clin Biochem Rev 29(1):S83
Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB (1994) A general natural-language text processor for clinical radiology. J Am Med Inform Assoc 1(2):161–174
Friedman C, Shagina L, Socratous SA, Zeng X (1996) A web-based version of MedLEE: a medical language extraction and encoding system. In: Proceedings of the AMIA Annual Fall Symposium, American Medical Informatics Association. 938
Glasgow JM, Kaboli PJ (2010) Detecting adverse drug events through data mining. Am J Health Syst Pharm 67(4):317–320
Goryachev S, Sordo M, Zeng QT, Ngo L (2006) Implementation and evaluation of four different methods of negation detection. DSG, Boston
Heart and Stroke Foundation (2015) Statistics. http://www.heartandstroke.com/site/c.ikIQLcMWJtE/b.3483991/k.34A8/Statistics.htm. Accessed Jan 2015
Hripcsak G, Austin JH, Alderson PO, Friedman C (2002) Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology 224(1):157–163
Kelleher JD, Mac Namee B (2008) A review of negation in clinical texts. DIT Technical Report SOC-AIG-001-08. http://www.comp.dit.ie/bmacnamee/papers/negationinclinicaltexts_article.pdf
NIH (2015) Stroke: hope through research. http://www.ninds.nih.gov/disorders/stroke/detail_stroke.htm. Accessed Jan 2015
Regnier M (2012) Focus on stroke: predicting and preventing stroke. http://blog.wellcome.ac.uk/2012/05/07/focus-on-stroke-predicting-and-preventing-stroke/. Accessed Jan 2015
University of Waikato, New Zealand (2014) Weka (machine learning). http://en.wikipedia.org/wiki/Weka_(machine_learning). Accessed Dec 2014
Warrer P, Hansen EH, Juhl-Jensen L, Aagaard L (2012) Using text-mining techniques in electronic patient records to identify ADRs from medicine use. Br J Clin Pharmacol 73(5):674–684
Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG (2001) A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 34(5):301–310
Wiktionary (2013) Category:English words suffixed with -n’t. http://en.wiktionary.org/. Accessed Dec 2014
Wikipedia (2014) ABCD score. http://en.wikipedia.org/wiki/ABCD_score. Accessed Dec 2014


Acknowledgments

The authors would like to acknowledge Kristine Votova, Ph.D., project manager for the SpecTRA Research Project, and the Island Health clinical research team at the Stroke Rapid Assessment Unit for their support. Funding for the natural experiment in stroke care and for the large-scale personalized medicine for mass spectrometry in rapid TIA triage comes from the Canadian Institutes of Health Research (2009–2012) and Genome Canada/BC (2013–2017).

Author information

Authors and Affiliations

Department of Computer Science, University of Victoria, Victoria, BC, Canada
Elham Sedghi, Jens H. Weber & Alex Thomo
SpecTRA Research Project, Vancouver Island Health Authority, Victoria, BC, Canada
Maximilian Bibok & Andrew M. W. Penn


Corresponding author

Correspondence to Elham Sedghi.


About this article

Cite this article

Sedghi, E., Weber, J.H., Thomo, A. et al. Mining clinical text for stroke prediction. Netw Model Anal Health Inform Bioinforma 4, 16 (2015). https://doi.org/10.1007/s13721-015-0090-5


Received: 26 April 2015
Revised: 26 June 2015
Accepted: 30 June 2015
Published: 14 July 2015
DOI: https://doi.org/10.1007/s13721-015-0090-5
