Abstract
Substance abuse poses many negative health risks. Tobacco use, for example, increases the rates of many diseases, such as coronary heart disease and lung cancer. Clinical notes contain rich information detailing the history of substance abuse from the caregivers' perspective. In this work, we address the automatic identification of substance abuse from clinical text. We created a publicly available dataset annotated for three types of substance abuse (tobacco, alcohol, and drug), with seven entity types per event: status, type, method, amount, frequency, exposure-history, and quit-history. Using a combination of machine learning and natural language processing approaches, our results on an unseen test set range from 0.51 to 0.58 F1 on stringent, full-event identification, and from 0.80 to 0.91 F1 for identification of the substance abuse event and status. These results indicate the feasibility of extracting detailed substance abuse information from clinical records.
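To make the two evaluation settings concrete, the following is a minimal sketch, not the authors' implementation. It assumes an event can be represented as a record with one optional slot per entity type, and that F1 is computed over exact matches of whichever slots a setting compares. The class name `SubstanceEvent`, the field names, the helper `f1_score`, and the example values ("current", "none", "1 pack") are all hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Optional

# The three substance categories named in the abstract.
SUBSTANCES = ("tobacco", "alcohol", "drug")


@dataclass(frozen=True)
class SubstanceEvent:
    """Hypothetical record: one substance event with the seven entity types."""
    substance: str                          # one of SUBSTANCES
    status: Optional[str] = None
    substance_type: Optional[str] = None    # the "type" entity
    method: Optional[str] = None
    amount: Optional[str] = None
    frequency: Optional[str] = None
    exposure_history: Optional[str] = None
    quit_history: Optional[str] = None


def f1_score(gold: Iterable[SubstanceEvent],
             predicted: Iterable[SubstanceEvent],
             key: Callable[[SubstanceEvent], Hashable]) -> float:
    """F1 over exact matches of key(event).

    Assumption: events are compared as sets, so duplicates collapse.
    """
    gold_keys = {key(e) for e in gold}
    pred_keys = {key(e) for e in predicted}
    true_positives = len(gold_keys & pred_keys)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_keys)
    recall = true_positives / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


def full_event_key(event: SubstanceEvent) -> Hashable:
    """Stringent setting: every slot of the event must match."""
    return event


def event_status_key(event: SubstanceEvent) -> Hashable:
    """Relaxed setting: only the substance and its status must match."""
    return (event.substance, event.status)


# Illustrative, made-up annotations for a single note.
gold = [
    SubstanceEvent("tobacco", status="current", amount="1 pack", frequency="per day"),
    SubstanceEvent("alcohol", status="none"),
]
predicted = [
    SubstanceEvent("tobacco", status="current", amount="1 pack"),  # frequency missed
    SubstanceEvent("alcohol", status="none"),
]

full_f1 = f1_score(gold, predicted, full_event_key)
status_f1 = f1_score(gold, predicted, event_status_key)
print(f"full-event F1: {full_f1:.2f}, event+status F1: {status_f1:.2f}")
```

In this toy case a single missed slot makes the tobacco event count as wrong under the stringent setting but still correct under the event-and-status setting, which is one plausible reason the two reported F1 ranges differ so much.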
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Yetisgen, M., Vanderwende, L. (2017). Automatic Identification of Substance Abuse from Social History in Clinical Text. In: ten Teije, A., Popow, C., Holmes, J., Sacchi, L. (eds) Artificial Intelligence in Medicine. AIME 2017. Lecture Notes in Computer Science, vol 10259. Springer, Cham. https://doi.org/10.1007/978-3-319-59758-4_18
DOI: https://doi.org/10.1007/978-3-319-59758-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59757-7
Online ISBN: 978-3-319-59758-4
eBook Packages: Computer Science, Computer Science (R0)