Automatic Identification of Substance Abuse from Social History in Clinical Text

  • Conference paper
Artificial Intelligence in Medicine (AIME 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10259)

Abstract

Substance abuse carries many health risks; tobacco use, for example, increases the rates of diseases such as coronary heart disease and lung cancer. Clinical notes contain rich information detailing the history of substance abuse from the caregivers' perspective. In this work, we address the automatic identification of substance abuse from clinical text. We created a publicly available dataset annotated for three types of substance abuse (tobacco, alcohol, and drug), with seven entity types per event: status, type, method, amount, frequency, exposure-history, and quit-history. Using a combination of machine learning and natural language processing approaches, we obtain results on an unseen test set ranging from 0.51 to 0.58 F1 for stringent, full-event identification and from 0.80 to 0.91 F1 for identification of the substance abuse event and its status. These results indicate the feasibility of extracting detailed substance abuse information from clinical records.
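The gap between the two F1 ranges in the abstract is easier to see with a small sketch. The Python below is illustrative only and is not the scoring code used in the paper: the `SubstanceEvent` class, the `f1` and `relaxed_key` helpers, the set-based matching, and the example field values such as "current" and "none" are all assumptions made for this example. Only the three substance types and the seven entity types come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

# The three substance types named in the abstract.
SUBSTANCES = ("tobacco", "alcohol", "drug")


@dataclass(frozen=True)
class SubstanceEvent:
    """Hypothetical record for one annotated event (seven entity types)."""
    substance: str
    status: Optional[str] = None
    substance_type: Optional[str] = None  # the "type" entity
    method: Optional[str] = None
    amount: Optional[str] = None
    frequency: Optional[str] = None
    exposure_history: Optional[str] = None
    quit_history: Optional[str] = None


def relaxed_key(event):
    """Compare events on substance and status only."""
    return (event.substance, event.status)


def f1(gold, predicted, key=lambda event: event):
    """Set-based F1; the default key demands that every field agree."""
    gold_keys = {key(e) for e in gold}
    pred_keys = {key(e) for e in predicted}
    true_positives = len(gold_keys & pred_keys)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_keys)
    recall = true_positives / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: the prediction misses the tobacco frequency.
gold = [
    SubstanceEvent("tobacco", status="current", amount="1 pack",
                   frequency="per day"),
    SubstanceEvent("alcohol", status="none"),
]
pred = [
    SubstanceEvent("tobacco", status="current", amount="1 pack"),
    SubstanceEvent("alcohol", status="none"),
]

strict_f1 = f1(gold, pred)                    # full-event match
relaxed_f1 = f1(gold, pred, key=relaxed_key)  # event and status only
print(strict_f1, relaxed_f1)
```

In this toy case a single missing entity makes the whole tobacco event count as an error under full-event matching, while the event-and-status comparison still credits it. Note that set-based matching collapses duplicate events, a simplification that a real evaluation over text spans would likely avoid.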

Author information

Corresponding author

Correspondence to Meliha Yetisgen.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yetisgen, M., Vanderwende, L. (2017). Automatic Identification of Substance Abuse from Social History in Clinical Text. In: ten Teije, A., Popow, C., Holmes, J., Sacchi, L. (eds) Artificial Intelligence in Medicine. AIME 2017. Lecture Notes in Computer Science, vol 10259. Springer, Cham. https://doi.org/10.1007/978-3-319-59758-4_18

  • DOI: https://doi.org/10.1007/978-3-319-59758-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59757-7

  • Online ISBN: 978-3-319-59758-4

  • eBook Packages: Computer Science, Computer Science (R0)
