Combining Structured and Free Textual Data of Diabetic Patients’ Smoking Status

Nikolova, Ivelina; Boytcheva, Svetla; Angelova, Galia; Angelov, Zhivko

doi:10.1007/978-3-319-44748-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9883)

Included in the following conference series:

International Conference on Artificial Intelligence: Methodology, Systems, and Applications

Abstract

The main goal of this research is to identify and extract risk factors for Diabetes Mellitus. The data source for our experiments comprises 8 million outpatient records from the Bulgarian Diabetes Registry, submitted to the Bulgarian Health Insurance Fund by general practitioners and all kinds of professionals during 2014. In this paper we report our work on automatic identification of the patients’ smoking status. The experiments are performed on free-text sections of a randomly extracted subset of the registry outpatient records. Although no rich semantic resources for Bulgarian exist, we were able to enrich our model with semantic features based on categorical vocabularies. In addition to the automatically labeled records, we use the records from the Diabetes Registry that contain diagnoses related to tobacco usage. Finally, a combined result from structured information (ICD-10 codes) and extracted data about the smoking status is associated with each patient. The reported accuracy of the best model is comparable to the highest results reported at the i2b2 Challenge 2006. This method is ready to be validated on big data after minor improvements.
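The final combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which ICD-10 codes or which precedence rule the authors use, so everything here is an assumption for illustration: the code prefixes (F17 and Z72.0, the WHO ICD-10 entries for tobacco-related disorders and tobacco use), the function names, the i2b2-style label strings, and the rule that an informative text label takes priority over the structured code.

```python
from typing import Iterable, Optional

# Assumption: WHO ICD-10 prefixes taken to indicate tobacco usage.
# F17   = mental and behavioural disorders due to use of tobacco
# Z72.0 = tobacco use
# The paper's actual code list is not given in the abstract.
TOBACCO_ICD10_PREFIXES = ("F17", "Z72.0")


def has_tobacco_code(icd10_codes: Iterable[str]) -> bool:
    """Return True if any diagnosis code starts with a tobacco-related prefix."""
    return any(
        code.strip().upper().startswith(TOBACCO_ICD10_PREFIXES)
        for code in icd10_codes
    )


def combine_smoking_status(
    icd10_codes: Iterable[str], text_label: Optional[str]
) -> str:
    """Merge a text-derived label with structured ICD-10 evidence.

    Hypothetical rule: an informative text label (anything other than
    'unknown') is kept as-is; otherwise a tobacco-related code yields
    'smoker'; with no evidence at all the status stays 'unknown'.
    """
    label = (text_label or "unknown").strip().lower()
    if label != "unknown":
        return label
    if has_tobacco_code(icd10_codes):
        return "smoker"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical patient: no statement in the free text, but a tobacco code.
    print(combine_smoking_status(["E11.9", "F17.2"], None))
```

This sketch does not handle conflicts, for example a text label of "non-smoker" alongside a tobacco-related diagnosis; a real pipeline would need an explicit policy for such cases.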

Acknowledgements

This study is partially financed by the grant DFNP-100/04.05.2016 “Automatic analysis of clinical text in Bulgarian for discovery of correlations in the Diabetes Registry” with the Bulgarian Academy of Sciences.

Author information

Authors and Affiliations

Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, 25A Acad. G. Bonchev Str., 1113, Sofia, Bulgaria
Ivelina Nikolova, Svetla Boytcheva & Galia Angelova
ADISS Ltd., 4 Hristo Botev Blvd., 1463, Sofia, Bulgaria
Zhivko Angelov

Corresponding author

Correspondence to Ivelina Nikolova.

Editor information

Editors and Affiliations

Winston-Salem State University, Winston Salem, North Carolina, USA
Christo Dichev
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
Gennady Agre

About this paper

Cite this paper

Nikolova, I., Boytcheva, S., Angelova, G., Angelov, Z. (2016). Combining Structured and Free Textual Data of Diabetic Patients’ Smoking Status. In: Dichev, C., Agre, G. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2016. Lecture Notes in Computer Science, vol 9883. Springer, Cham. https://doi.org/10.1007/978-3-319-44748-3_6

DOI: https://doi.org/10.1007/978-3-319-44748-3_6
Published: 18 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44747-6
Online ISBN: 978-3-319-44748-3
eBook Packages: Computer Science, Computer Science (R0)
