Abstract
Mining medical data has gained significant interest in recent years thanks to advances in data mining and machine learning. In this work, we focus on a challenging problem in medical data mining: automatic assignment of diagnosis codes to discharge summaries, i.e., characterizing a patient’s hospital stay (diseases, symptoms, treatments, etc.) with a set of codes usually derived from the International Classification of Diseases (ICD). We cast the problem as a machine learning task and experiment with recent approaches based on probabilistic topic models. We demonstrate the effectiveness of these models in terms of high predictive scores and ease of result interpretation. As such, we show how topic models yield insights into this field and open new research opportunities for possible improvements.
Notes
- 3. Source code from: http://www.cs.cmu.edu/~chongw/slda/ (sLDA) and https://github.com/myleott/JGibbLabeledLDA/ (labeledLDA).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Dermouche, M., Velcin, J., Flicoteaux, R., Chevret, S., Taright, N. (2018). Supervised Topic Models for Diagnosis Code Assignment to Discharge Summaries. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_38
DOI: https://doi.org/10.1007/978-3-319-75487-1_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science (R0)